Gradient Dissent: Conversations on AI

16 Sep

The Startup Powering The Data Behind AGI

In this episode of Gradient Dissent, Lukas Biewald talks with Edwin Chen, the CEO & founder of Surge AI, the billion-dollar company quietly powering the next generation of frontier LLMs. They discuss Surge's origin story, why traditional data labeling is broken, and how the company's research-focused approach is reshaping how models are trained.

You’ll hear why inter-annotator agreement fails in high-complexity tasks like poetry and math, why synthetic data is often overrated, and how Surge builds rich RL environments to stress-test agentic reasoning. They also go deep on what kinds of data will be critical to future progress in AI—from scientific discovery to multimodal reasoning and personalized alignment.

It’s a rare, behind-the-scenes look into the world of high-quality data generation at scale—straight from the team most frontier labs trust to get it right.

Timestamps:

00:00 – Intro: Who is Edwin Chen?

03:40 – The problem with early data labeling systems

06:20 – Search ranking, clickbait, and product principles

10:05 – Why Surge focused on high-skill, high-quality labeling

13:50 – From Craigslist workers to a billion-dollar business

16:40 – Scaling without funding and avoiding Silicon Valley status games

21:15 – Why most human data platforms lack real tech

25:05 – Detecting cheaters, liars, and low-quality labelers

28:30 – Why inter-annotator agreement is a flawed metric

32:15 – What makes a great poem? Not checkboxes

36:40 – Measuring subjective quality rigorously

40:00 – What types of data are becoming more important

44:15 – Scientific collaboration and frontier research data

47:00 – Multimodal data, Argentinian coding, and hyper-specificity

50:10 – What's wrong with LMSYS and benchmark hacking

53:20 – Personalization and taste in model behavior

56:00 – Synthetic data vs. high-quality human data

Follow Weights & Biases:

https://twitter.com/weights_biases

https://www.linkedin.com/company/wandb

Episodes (128)

Jordan Fisher — Skipping the Line with Autonomous Checkout

Jordan Fisher is the CEO and co-founder of Standard AI, an autonomous checkout company that’s pushing the boundaries of computer vision.

In this episode, Jordan discusses “the Wild West” of the MLOps stack and tells Lukas why Rust beats Python. He also explains why AutoML shouldn't be overlooked and uses a bag of chips to help explain the Manifold Hypothesis.

Show notes (transcript and links): http://wandb.me/gd-jordan-fisher

⏳ Timestamps:
00:00 Intro
00:40 The origins of Standard AI
08:30 Getting Standard into stores
18:00 Supervised learning, the advent of synthetic data, and the manifold hypothesis
24:23 What's important in a MLOps stack
27:32 The merits of AutoML
30:00 Deep learning frameworks
33:02 Python versus Rust
39:32 Raw camera data versus video
42:47 The future of autonomous checkout
48:02 Sharing the StandardSim data set
52:30 Picking the right tools
54:30 Overcoming dynamic data set challenges
57:35 Outro

Connect with Jordan and Standard AI
📍 Jordan on LinkedIn: https://www.linkedin.com/in/jordan-fisher-81145025/
📍 Standard AI on Twitter: https://twitter.com/StandardAi
📍 Careers at Standard AI: https://careers.standard.ai/

💬 Host: Lukas Biewald
📹 Producers: Riley Fields, Cayla Sharp, Angelica Pan, Lavanya Shukla

Subscribe and listen to our podcast today!
👉 Apple Podcasts: http://wandb.me/apple-podcasts
👉 Google Podcasts: http://wandb.me/google-podcasts
👉 Spotify: http://wandb.me/spotify

4 Aug 2022, 57 min

Drago Anguelov — Robustness, Safety, and Scalability at Waymo

Drago Anguelov is a Distinguished Scientist and Head of Research at Waymo, an autonomous driving technology company and subsidiary of Alphabet Inc.

We begin by discussing Drago's work on the original Inception architecture, winner of the 2014 ImageNet challenge and introduction of the inception module. Then, we explore milestones and current trends in autonomous driving, from Waymo's release of the Open Dataset to the trade-offs between modular and end-to-end systems.

Drago also shares his thoughts on finding rare examples, and the challenges of creating scalable and robust systems.

Show notes (transcript and links): http://wandb.me/gd-drago-anguelov

⏳ Timestamps:
0:00 Intro
0:45 The story behind the Inception architecture
13:51 Trends and milestones in autonomous vehicles
23:52 The challenges of scalability and simulation
30:19 Why LiDar and mapping are useful
35:31 Waymo Via and autonomous trucking
37:31 Robustness and unsupervised domain adaptation
40:44 Why Waymo released the Waymo Open Dataset
49:02 The domain gap between simulation and the real world
56:40 Finding rare examples
1:04:34 The challenges of production requirements
1:08:36 Outro

Connect with Drago & Waymo
📍 Drago on LinkedIn: https://www.linkedin.com/in/dragomiranguelov/
📍 Waymo on Twitter: https://twitter.com/waymo/
📍 Careers at Waymo: https://waymo.com/careers/

Links:
📍 Inception v1: https://arxiv.org/abs/1409.4842
📍 "SPG: Unsupervised Domain Adaptation for 3D Object Detection via Semantic Point Generation", Qiangeng Xu et al. (2021), https://arxiv.org/abs/2108.06709
📍 "GradTail: Learning Long-Tailed Data Using Gradient-based Sample Weighting", Zhao Chen et al. (2022), https://arxiv.org/abs/2201.05938

💬 Host: Lukas Biewald
📹 Producers: Cayla Sharp, Angelica Pan, Lavanya Shukla

Subscribe and listen to our podcast today!
👉 Apple Podcasts: http://wandb.me/apple-podcasts
👉 Google Podcasts: http://wandb.me/google-podcasts
👉 Spotify: http://wandb.me/spotify

14 Jul 2022, 1h 9min

James Cham — Investing in the Intersection of Business and Technology

James Cham is a co-founder and partner at Bloomberg Beta, an early-stage venture firm that invests in machine learning and the future of work, the intersection between business and technology.

James explains how his approach to investing in AI has developed over the last decade, which signals of success he looks for in the ever-adapting world of venture startups (tip: look for the "gradient of admiration"), and why it's so important to demystify ML for executives and decision-makers.

Lukas and James also discuss how new technologies create new business models, and the ethical considerations that arise in a world where machine learning is accepted as possibly fallible.

Show notes (transcript and links): http://wandb.me/gd-james-cham

⏳ Timestamps:
0:00 Intro
0:46 How investment in AI has changed and developed
7:08 Creating the first MI landscape infographics
10:30 The impact of ML on organizations and management
17:40 Demystifying ML for executives
21:40 Why signals of successful startups change over time
27:07 ML and the emergence of new business models
37:58 New technology vs new consumer goods
39:50 What James considers when investing
44:19 Ethical considerations of accepting that ML models are fallible
50:30 Reflecting on past investment decisions
52:56 Thoughts on consciousness and Theseus' paradox
59:08 Why it's important to increase general ML literacy
1:03:09 Outro
1:03:30 Bonus: How James' faith informs his thoughts on ML

Connect with James:
📍 Twitter: https://twitter.com/jamescham
📍 Bloomberg Beta: https://github.com/Bloomberg-Beta/Manual

Links:
📍 "Street-Level Algorithms: A Theory at the Gaps Between Policy and Decisions" by Ali Alkhatib and Michael Bernstein (2019): https://doi.org/10.1145/3290605.3300760

💬 Host: Lukas Biewald
📹 Producers: Cayla Sharp, Angelica Pan, Lavanya Shukla

Subscribe and listen to our podcast today!
👉 Apple Podcasts: http://wandb.me/apple-podcasts
👉 Google Podcasts: http://wandb.me/google-podcasts
👉 Spotify: http://wandb.me/spotify

7 Jul 2022, 1h 6min

Boris Dayma — The Story Behind DALL·E mini, the Viral Phenomenon

Check out this report by Boris about DALL-E mini:
https://wandb.ai/dalle-mini/dalle-mini/reports/DALL-E-mini-Generate-images-from-any-text-prompt--VmlldzoyMDE4NDAy
https://wandb.ai/_scott/wandb_example/reports/Collaboration-in-ML-made-easy-with-W-B-Teams--VmlldzoxMjcwMDU5
https://twitter.com/weirddalle

Connect with Boris:
📍 Twitter: https://twitter.com/borisdayma

💬 Host: Lukas Biewald
📹 Producers: Cayla Sharp, Angelica Pan, Sanyam Bhutani, Lavanya Shukla

Subscribe and listen to our podcast today!
👉 Apple Podcasts: http://wandb.me/apple-podcasts
👉 Google Podcasts: http://wandb.me/google-podcasts
👉 Spotify: http://wandb.me/spotify

17 Jun 2022, 35 min

Tristan Handy — The Work Behind the Data Work

Tristan Handy is CEO and founder of dbt Labs. dbt (data build tool) simplifies the data transformation workflow and helps organizations make better decisions.

Lukas and Tristan dive into the history of the modern data stack and the subsequent challenges that dbt was created to address; communities of identity and product-led growth; and thoughts on why SQL has survived and thrived for so long. Tristan also shares his hopes for the future of BI tools and the data stack.

Show notes (transcript and links): http://wandb.me/gd-tristan-handy

⏳ Timestamps:
0:00 Intro
0:40 How dbt makes data transformation easier
4:52 dbt and avoiding bad data habits
14:23 Agreeing on organizational ground truths
19:04 Staying current while running a company
22:15 The origin story of dbt
26:08 Why dbt is conceptually simple but hard to execute
34:47 The dbt community and the bottom-up mindset
41:50 The future of data and operations
47:41 dbt and machine learning
49:17 Why SQL is so ubiquitous
55:20 Bridging the gap between the ML and data worlds
1:00:22 Outro

Connect with Tristan:
📍 Twitter: https://twitter.com/jthandy
📍 The Analytics Engineering Roundup: https://roundup.getdbt.com/

💬 Host: Lukas Biewald
📹 Producers: Cayla Sharp, Angelica Pan, Sanyam Bhutani, Lavanya Shukla

Subscribe and listen to our podcast today!
👉 Apple Podcasts: http://wandb.me/apple-podcasts
👉 Google Podcasts: http://wandb.me/google-podcasts
👉 Spotify: http://wandb.me/spotify

9 Jun 2022, 1h

Johannes Otterbach — Unlocking ML for Traditional Companies

Johannes Otterbach is VP of Machine Learning Research at Merantix Momentum, an ML consulting studio that helps its clients build AI solutions.

Johannes and Lukas talk about Johannes' background in physics and applications of ML to quantum computing, why Merantix is investing in creating a cloud-agnostic tech stack, and the unique challenges of developing and deploying models for different customers. They also discuss some of Johannes' articles on the impact of NLP models and the future of AI regulations.

Show notes (transcript and links): http://wandb.me/gd-johannes-otterbach

⏳ Timestamps:
0:00 Intro
1:04 Quantum computing and ML applications
9:21 Merantix, Ventures, and ML consulting
19:09 Building a cloud-agnostic tech stack
24:40 The open source tooling ecosystem
30:28 Handing off models to customers
31:42 The impact of NLP models on the real world
35:40 Thoughts on AI and regulation
40:10 Statistical physics and optimization problems
42:50 The challenges of getting high-quality data
44:30 Outro

Connect with Johannes:
📍 Twitter: https://twitter.com/jsotterbach
📍 Personal website: http://jotterbach.github.io/
📍 Careers at Merantix Momentum: https://merantix-momentum.com/about#jobs

💬 Host: Lukas Biewald
📹 Producers: Cayla Sharp, Angelica Pan, Sanyam Bhutani, Lavanya Shukla

Subscribe and listen to our podcast today!
👉 Apple Podcasts: http://wandb.me/apple-podcasts
👉 Google Podcasts: http://wandb.me/google-podcasts
👉 Spotify: http://wandb.me/spotify

12 May 2022, 44 min

Mircea Neagovici — Robotic Process Automation (RPA) and ML

Mircea Neagovici is VP, AI and Research at UiPath, where his team works on task mining and other ways of combining robotic process automation (RPA) with machine learning for their B2B products.

Mircea and Lukas talk about the challenges of allowing customers to fine-tune their models, the trade-offs between traditional ML and more complex deep learning models, and how Mircea transitioned from a more traditional software engineering role to running a machine learning organization.

Show notes (transcript and links): http://wandb.me/gd-mircea-neagovici

⏳ Timestamps:
0:00 Intro
1:05 Robotic Process Automation (RPA)
4:20 RPA and machine learning at UiPath
8:20 Fine-tuning & PyTorch vs TensorFlow
14:50 Monitoring models in production
16:33 Task mining
22:37 Trade-offs in ML models
29:45 Transitioning from software engineering to ML
34:02 ML teams vs engineering teams
40:41 Spending more time on data
43:55 The organizational machinery behind ML models
45:57 Outro

Connect with Mircea:
📍 LinkedIn: https://www.linkedin.com/in/mirceaneagovici/
📍 Careers at UiPath: https://www.uipath.com/company/careers

💬 Host: Lukas Biewald
📹 Producers: Cayla Sharp, Angelica Pan, Sanyam Bhutani, Lavanya Shukla

21 Apr 2022, 46 min

Jensen Huang — NVIDIA’s CEO on the Next Generation of AI and MLOps

Jensen Huang is founder and CEO of NVIDIA, whose GPUs sit at the heart of the majority of machine learning models today.

Jensen shares the story behind NVIDIA's expansion from gaming to deep learning acceleration, leadership lessons that he's learned over the last few decades, and why we need a virtual world that obeys the laws of physics (aka the Omniverse) in order to take AI to the next era. Jensen and Lukas also talk about the singularity, the slow-but-steady approach to building a new market, and the importance of MLOps.

The complete show notes (transcript and links) can be found here: http://wandb.me/gd-jensen-huang

⏳ Timestamps:
0:00 Intro
0:50 Why NVIDIA moved into the deep learning space
7:33 Balancing the compute needs of different audiences
10:40 Quantum computing, Huang's Law, and the singularity
15:53 Democratizing scientific computing
20:59 How Jensen stays current with technology trends
25:10 The global chip shortage
27:00 Leadership lessons that Jensen has learned
32:32 Keeping a steady vision for NVIDIA
35:48 Omniverse and the next era of AI
42:00 ML topics that Jensen's excited about
45:05 Why MLOps is vital
48:38 Outro

Subscribe and listen to our podcast today!
👉 Apple Podcasts: http://wandb.me/apple-podcasts
👉 Google Podcasts: http://wandb.me/google-podcasts
👉 Spotify: http://wandb.me/spotify

3 Mar 2022, 48 min
