Gradient Dissent: Conversations on AI

16 Sep 2025

The Startup Powering The Data Behind AGI

In this episode of Gradient Dissent, Lukas Biewald talks with Edwin Chen, the CEO and founder of Surge AI, the billion-dollar company quietly powering the next generation of frontier LLMs. They discuss Surge's origin story, why traditional data labeling is broken, and how the company's research-focused approach is reshaping how models are trained.

You’ll hear why inter-annotator agreement fails in high-complexity tasks like poetry and math, why synthetic data is often overrated, and how Surge builds rich RL environments to stress-test agentic reasoning. They also go deep on what kinds of data will be critical to future progress in AI—from scientific discovery to multimodal reasoning and personalized alignment.

It’s a rare, behind-the-scenes look into the world of high-quality data generation at scale—straight from the team most frontier labs trust to get it right.

Timestamps:

00:00 – Intro: Who is Edwin Chen?

03:40 – The problem with early data labeling systems

06:20 – Search ranking, clickbait, and product principles

10:05 – Why Surge focused on high-skill, high-quality labeling

13:50 – From Craigslist workers to a billion-dollar business

16:40 – Scaling without funding and avoiding Silicon Valley status games

21:15 – Why most human data platforms lack real tech

25:05 – Detecting cheaters, liars, and low-quality labelers

28:30 – Why inter-annotator agreement is a flawed metric

32:15 – What makes a great poem? Not checkboxes

36:40 – Measuring subjective quality rigorously

40:00 – What types of data are becoming more important

44:15 – Scientific collaboration and frontier research data

47:00 – Multimodal data, Argentinian coding, and hyper-specificity

50:10 – What's wrong with LMSYS and benchmark hacking

53:20 – Personalization and taste in model behavior

56:00 – Synthetic data vs. high-quality human data

Follow Weights & Biases:

https://twitter.com/weights_biases

https://www.linkedin.com/company/wandb

This episode was added to Podme via an open RSS feed and is not Podme's own production. It may therefore contain advertising.

Episodes (136)

Hamel Husain — Building Machine Learning Tools

Hamel Husain is a Staff Machine Learning Engineer at GitHub. He has extensive experience building data analytics and predictive modeling solutions for a wide range of industries, including: hospitalit...

24 Jun 2020, 36 min

Peter Welinder — Deep Reinforcement Learning and Robotics

Peter Welinder is a research scientist and roboticist at OpenAI. Before that, he was an engineer at Dropbox and ran the machine learning team, and before that, he co-founded Anchovi Labs, a startup usi...

17 Jun 2020, 54 min

Vicki Boykis — Machine Learning Across Industries

👩‍💻 Today our guest is Vicki Boykis! Vicki is a senior consultant in machine learning and engineering and works with clients to build holistic data products used for decision-making. She's previously ...

4 Jun 2020, 34 min

Angela & Danielle — Designing ML Models for Millions of Consumer Robots

👩‍💻👩‍💻 On this episode of Gradient Dissent our guests are Angela Bassa and Danielle Dean! Angela is an expert in building and leading data teams. An MIT-trained and Edelman-award-winning mathematici...

6 May 2020, 52 min

Jack Clark — Building Trustworthy AI Systems

Jack Clark is the Strategy and Communications Director at OpenAI and formerly worked as the world’s only neural network reporter at Bloomberg. Lukas and Jack discuss AI policy, ethics, and the respons...

22 Apr 2020, 55 min

Rachael Tatman — Conversational AI and Linguistics

🏅 See how W&B is your secret weapon to make it onto the Kaggle leaderboards: https://www.wandb.com/kaggle 👩‍💻 Rachael Tatman is a developer advocate for Rasa, where she helps developers build and de...

7 Apr 2020, 36 min

Nicolas Koumchatzky — Machine Learning in Production for Self-Driving Cars

👨🏻‍💻 Nicolas Koumchatzky is the Director of AI infrastructure at NVIDIA, where he's responsible for MagLev, the production-grade machine learning platform by NVIDIA. His team supports diverse ML use...

21 Mar 2020, 44 min

Brandon Rohrer — Machine Learning in Production for Robots

👨🏻‍💻 Brandon Rohrer is a Mechanical Engineer turned Data Scientist. He’s currently a Principal Data Scientist at iRobot and has an incredibly popular Machine Learning course at e2eML where he’s made...

11 Mar 2020, 34 min