The Startup Powering The Data Behind AGI

In this episode of Gradient Dissent, Lukas Biewald talks with Edwin Chen, CEO and founder of Surge AI, the billion-dollar company quietly powering the next generation of frontier LLMs. They discuss Surge's origin story, why traditional data labeling is broken, and how the company's research-focused approach is reshaping how models are trained.

You’ll hear why inter-annotator agreement fails in high-complexity tasks like poetry and math, why synthetic data is often overrated, and how Surge builds rich RL environments to stress-test agentic reasoning. They also go deep on what kinds of data will be critical to future progress in AI—from scientific discovery to multimodal reasoning and personalized alignment.

It’s a rare, behind-the-scenes look into the world of high-quality data generation at scale—straight from the team most frontier labs trust to get it right.


Timestamps:

00:00 – Intro: Who is Edwin Chen?

03:40 – The problem with early data labeling systems

06:20 – Search ranking, clickbait, and product principles

10:05 – Why Surge focused on high-skill, high-quality labeling

13:50 – From Craigslist workers to a billion-dollar business

16:40 – Scaling without funding and avoiding Silicon Valley status games

21:15 – Why most human data platforms lack real tech

25:05 – Detecting cheaters, liars, and low-quality labelers

28:30 – Why inter-annotator agreement is a flawed metric

32:15 – What makes a great poem? Not checkboxes

36:40 – Measuring subjective quality rigorously

40:00 – What types of data are becoming more important

44:15 – Scientific collaboration and frontier research data

47:00 – Multimodal data, Argentinian coding, and hyper-specificity

50:10 – What's wrong with LMSYS and benchmark hacking

53:20 – Personalization and taste in model behavior

56:00 – Synthetic data vs. high-quality human data


Follow Weights & Biases:

https://twitter.com/weights_biases

https://www.linkedin.com/company/wandb

This episode was added to Podme via an open RSS feed and is not produced by Podme, so it may contain advertising.