The Startup Powering The Data Behind AGI

The Startup Powering The Data Behind AGI

In this episode of Gradient Dissent, Lukas Biewald talks with the CEO & founder of Surge AI, the billion-dollar company quietly powering the next generation of frontier LLMs. They discuss Surge's origin story, why traditional data labeling is broken, and how their research-focused approach is reshaping how models are trained.

You’ll hear why inter-annotator agreement fails in high-complexity tasks like poetry and math, why synthetic data is often overrated, and how Surge builds rich RL environments to stress-test agentic reasoning. They also go deep on what kinds of data will be critical to future progress in AI—from scientific discovery to multimodal reasoning and personalized alignment.

It’s a rare, behind-the-scenes look into the world of high-quality data generation at scale—straight from the team most frontier labs trust to get it right.


Timestamps:

00:00 – Intro: Who is Edwin Chen?

03:40 – The problem with early data labeling systems

06:20 – Search ranking, clickbait, and product principles

10:05 – Why Surge focused on high-skill, high-quality labeling

13:50 – From Craigslist workers to a billion-dollar business

16:40 – Scaling without funding and avoiding Silicon Valley status games

21:15 – Why most human data platforms lack real tech

25:05 – Detecting cheaters, liars, and low-quality labelers

28:30 – Why inter-annotator agreement is a flawed metric

32:15 – What makes a great poem? Not checkboxes

36:40 – Measuring subjective quality rigorously

40:00 – What types of data are becoming more important

44:15 – Scientific collaboration and frontier research data

47:00 – Multimodal data, Argentinian coding, and hyper-specificity

50:10 – What's wrong with LMSYS and benchmark hacking

53:20 – Personalization and taste in model behavior

56:00 – Synthetic data vs. high-quality human data


Follow Weights & Biases:

https://twitter.com/weights_biases

https://www.linkedin.com/company/wandb

Denne episoden er hentet fra en åpen RSS-feed og er ikke publisert av Podme. Den kan derfor inneholde annonser.

Episoder(136)

Sarah Catanzaro — Remembering the Lessons of the Last AI Renaissance

Sarah Catanzaro — Remembering the Lessons of the Last AI Renaissance

Sarah Catanzaro is a General Partner at Amplify Partners, and one of the leading investors in AI and ML. Her investments include RunwayML, OctoML, and Gantry.Sarah and Lukas discuss lessons learned fr...

2 Feb 20231h 16min

Cristóbal Valenzuela — The Next Generation of Content Creation and AI

Cristóbal Valenzuela — The Next Generation of Content Creation and AI

Cristóbal Valenzuela is co-founder and CEO of Runway ML, a startup that's building the future of AI-powered content creation tools. Runway's research areas include diffusion systems for image generati...

19 Jan 202340min

Jeremy Howard — The Simple but Profound Insight Behind Diffusion

Jeremy Howard — The Simple but Profound Insight Behind Diffusion

Jeremy Howard is a co-founder of fast.ai, the non-profit research group behind the popular massive open online course "Practical Deep Learning for Coders", and the open source deep learning library "f...

5 Jan 20231h 12min

Jerome Pesenti — Large Language Models, PyTorch, and Meta

Jerome Pesenti — Large Language Models, PyTorch, and Meta

Jerome Pesenti is the former VP of AI at Meta, a tech conglomerate that includes Facebook, WhatsApp, and Instagram, and one of the most exciting places where AI research is happening today.Jerome shar...

22 Des 202252min

D. Sculley — Technical Debt, Trade-offs, and Kaggle

D. Sculley — Technical Debt, Trade-offs, and Kaggle

D. Sculley is CEO of Kaggle, the beloved and well-known data science and machine learning community.D. discusses his influential 2015 paper "Machine Learning: The High Interest Credit Card of Technica...

1 Des 20221h

Emad Mostaque — Stable Diffusion, Stability AI, and What’s Next

Emad Mostaque — Stable Diffusion, Stability AI, and What’s Next

Emad Mostaque is CEO and co-founder of Stability AI, a startup and network of decentralized developer communities building open AI tools. Stability AI is the company behind Stable Diffusion, the well-...

15 Nov 20221h 10min

Jehan Wickramasuriya — AI in High-Stress Scenarios

Jehan Wickramasuriya — AI in High-Stress Scenarios

Jehan Wickramasuriya is the Vice President of AI, Platform & Data Services at Motorola Solutions, a global leader in public safety and enterprise security.In this episode, Jehan discusses how Motorola...

6 Okt 20221h

Will Falcon — Making Lightning the Apple of ML

Will Falcon — Making Lightning the Apple of ML

Will Falcon is the CEO and co-founder of Lightning AI, a platform that enables users to quickly build and publish ML models.In this episode, Will explains how Lightning addresses the challenges of a f...

15 Sep 202245min

Populært innen Business og økonomi

stopp-verden
lydartikler-fra-aftenposten
dine-penger-pengeradet
rss-penger-polser-og-politikk
e24-podden
rss-borsmorgen-okonominyhetene
rss-skravla-gar
livet-pa-veien-med-jan-erik-larssen
finansredaksjonen
rss-pa-konto
pengesnakk
pengepodden-2
tid-er-penger-en-podcast-med-peter-warren
utbytte
morgenkaffen-med-finansavisen
rss-markedspuls-2
liberal-halvtime
lederpodden
rss-sunn-okonomi
okonomiamatorene