The Startup Powering The Data Behind AGI

The Startup Powering The Data Behind AGI

In this episode of Gradient Dissent, Lukas Biewald talks with the CEO & founder of Surge AI, the billion-dollar company quietly powering the next generation of frontier LLMs. They discuss Surge's origin story, why traditional data labeling is broken, and how their research-focused approach is reshaping how models are trained.

You’ll hear why inter-annotator agreement fails in high-complexity tasks like poetry and math, why synthetic data is often overrated, and how Surge builds rich RL environments to stress-test agentic reasoning. They also go deep on what kinds of data will be critical to future progress in AI—from scientific discovery to multimodal reasoning and personalized alignment.

It’s a rare, behind-the-scenes look into the world of high-quality data generation at scale—straight from the team most frontier labs trust to get it right.


Timestamps:

00:00 – Intro: Who is Edwin Chen?

03:40 – The problem with early data labeling systems

06:20 – Search ranking, clickbait, and product principles

10:05 – Why Surge focused on high-skill, high-quality labeling

13:50 – From Craigslist workers to a billion-dollar business

16:40 – Scaling without funding and avoiding Silicon Valley status games

21:15 – Why most human data platforms lack real tech

25:05 – Detecting cheaters, liars, and low-quality labelers

28:30 – Why inter-annotator agreement is a flawed metric

32:15 – What makes a great poem? Not checkboxes

36:40 – Measuring subjective quality rigorously

40:00 – What types of data are becoming more important

44:15 – Scientific collaboration and frontier research data

47:00 – Multimodal data, Argentinian coding, and hyper-specificity

50:10 – What's wrong with LMSYS and benchmark hacking

53:20 – Personalization and taste in model behavior

56:00 – Synthetic data vs. high-quality human data


Follow Weights & Biases:

https://twitter.com/weights_biases

https://www.linkedin.com/company/wandb

Tämä jakso on lisätty Podme-palveluun avoimen RSS-syötteen kautta eikä se ole Podmen omaa tuotantoa. Siksi jakso saattaa sisältää mainontaa.

Jaksot(136)

Pete Warden — Practical Applications of TinyML

Pete Warden — Practical Applications of TinyML

Pete is the Technical Lead of the TensorFlow Micro team, which works on deep learning for mobile and embedded devices.Lukas and Pete talk about hacking a Raspberry Pi to run AlexNet, the power and siz...

21 Loka 202153min

Pieter Abbeel — Robotics, Startups, and Robotics Startups

Pieter Abbeel — Robotics, Startups, and Robotics Startups

Pieter is the Chief Scientist and Co-founder at Covariant, where his team is building universal AI for robotic manipulation. Pieter also hosts The Robot Brains Podcast, in which he explores how far hu...

7 Loka 202157min

Chris Albon — ML Models and Infrastructure at Wikimedia

Chris Albon — ML Models and Infrastructure at Wikimedia

In this episode we're joined by Chris Albon, Director of Machine Learning at the Wikimedia Foundation.Lukas and Chris talk about Wikimedia's approach to content moderation, what it's like to work in a...

23 Syys 202156min

Emily M. Bender — Language Models and Linguistics

Emily M. Bender — Language Models and Linguistics

In this episode, Emily and Lukas dive into the problems with bigger and bigger language models, the difference between form and meaning, the limits of benchmarks, and why it's important to name the la...

9 Syys 20211h 12min

Jeff Hammerbacher — From data science to biomedicine

Jeff Hammerbacher — From data science to biomedicine

Jeff talks about building Facebook's early data team, founding Cloudera, and transitioning into biomedicine with Hammer Lab and Related Sciences.(Read more: http://wandb.me/gd-jeff-hammerbacher)---Jef...

26 Elo 202156min

Josh Bloom — The Link Between Astronomy and ML

Josh Bloom — The Link Between Astronomy and ML

Josh explains how astronomy and machine learning have informed each other, their current limitations, and where their intersection goes from here. (Read more: http://wandb.me/gd-josh-bloom)---Josh is ...

20 Elo 20211h 8min

Xavier Amatriain — Building AI-powered Primary Care

Xavier Amatriain — Building AI-powered Primary Care

Xavier shares his experience deploying healthcare models, augmenting primary care with AI, the challenges of "ground truth" in medicine, and robustness in ML.---Xavier Amatriain is co-founder and CTO ...

30 Heinä 202150min

Spence Green — Enterprise-scale Machine Translation

Spence Green — Enterprise-scale Machine Translation

Spence shares his experience creating a product around human-in-the-loop machine translation, and explains how machine translation has evolved over the years.---Spence Green is co-founder and CEO of L...

16 Heinä 202143min

Suosittua kategoriassa Liike-elämä ja talous

sijotuskasti
psykopodiaa-podcast
rss-rahapodi
rss-oivalluksia-rahasta-elamasta
mimmit-sijoittaa
rss-rahamania
rss-startup-ministerio
rss-sami-miettinen-neuvottelija
hyva-paha-johtaminen
asuntoasiaa-paivakirjat
ostan-asuntoja-podcast
rahapuhetta
pomojen-suusta
sijoituspodi
juristipodi
rss-uskalla-yrittaa
rss-lahtijat
rss-bisnesta-bebeja
rss-karon-grilli
rss-seuraava-potilas