The Startup Powering The Data Behind AGI

The Startup Powering The Data Behind AGI

In this episode of Gradient Dissent, Lukas Biewald talks with the CEO & founder of Surge AI, the billion-dollar company quietly powering the next generation of frontier LLMs. They discuss Surge's origin story, why traditional data labeling is broken, and how their research-focused approach is reshaping how models are trained.

You’ll hear why inter-annotator agreement fails in high-complexity tasks like poetry and math, why synthetic data is often overrated, and how Surge builds rich RL environments to stress-test agentic reasoning. They also go deep on what kinds of data will be critical to future progress in AI—from scientific discovery to multimodal reasoning and personalized alignment.

It’s a rare, behind-the-scenes look into the world of high-quality data generation at scale—straight from the team most frontier labs trust to get it right.


Timestamps:

00:00 – Intro: Who is Edwin Chen?

03:40 – The problem with early data labeling systems

06:20 – Search ranking, clickbait, and product principles

10:05 – Why Surge focused on high-skill, high-quality labeling

13:50 – From Craigslist workers to a billion-dollar business

16:40 – Scaling without funding and avoiding Silicon Valley status games

21:15 – Why most human data platforms lack real tech

25:05 – Detecting cheaters, liars, and low-quality labelers

28:30 – Why inter-annotator agreement is a flawed metric

32:15 – What makes a great poem? Not checkboxes

36:40 – Measuring subjective quality rigorously

40:00 – What types of data are becoming more important

44:15 – Scientific collaboration and frontier research data

47:00 – Multimodal data, Argentinian coding, and hyper-specificity

50:10 – What's wrong with LMSYS and benchmark hacking

53:20 – Personalization and taste in model behavior

56:00 – Synthetic data vs. high-quality human data


Follow Weights & Biases:

https://twitter.com/weights_biases

https://www.linkedin.com/company/wandb

Tämä jakso on lisätty Podme-palveluun avoimen RSS-syötteen kautta eikä se ole Podmen omaa tuotantoa. Siksi jakso saattaa sisältää mainontaa.

Jaksot(136)

Shaping the World of Robotics with Chelsea Finn

Shaping the World of Robotics with Chelsea Finn

In the newest episode of Gradient Dissent, Chelsea Finn, Assistant Professor at Stanford's Computer Science Department, discusses the forefront of robotics and machine learning.Discover her groundbrea...

15 Helmi 202453min

The Power of AI in Search with You.com's Richard Socher

The Power of AI in Search with You.com's Richard Socher

In the latest episode of Gradient Dissent, Richard Socher, CEO of You.com, shares his insights on the power of AI in search. The episode focuses on how advanced language models like GPT-4 are transfor...

1 Helmi 20241h 8min

AI’s Future: Investment & Impact with Sarah Guo and Elad Gil

AI’s Future: Investment & Impact with Sarah Guo and Elad Gil

Explore the Future of Investment & Impact in AI with Host Lukas Biewald and Guests Elad Gill and Sarah Guo of the No Priors podcast.Sarah is the founder of Conviction VC, an AI-centric $100 million ve...

18 Tammi 20241h 4min

Revolutionizing AI Data Management with Jerry Liu, CEO of LlamaIndex

Revolutionizing AI Data Management with Jerry Liu, CEO of LlamaIndex

In the latest episode of Gradient Dissent, we explore the innovative features and impact of LlamaIndex in AI data management with Jerry Liu, CEO of LlamaIndex. Jerry shares insights on how LlamaIndex ...

4 Tammi 202457min

Bridging AI and Science: The Impact of Machine Learning on Material Innovation with Joe Spisak of Meta

Bridging AI and Science: The Impact of Machine Learning on Material Innovation with Joe Spisak of Meta

In the latest episode of Gradient Dissent, we hear from Joseph Spisak, Product Director, Generative AI @Meta, to explore the boundless impacts of AI and its expansive role in reshaping various sectors...

7 Joulu 20231h 14min

Unlocking the Power of Language Models in Enterprise: A Deep Dive with Chris Van Pelt

Unlocking the Power of Language Models in Enterprise: A Deep Dive with Chris Van Pelt

In the premiere episode of Gradient Dissent Business, we're joined by Weights & Biases co-founder Chris Van Pelt for a deep dive into the world of large language models like GPT-3.5 and GPT-4. Chris b...

16 Marras 202352min

Providing Greater Access to LLMs with Brandon Duderstadt, Co-Founder and CEO of Nomic AI

Providing Greater Access to LLMs with Brandon Duderstadt, Co-Founder and CEO of Nomic AI

On this episode, we’re joined by Brandon Duderstadt, Co-Founder and CEO of Nomic AI. Both of Nomic AI’s products, Atlas and GPT4All, aim to improve the explainability and accessibility of AI.We discus...

27 Heinä 20231h 1min

Exploring PyTorch and Open-Source Communities with Soumith Chintala, VP/Fellow of Meta, Co-Creator of PyTorch

Exploring PyTorch and Open-Source Communities with Soumith Chintala, VP/Fellow of Meta, Co-Creator of PyTorch

On this episode, we’re joined by Soumith Chintala, VP/Fellow of Meta and Co-Creator of PyTorch. Soumith and his colleagues’ open-source framework impacted both the development process and the end-user...

13 Heinä 20231h 8min

Suosittua kategoriassa Liike-elämä ja talous

sijotuskasti
psykopodiaa-podcast
rss-rahapodi
rss-oivalluksia-rahasta-elamasta
mimmit-sijoittaa
rss-rahamania
rss-startup-ministerio
rss-sami-miettinen-neuvottelija
hyva-paha-johtaminen
asuntoasiaa-paivakirjat
ostan-asuntoja-podcast
rahapuhetta
pomojen-suusta
sijoituspodi
juristipodi
rss-uskalla-yrittaa
rss-lahtijat
rss-bisnesta-bebeja
rss-karon-grilli
rss-seuraava-potilas