The Startup Powering The Data Behind AGI

The Startup Powering The Data Behind AGI

In this episode of Gradient Dissent, Lukas Biewald talks with the CEO & founder of Surge AI, the billion-dollar company quietly powering the next generation of frontier LLMs. They discuss Surge's origin story, why traditional data labeling is broken, and how their research-focused approach is reshaping how models are trained.

You’ll hear why inter-annotator agreement fails in high-complexity tasks like poetry and math, why synthetic data is often overrated, and how Surge builds rich RL environments to stress-test agentic reasoning. They also go deep on what kinds of data will be critical to future progress in AI—from scientific discovery to multimodal reasoning and personalized alignment.

It’s a rare, behind-the-scenes look into the world of high-quality data generation at scale—straight from the team most frontier labs trust to get it right.


Timestamps:

00:00 – Intro: Who is Edwin Chen?

03:40 – The problem with early data labeling systems

06:20 – Search ranking, clickbait, and product principles

10:05 – Why Surge focused on high-skill, high-quality labeling

13:50 – From Craigslist workers to a billion-dollar business

16:40 – Scaling without funding and avoiding Silicon Valley status games

21:15 – Why most human data platforms lack real tech

25:05 – Detecting cheaters, liars, and low-quality labelers

28:30 – Why inter-annotator agreement is a flawed metric

32:15 – What makes a great poem? Not checkboxes

36:40 – Measuring subjective quality rigorously

40:00 – What types of data are becoming more important

44:15 – Scientific collaboration and frontier research data

47:00 – Multimodal data, Argentinian coding, and hyper-specificity

50:10 – What's wrong with LMSYS and benchmark hacking

53:20 – Personalization and taste in model behavior

56:00 – Synthetic data vs. high-quality human data


Follow Weights & Biases:

https://twitter.com/weights_biases

https://www.linkedin.com/company/wandb

Det här avsnittet är hämtat från ett öppet RSS-flöde och publiceras inte av Podme. Det kan innehålla reklam.

Avsnitt(136)

Advanced AI Accelerators and Processors with Andrew Feldman of Cerebras Systems

Advanced AI Accelerators and Processors with Andrew Feldman of Cerebras Systems

On this episode, we’re joined by Andrew Feldman, Founder and CEO of Cerebras Systems. Andrew and the Cerebras team are responsible for building the largest-ever computer chip and the fastest AI-specif...

22 Juni 20231h

Enabling LLM-Powered Applications with Harrison Chase of LangChain

Enabling LLM-Powered Applications with Harrison Chase of LangChain

On this episode, we’re joined by Harrison Chase, Co-Founder and CEO of LangChain. Harrison and his team at LangChain are on a mission to make the process of creating applications powered by LLMs as ea...

1 Juni 202351min

Deploying Autonomous Mobile Robots with Jean Marc Alkazzi at idealworks

Deploying Autonomous Mobile Robots with Jean Marc Alkazzi at idealworks

On this episode, we’re joined by Jean Marc Alkazzi, Applied AI at idealworks. Jean focuses his attention on applied AI, leveraging the use of autonomous mobile robots (AMRs) to improve efficiency with...

18 Maj 202358min

How EleutherAI Trains and Releases LLMs: Interview with Stella Biderman

How EleutherAI Trains and Releases LLMs: Interview with Stella Biderman

On this episode, we’re joined by Stella Biderman, Executive Director at EleutherAI and Lead Scientist - Mathematician at Booz Allen Hamilton.EleutherAI is a grassroots collective that enables open-sou...

4 Maj 202357min

Scaling LLMs and Accelerating Adoption with Aidan Gomez at Cohere

Scaling LLMs and Accelerating Adoption with Aidan Gomez at Cohere

On this episode, we’re joined by Aidan Gomez, Co-Founder and CEO at Cohere. Cohere develops and releases a range of innovative AI-powered tools and solutions for a variety of NLP use cases.We discuss:...

20 Apr 202351min

Neural Network Pruning and Training with Jonathan Frankle at MosaicML

Neural Network Pruning and Training with Jonathan Frankle at MosaicML

Jonathan Frankle, Chief Scientist at MosaicML and Assistant Professor of Computer Science at Harvard University, joins us on this episode. With comprehensive infrastructure and software tools, MosaicM...

4 Apr 20231h 2min

Jasper AI's Dave Rogenmoser & Saad Ansari on Growing & Maintaining an LLM-Based Company

Jasper AI's Dave Rogenmoser & Saad Ansari on Growing & Maintaining an LLM-Based Company

About this episodeIn this episode of Gradient Dissent, Lukas interviews Dave Rogenmoser (CEO & Co-Founder) and Saad Ansari (Director of AI) of Jasper AI, a generative AI company with a focus on text g...

16 Mars 20231h 9min

Shreya Shankar — Operationalizing Machine Learning

Shreya Shankar — Operationalizing Machine Learning

About This EpisodeShreya Shankar is a computer scientist, PhD student in databases at UC Berkeley, and co-author of "Operationalizing Machine Learning: An Interview Study", an ethnographic interview s...

3 Mars 202354min

Populärt inom Business & ekonomi

framgangspodden
varvet
rss-jossan-nina
badfluence
rss-svart-marknad
rss-borsens-finest
svd-tech-brief
avanzapodden
uppgang-och-fall
bathina-en-podcast
fill-or-kill
kapitalet-en-podd-om-ekonomi
dynastin
tabberaset
rss-dagen-med-di
lastbilspodden
rss-inga-dumma-fragor-om-pengar
rikatillsammans-om-privatekonomi-rikedom-i-livet
rss-kort-lang-analyspodden-fran-di
rss-dr-bjorklund