The Startup Powering The Data Behind AGI


In this episode of Gradient Dissent, Lukas Biewald talks with Edwin Chen, the CEO & founder of Surge AI, the billion-dollar company quietly powering the next generation of frontier LLMs. They discuss Surge's origin story, why traditional data labeling is broken, and how the company's research-focused approach is reshaping how models are trained.

You’ll hear why inter-annotator agreement fails in high-complexity tasks like poetry and math, why synthetic data is often overrated, and how Surge builds rich RL environments to stress-test agentic reasoning. They also go deep on what kinds of data will be critical to future progress in AI—from scientific discovery to multimodal reasoning and personalized alignment.


It’s a rare, behind-the-scenes look into the world of high-quality data generation at scale—straight from the team most frontier labs trust to get it right.


Timestamps:

00:00 – Intro: Who is Edwin Chen?

03:40 – The problem with early data labeling systems

06:20 – Search ranking, clickbait, and product principles

10:05 – Why Surge focused on high-skill, high-quality labeling

13:50 – From Craigslist workers to a billion-dollar business

16:40 – Scaling without funding and avoiding Silicon Valley status games

21:15 – Why most human data platforms lack real tech

25:05 – Detecting cheaters, liars, and low-quality labelers

28:30 – Why inter-annotator agreement is a flawed metric

32:15 – What makes a great poem? Not checkboxes

36:40 – Measuring subjective quality rigorously

40:00 – What types of data are becoming more important

44:15 – Scientific collaboration and frontier research data

47:00 – Multimodal data, Argentinian coding, and hyper-specificity

50:10 – What's wrong with LMSYS and benchmark hacking

53:20 – Personalization and taste in model behavior

56:00 – Synthetic data vs. high-quality human data


Follow Weights & Biases:

https://twitter.com/weights_biases

https://www.linkedin.com/company/wandb

Episodes (131)

Rachael Tatman — Conversational AI and Linguistics


🏅 See how W&B is your secret weapon to make it onto the Kaggle leaderboards - https://www.wandb.com/kaggle

👩‍💻 Rachael Tatman is a developer advocate for Rasa, where she helps developers build and deploy conversational AI applications using their open source framework. 🤖💬 She has a PhD in Linguistics from the University of Washington, where she researched computational sociolinguistics, or how our social identity affects the way we use language in computational contexts. Previously she was a data scientist at Kaggle, where she’s still a Grandmaster.

💻 Keep up with Rachael on her website: http://www.rctatman.com/
🐦 Follow Rachael on Twitter: https://twitter.com/rctatman

Get our podcast on Apple and Spotify!
https://podcasts.apple.com/us/podcast/gradient-dissent-weights-biases/id1504567418
https://open.spotify.com/show/7o9r3fFig3MhTJwehXDbXm

🤖 Gradient Dissent by Weights and Biases
We started Weights and Biases to build tools for Machine Learning practitioners because we care a lot about the impact that Machine Learning can have in the world and we love working in the trenches with the people building these models. One of the most fun things about building these tools has been the conversations with these ML practitioners and learning about the interesting things they’re working on. This process has been so fun that we wanted to open it up to the world in the form of our new podcast. We hope you have as much fun listening to it as we had making it.

👩🏼‍🚀 Weights and Biases: We’re always free for academics and open source projects. Email carey@wandb.com with any questions or feature suggestions.
- Blog: https://www.wandb.com/articles
- Gallery: See what you can create with W&B - https://app.wandb.ai/gallery
- Continue the conversation on our Slack community - http://bit.ly/wandb-forum

🎙 Host: Lukas Biewald - https://twitter.com/l2k
👩🏼‍💻 Producer: Lavanya Shukla - https://twitter.com/lavanyaai
📹 Editor: Cayla Sharp - http://caylasharp.com/

7 Apr 2020, 36 min

Nicolas Koumchatzky — Machine Learning in Production for Self-Driving Cars


👨🏻‍💻 Nicolas Koumchatzky is the Director of AI Infrastructure at NVIDIA, where he's responsible for MagLev, NVIDIA's production-grade machine learning platform. His team supports diverse ML use cases: autonomous vehicles, medical imaging, super resolution, predictive analytics, cyber security, and robotics. He started as a quant in Paris, then joined Madbits, a startup specializing in deep learning for content understanding. When Madbits was acquired by Twitter in 2014, he joined as a deep learning expert and led a few projects in Cortex, including a real-time live video classification product for Periscope. In 2016, he focused on building a scalable AI platform for the company. In early 2017, he became the lead for the Cortex team. He joined NVIDIA in 2018.

🐦 Follow Nicolas on Twitter: https://twitter.com/nkoumchatzky
🛠 MagLev: https://blogs.nvidia.com/blog/2018/09/13/how-maglev-speeds-autonomous-vehicles-to-superhuman-levels-of-safety/
✍️ Scalable Active Learning for Autonomous Driving: https://medium.com/nvidia-ai/scalable-active-learning-for-autonomous-driving-a-practical-implementation-and-a-b-test-4d315ed04b5f
✍️ Active Learning: Finding the right self-driving training data doesn’t have to take a swarm of human labelers: https://blogs.nvidia.com/blog/2020/01/16/what-is-active-learning/
👫 Continue the conversation on our Slack community - http://bit.ly/wandb-forum
* Visualize your Scikit model performance with W&B - https://app.wandb.ai/lavanyashukla/visualize-sklearn/reports/Visualizing-Sklearn-With-Weights-and-Biases--Vmlldzo0ODIzNg

21 Mar 2020, 44 min

Brandon Rohrer — Machine Learning in Production for Robots


👨🏻‍💻 Brandon Rohrer is a mechanical engineer turned data scientist. He’s currently a Principal Data Scientist at iRobot and runs a popular machine learning course at e2eML, where he has made widely watched videos on convolutional neural networks and deep learning. His fascination with robots began after seeing Luke Skywalker’s prosthetic hand in The Empire Strikes Back. He turned this fascination into a PhD from MIT and subsequently went on to build data science products at Facebook, Microsoft, and now iRobot.

✍️ Brandon’s machine learning course: http://e2eml.school/
🐦 Follow Brandon on Twitter: https://twitter.com/_brohrer_
👫 Continue the conversation on our Slack community - http://bit.ly/wandb-forum
🤖 Gradient Dissent by Weights and Biases - http://wandb.com

11 Mar 2020, 34 min
