We Taught AI to Play Games—Now It’s a $3.6 Million Company
AI and I, 16 October 2025

This episode is a little different from our usual fare: It’s a conversation with our head of AI training Alex Duffy about Good Start Labs, a company he incubated inside Every. Today, Good Start Labs is spinning out of Every as a separate company with $3.6 million in funding from General Catalyst, Inovia, Every, and a group of angel investors from top-tier AI labs like DeepMind. We get into how Alex learned some of his biggest lessons about the real world from games, starting with RuneScape, which taught him how markets work and how not to get scammed. He explains why the static benchmarks we use to evaluate LLMs today are breaking down, and how games like Diplomacy offer a richer, more dynamic way to test and train large language models. Finally, Alex shares where he sees the most promise in AI—software, life sciences, and education—and why he believes games can make the models we use smarter, while helping people understand and use AI more effectively.

If you found this episode interesting, please like, subscribe, comment, and share.


Want even more?

Sign up for Every to unlock our ultimate guide to prompting ChatGPT here: https://every.ck.page/ultimate-guide-to-prompting-chatgpt. It’s usually only for paying subscribers, but you can get it here for free.


Timestamps

00:00:00 - Start

00:01:48 - Introduction

00:04:14 - Why evals and benchmarks are broken

00:07:13 - The sneakiest LLMs in the market

00:13:00 - A competition that turns prompting into a sport

00:15:49 - Building a business around using games to make AI better

00:22:39 - Can language models learn how to be funny

00:25:31 - Why games are a great way to evaluate and train new models

00:26:58 - What child psychology tells us about games and AI

00:30:10 - Using games to unlock continual learning in AI

00:36:42 - Why Alex cares deeply about games

00:44:37 - Where Alex sees the most promise in AI

00:50:54 - Rethinking how young people start their careers in the age of AI


Episodes (105)

How to Build an Agent-native Product | Mike Krieger

Mike Krieger built one of the most consequential consumer apps of the last two decades as cofounder of Instagram. He is now at the frontier of determining what makes a breakout AI-native product as co...

25 March, 48 min

Kate Lee on Taste, Hiring, and Running Editorial at Every

Kate Lee has spent her career working with words—first as a literary agent, then in roles at Medium, WeWork, and Stripe. As Every’s editor in chief, she’s been the quiet force behind the newsletter fo...

18 March, 56 min

We Made a Document Editor Where Humans and AI Work Side by Side

Every has unveiled a new product, built by CEO Dan Shipper. It's called Proof, a free, open-source, live collaborative document editor built for humans and AI agents to work in together. Proof started...

11 March, 44 min

Meet the Slowest Startup Incubator in the World—Pumping Out Billion-dollar Companies

Silicon Valley loves billion-dollar moonshots and AI darlings. Sam Gerstenzang and Dan Friedman are doing something different: they're starting medical spas and funeral homes. On this episode of AI & I,...

4 March, 45 min

Meet the Student With No Teachers, No Homework—Just AI

Depending on whom you ask, AI is either the best or worst thing that can happen to the next generation. The arguments come from educators, venture capitalists, op-ed writers, and anxious parents—but r...

25 February, 53 min

OpenAI's Codex: This Model Is So Fast It Changes How You Code

OpenAI’s hottest app isn’t ChatGPT; it’s Codex. In the last few weeks alone, the Codex team shipped a desktop app, GPT-5.3 Codex (a new flagship model), and Spark, the fastest coding model I’ve ever use...

18 February, 46 min

Inside OpenAI’s Agentic Browser, Atlas

The AI labs fighting for attention during the Super Bowl call to mind another iconic Super Bowl moment: Apple’s 1984 ad for the Macintosh, which promised that the personal computer would be a source o...

11 February, 55 min

How We Built 'Claudie,' Our AI Project Manager (Full Walkthrough)

A few weeks ago, Natalia Quintero wouldn’t have called herself technical. But since the beginning of January, she has woken up at 6 a.m. to vibe code with Claude. The AI project manager she built save...

4 February, 47 min