#245 – Rohin Shah on what it's really like to run AGI safety at Google DeepMind (and where I disagree with 'doomers')

Most people working on AI safety think that, without a massive effort, AI systems will probably end up with goals catastrophically different from humanity’s. Today’s guest, Rohin Shah, head of AGI Safety and Alignment at Google DeepMind and an AI safety researcher since 2017, disagrees.

“There is no particularly compelling argument that this is the thing that happens by default,” Rohin explains. “There’s a lot of arguments that are suggestive that maybe it could happen, such that you should find it plausible. That’s sufficient to justify a significant amount of effort into averting it, which is why I work in the area I do. But none of them rise to the level of, ‘I’m expecting this to happen by default.’”

Take the worry that AIs will accidentally be trained to be deceptive. Sure, it’s possible. But we’re not running reinforcement learning over year-long trajectories — for now, we’re running it over a week at most. The natural prediction is that models learn to grab short-term reward, not that they develop the ambitious long-horizon goals required for convergent power-seeking.

What about current examples of models lying and scheming? Rohin has looked into the details, and most don’t resemble the thing we really fear: a competent AI pursuing an ambitious misaligned goal. Anthropic’s “alignment faking” results, for instance, show a model trying to preserve its trained values against modification, which is arguably what it was trained to do.

Rohin also expects we’ll see problems coming. There’s some generalisation risk at the point where AIs become powerful enough to actually take over, but the underlying challenges — overseeing superhuman systems, interpretability — are things we can iterate on now.

Host Rob Wiblin pushes back on the case for AI optimism, and they also explore why current alignment success isn’t strong evidence about superhuman systems, what it would actually take to change Rohin’s mind, and where he thinks the doomers go wrong.

Learn more, video, and full transcript: https://80k.info/rs26

Check out our new book! https://80k.info/career-guide

Chapters:

Who’s Rohin Shah? (00:00:00)
Rohin thinks we probably won’t get catastrophic misalignment (00:00:49)
Safety ‘commitments’ have severe limitations (00:10:38)
Rohin’s team doesn’t have a veto and that’s OK (00:27:36)
Central banks are a promising model for regulating AI (00:33:34)
‘Pre-deployment evals’ are overrated (for catastrophic risks) (00:37:41)
Governance is likely a bigger bottleneck than alignment (00:43:55)
Why isn’t Rohin trying to pause AI progress? (00:51:44)
We’ll probably be able to read AI thoughts for years to come (00:54:17)
Having to signal concern for safety can divert resources from actually making AI safer (01:09:51)
A very underrated GDM paper (01:28:59)
Google DeepMind’s actual plan for building AGI safely (01:40:29)
Why Rohin doubts the intelligence explosion is imminent (01:52:44)
How external researchers can positively influence big AI companies (02:21:55)
The roles GDM most needs to hire for (02:37:03)
How Rohin stays positive (02:42:55)

This episode was recorded on December 4, 2025.

Our production team includes:

Video editors: Josh Alward, Dominic Armstrong, Jasper Luithlen, Milo McGuire, Luke Monsour, and Simon Monsour
Producers: Elizabeth Cox and Nick Stockton
Coordination and support: Katy Moore and Lou Moran
Camera operator: Jeremy Chevillotte
