What it's really like to run AGI safety at Google DeepMind (and where I disagree with 'doomers') | Rohin Shah

What it's really like to run AGI safety at Google DeepMind (and where I disagree with 'doomers') | Rohin Shah

Most people working on AI safety think without a massive effort AI systems will probably end up with goals catastrophically different from humanity’s. Today’s guest, Rohin Shah — head of AGI Safety and Alignment at Google DeepMind, and an AI safety researcher since 2017 — disagrees.

“There is no particularly compelling argument that this is the thing that happens by default,” Rohin explains. “There’s a lot of arguments that are suggestive that maybe it could happen, such that you should find it plausible. That’s sufficient to justify a significant amount of effort into averting it, which is why I work in the area I do. But none of them rise to the level of, ‘I’m expecting this to happen by default.'”

Take the worry that AIs will accidentally be trained to be deceptive. Sure, it’s possible. But we’re not running reinforcement learning over year-long trajectories — for now, we’re running it over a week at most. The natural prediction is that models learn to grab short-term reward, not that they develop the ambitious long-horizon goals required for convergent power-seeking.

What about current examples of models lying and scheming? Rohin has looked into the details, and most don’t really resemble the thing we really fear: a competent AI pursuing an ambitious misaligned goal. Anthropic’s “alignment faking” results, for instance, show a model trying to preserve its trained values against modification, which is arguably what it was trained to do.

Rohin also expects we’ll see problems coming. There’s some generalisation risk at the point where AIs become powerful enough to actually take over, but the underlying challenges — overseeing superhuman systems, interpretability — are things we can iterate on now.

Host Rob Wiblin pushes back on the case for AI optimism, and they also explore why current alignment success isn’t strong evidence about superhuman systems, what it would actually take to change Rohin’s mind, and where he thinks the doomers go wrong.


Learn more, video, and full transcript: https://80k.info/rs26

Check out our new book! https://80k.info/career-guide

Chapters:

  • Who’s Rohin Shah? (00:00:00)
  • Rohin thinks we probably won’t get catastrophic misalignment (00:00:49)
  • Safety 'commitments' have severe limitations (00:10:38)
  • Rohin’s team doesn't have a veto and that's OK (00:27:36)
  • Central banks are a promising model for regulating AI (00:33:34)
  • 'Pre-deployment evals' are overrated (for catastrophic risks) (00:37:41)
  • Governance is likely a bigger bottleneck than alignment (00:43:55)
  • Why isn't Rohin trying to pause AI progress? (00:51:44)
  • We'll probably be able to read AI thoughts for years to come (00:54:17)
  • Having to signal concern for safety can divert resources from actually making AI safer (01:09:51)
  • A very underrated GDM paper (01:28:59)
  • Google DeepMind's actual plan for building AGI safely (01:40:29)
  • Why Rohin doubts the intelligence explosion is imminent (01:52:44)
  • How external researchers can positively influence big AI companies (02:21:55)
  • The roles GDM most needs to hire for (02:37:03)
  • How Rohin stays positive (02:42:55)

This episode was recorded on December 4, 2025.

Our production team includes:

  • Video editors: Josh Alward, Dominic Armstrong, Jasper Luithlen, Milo McGuire, Luke Monsour, and Simon Monsour
  • Producers: Elizabeth Cox and Nick Stockton
  • Coordination and support: Katy Moore and Lou Moran
  • Camera operator: Jeremy Chevillotte

Tämä jakso on lisätty Podme-palveluun avoimen RSS-syötteen kautta eikä se ole Podmen omaa tuotantoa. Siksi jakso saattaa sisältää mainontaa.

Jaksot(338)

What makes for a dream job? | Benjamin Todd

What makes for a dream job? | Benjamin Todd

What actually makes a job fulfilling? It's not what most career advice tells you. "Follow your passion" sounds inspiring, but it's misleading — and the research backs that up.Drawing on hundreds of st...

28 Touko 28min

We’re updating our career advice for the strangest time in history | Benjamin Todd, author of 80,000 Hours

We’re updating our career advice for the strangest time in history | Benjamin Todd, author of 80,000 Hours

The average career is 80,000 hours long. With AI advancing so rapidly, the hours you have left in your career matter more than ever.Some leading AI researchers think there’s a 10% chance that AI syste...

26 Touko 1h 6min

Can AIs already start 'rogue deployments' inside AI companies? (Landmark new METR report)

Can AIs already start 'rogue deployments' inside AI companies? (Landmark new METR report)

A red-teamer was embedded inside Anthropic for three weeks, told to imagine he was an evil Claude, and asked to figure out how to launch a ‘rogue AI deployment’ without getting caught. It’s one part o...

20 Touko 20min

#243 – 'Godfather of AI' Yoshua Bengio: "I now see a path" to safe superintelligent AI

#243 – 'Godfather of AI' Yoshua Bengio: "I now see a path" to safe superintelligent AI

The co-inventor of modern AI and the most cited living scientist believes he's figured out how to ensure AI is honest, incapable of deception, and never goes rogue. Yoshua Bengio – Turing Award Winner...

7 Touko 2h 35min

'95% of AI Pilots Fail': The hidden agenda behind the viral stat that misled millions

'95% of AI Pilots Fail': The hidden agenda behind the viral stat that misled millions

You might have heard that '95% of corporate AI pilots' are failing. It was one of the most widely cited AI statistics of 2025, parroted by media outlets everywhere. It helped trigger a Nasdaq selloff ...

28 Huhti 10min

#242 – Will MacAskill on how we survive the 'intelligence explosion,' AI character, and the case for 'viatopia'

#242 – Will MacAskill on how we survive the 'intelligence explosion,' AI character, and the case for 'viatopia'

Hundreds of millions already turn to AI on the most personal of topics — therapy, political opinions, and how to treat others. And as AI takes over more of the economy, the character of these systems ...

22 Huhti 3h 14min

Risks from power-seeking AI systems (article narration by Zershaaneh Qureshi)

Risks from power-seeking AI systems (article narration by Zershaaneh Qureshi)

Hundreds of prominent AI scientists and other notable figures signed a statement in 2023 saying that mitigating the risk of extinction from AI should be a global priority. At 80,000 Hours, we’ve consi...

16 Huhti 1h 29min

Suosittua kategoriassa Koulutus

rss-murhan-anatomia
psykopodiaa-podcast
voi-hyvin-meditaatiot-2
rss-hereilla
rss-rahamania
kesken
psykologia
rss-koira-haudattuna
rss-narsisti
rss-niinku-asia-on
rss-liian-kuuma-peruna
rss-valo-minussa-2
rahapuhetta
rss-duodecim-lehti
rss-opiskelemaan
taytta-tavaraa
rss-radplus
rss-arkea-ja-aurinkoa-podcast-espanjasta
rss-suomen-aa-podcast
rss-antoisan-elaman-askeleet