Every AI Company's Safety Plan is 'Use AI to Make AI Safe'. Is That Crazy? | Ajeya Cotra

Every major AI company has the same safety plan: when AI gets crazy powerful and really dangerous, they’ll use the AI itself to figure out how to make AI safe and beneficial. It sounds circular, almost satirical. But is it actually a bad plan?

Today’s guest, Ajeya Cotra, recently placed 3rd out of 413 participants in a contest forecasting AI developments, and is among the most thoughtful and respected commentators on where the technology is going.

She thinks there’s a meaningful chance we’ll see as much change in the next 23 years as humanity faced in the last 10,000, thanks to the arrival of artificial general intelligence. Ajeya doesn’t reach this conclusion lightly: she’s had a ringside seat to the growth of all the major AI companies for 10 years — first as a researcher and grantmaker for technical AI safety at Coefficient Giving (formerly known as Open Philanthropy), and now as a member of technical staff at METR.

So host Rob Wiblin asked her: is this plan to use AI to save us from AI a reasonable one?

Ajeya agrees that humanity has repeatedly used technologies that create new problems to help solve those problems. After all:

  • Cars enabled carjackings and drive-by shootings, but also faster police pursuits.
  • Microbiology enabled bioweapons, but also faster vaccine development.
  • The internet allowed lies to spread faster, but also let fact checks spread just as quickly.

But she also thinks AI will be a much harder case. In her view, the window between AI automating AI research and the arrival of uncontrollably powerful superintelligence could be quite brief — perhaps a year or less. In that narrow window, we’d need to redirect enormous amounts of AI labour away from making AI smarter and towards alignment research, biodefence, cyberdefence, adapting our political structures, and improving our collective decision-making.

The plan might fail just because the idea is flawed at conception: it does sound a bit crazy to use an AI you don’t trust to make sure that same AI benefits humanity.

But even if we find some clever technique to overcome that, we could still fail if the companies simply don’t follow through on their promises. They say redirecting resources to alignment and security is their strategy for dealing with the risks generated by their research — but none have made quantitative commitments about what fraction of AI labour they’ll redirect during crunch time. And the competitive pressures during a recursive self-improvement loop could be irresistible.

In today’s conversation, Ajeya and Rob discuss what assumptions this plan requires, the specific problems AI could help solve during crunch time, and why — even if we pull it off — we’ll be white-knuckling it the whole way through.


Links to learn more, video, and full transcript: https://80k.info/ac26

This episode was recorded on October 20, 2025.

Chapters:

  • Cold open (00:00:00)
  • Ajeya’s strong track record for identifying key AI issues (00:00:43)
  • The 1,000-fold disagreement about AI's effect on economic growth (00:02:30)
  • Could any evidence actually change people's minds? (00:22:48)
  • The most dangerous AI progress might remain secret (00:29:55)
  • White-knuckling the 12-month window after automated AI R&D (00:46:16)
  • AI help is most valuable right before things go crazy (01:10:36)
  • Foundations should go from paying researchers to paying for inference (01:23:08)
  • Will frontier AI even be for sale during the explosion? (01:30:21)
  • Pre-crunch prep: what we should do right now (01:42:10)
  • A grantmaking trial by fire at Coefficient Giving (01:45:12)
  • Sabbatical and reflections on effective altruism (02:05:32)
  • The mundane factors that drive career satisfaction (02:34:33)
  • EA as an incubator for avant-garde causes others won't touch (02:44:07)

Video and audio editing: Dominic Armstrong, Milo McGuire, Luke Monsour, and Simon Monsour
Music: CORBIT
Coordination, transcriptions, and web: Katy Moore

