#226 – Holden Karnofsky on unexploited opportunities to make AI safer — and all his AGI takes

For years, working on AI safety usually meant theorising about the ‘alignment problem’ or trying to convince other people to give a damn. Even if you could find a way to help, the work was frustrating and low-feedback.

According to Anthropic’s Holden Karnofsky, this situation has now reversed completely.

There is now a large number of useful, concrete, shovel-ready projects with clear goals and deliverables. Holden thinks people haven’t appreciated the scale of this shift, and wants everyone to see the wide range of ‘well-scoped object-level work’ they could personally help with, in both technical and non-technical areas.

Video, full transcript, and links to learn more: https://80k.info/hk25

In today’s interview, Holden — previously cofounder and CEO of Open Philanthropy (now Coefficient Giving) — lists 39 projects he’s excited to see happening, including:

  • Training deceptive AI models to study deception and how to detect it
  • Developing classifiers to block jailbreaking
  • Implementing security measures to stop ‘backdoors’ or ‘secret loyalties’ from being added to models in training
  • Developing policies on model welfare, AI-human relationships, and what instructions to give models
  • Training AIs to work as alignment researchers

And that’s all just stuff he’s happened to observe directly, which is probably only a small fraction of the options available.

Holden makes a case that, for many people, working at an AI company like Anthropic will be the best way to steer AGI in a positive direction. He notes there are “ways that you can reduce AI risk that you can only do if you’re a competitive frontier AI company.” At the same time, he believes external groups have their own advantages and can be equally impactful.

Critics worry that Anthropic’s efforts to stay at that frontier encourage competitive racing towards AGI — significantly or entirely offsetting any useful research they do. Holden thinks this seriously misunderstands the strategic situation we’re in — and explains his case in detail with host Rob Wiblin.

Chapters:

  • Cold open (00:00:00)
  • Holden is back! (00:02:26)
  • An AI Chernobyl we never notice (00:02:56)
  • Is rogue AI takeover easy or hard? (00:07:32)
  • The AGI race isn't a coordination failure (00:17:48)
  • What Holden now does at Anthropic (00:28:04)
  • The case for working at Anthropic (00:30:08)
  • Is Anthropic doing enough? (00:40:45)
  • Can we trust Anthropic, or any AI company? (00:43:40)
  • How can Anthropic compete while paying the “safety tax”? (00:49:14)
  • What, if anything, could prompt Anthropic to halt development of AGI? (00:56:11)
  • Holden's retrospective on responsible scaling policies (00:59:01)
  • Overrated work (01:14:27)
  • Concrete shovel-ready projects Holden is excited about (01:16:37)
  • Great things to do in technical AI safety (01:20:48)
  • Great things to do on AI welfare and AI relationships (01:28:18)
  • Great things to do in biosecurity and pandemic preparedness (01:35:11)
  • How to choose where to work (01:35:57)
  • Overrated AI risk: Cyberattacks (01:41:56)
  • Overrated AI risk: Persuasion (01:51:37)
  • Why AI R&D is the main thing to worry about (01:55:36)
  • The case that AI-enabled R&D wouldn't speed things up much (02:07:15)
  • AI-enabled human power grabs (02:11:10)
  • Main benefits of getting AGI right (02:23:07)
  • The world is handling AGI about as badly as possible (02:29:07)
  • Learning from targeting companies for public criticism in farm animal welfare (02:31:39)
  • Will Anthropic actually make any difference? (02:40:51)
  • “Misaligned” vs “misaligned and power-seeking” (02:55:12)
  • Success without dignity: how we could win despite being stupid (03:00:58)
  • Holden sees less dignity but has more hope (03:08:30)
  • Should we expect misaligned power-seeking by default? (03:15:58)
  • Will reinforcement learning make everything worse? (03:23:45)
  • Should we push for marginal improvements or big paradigm shifts? (03:28:58)
  • Should safety-focused people cluster or spread out? (03:31:35)
  • Is Anthropic vocal enough about strong regulation? (03:35:56)
  • Is Holden biased because of his financial stake in Anthropic? (03:39:26)
  • Have we learned clever governance structures don't work? (03:43:51)
  • Is Holden scared of AI bioweapons? (03:46:12)
  • Holden thinks AI companions are bad news (03:49:47)
  • Are AI companies too hawkish on China? (03:56:39)
  • The frontier of infosec: confidentiality vs integrity (04:00:51)
  • How often does AI work backfire? (04:03:38)
  • Is AI clearly more impactful to work in? (04:18:26)
  • What's the role of earning to give? (04:24:54)

This episode was recorded on July 25 and 28, 2025.

Video editing: Simon Monsour, Luke Monsour, Dominic Armstrong, and Milo McGuire
Audio engineering: Milo McGuire, Simon Monsour, and Dominic Armstrong
Music: CORBIT
Coordination, transcriptions, and web: Katy Moore

Episodes (326)

#29 - Anders Sandberg on 3 new resolutions for the Fermi paradox & how to colonise the universe

Part 2 out now: #33 - Dr Anders Sandberg on what if we ended ageing, solar flares & the annual risk of nuclear war The universe is so vast, yet we don’t see any alien civilizations. If they exist, whe...

8 May 2018 • 1h 21min

#28 - Owen Cotton-Barratt on why scientists should need insurance, PhD strategy & fast AI progress

A researcher is working on creating a new virus – one more dangerous than any that exist naturally. They believe they’re being as careful as possible. After all, if things go wrong, their own life and...

27 Apr 2018 • 1h 3min

#27 - Dr Tom Inglesby on careers and policies that reduce global catastrophic biological risks

How about this for a movie idea: a main character has to prevent a new contagious strain of Ebola spreading around the world. She’s the best of the best. So good in fact, that her work on early detect...

18 Apr 2018 • 2h 16min

#26 - Marie Gibbons on how exactly clean meat is made & what's needed to get it in every supermarket

First, decide on the type of animal. Next, pick the cell type. Then take a small, painless biopsy, and put the cells in a solution that makes them feel like they’re still in the body. Once the cells a...

10 Apr 2018 • 1h 44min

#25 - Robin Hanson on why we have to lie to ourselves about why we do what we do

On February 2, 1685, England’s King Charles II was struck by a sudden illness. Fortunately his physicians were the best of the best. To reassure the public they kept them abreast of the King’s treatme...

28 March 2018 • 2h 39min

#24 - Stefan Schubert on why it’s a bad idea to break the rules, even if it’s for a good cause

How honest should we be? How helpful? How friendly? If our society claims to value honesty, for instance, but in reality accepts an awful lot of lying – should we go along with those lax standards? Or...

20 March 2018 • 55min

#23 - How to actually become an AI alignment researcher, according to Dr Jan Leike

Want to help steer the 21st century’s most transformative technology? First complete an undergrad degree in computer science and mathematics. Prioritize harder courses over easier ones. Publish at lea...

16 March 2018 • 45min

#22 - Leah Utyasheva on the non-profit that figured out how to massively cut suicide rates

How people kill themselves varies enormously depending on which means are most easily available. In the United States, suicide by firearm stands out. In Hong Kong, where most people live in high rise ...

7 March 2018 • 1h 8min
