#226 – Holden Karnofsky on unexploited opportunities to make AI safer — and all his AGI takes

For years, working on AI safety usually meant theorising about the ‘alignment problem’ or trying to convince other people to give a damn. If you could find any way to help, the work was frustrating and low feedback.

According to Anthropic’s Holden Karnofsky, this situation has now reversed completely.

There are now many useful, concrete, shovel-ready projects with clear goals and deliverables. Holden thinks people haven’t appreciated the scale of the shift, and wants everyone to see the wide range of ‘well-scoped object-level work’ they could personally help with, in both technical and non-technical areas.

Video, full transcript, and links to learn more: https://80k.info/hk25

In today’s interview, Holden — previously cofounder and CEO of Open Philanthropy (now Coefficient Giving) — lists 39 projects he’s excited to see happening, including:

  • Training deceptive AI models to study deception and how to detect it
  • Developing classifiers to block jailbreaking
  • Implementing security measures to stop ‘backdoors’ or ‘secret loyalties’ from being added to models in training
  • Developing policies on model welfare, AI-human relationships, and what instructions to give models
  • Training AIs to work as alignment researchers

And that’s all just stuff he’s happened to observe directly, which is probably only a small fraction of the options available.

Holden makes a case that, for many people, working at an AI company like Anthropic will be the best way to steer AGI in a positive direction. He notes there are “ways that you can reduce AI risk that you can only do if you’re a competitive frontier AI company.” At the same time, he believes external groups have their own advantages and can be equally impactful.

Critics worry that Anthropic’s efforts to stay at that frontier encourage competitive racing towards AGI — significantly or entirely offsetting any useful research they do. Holden thinks this seriously misunderstands the strategic situation we’re in — and explains his case in detail with host Rob Wiblin.

Chapters:

  • Cold open (00:00:00)
  • Holden is back! (00:02:26)
  • An AI Chernobyl we never notice (00:02:56)
  • Is rogue AI takeover easy or hard? (00:07:32)
  • The AGI race isn't a coordination failure (00:17:48)
  • What Holden now does at Anthropic (00:28:04)
  • The case for working at Anthropic (00:30:08)
  • Is Anthropic doing enough? (00:40:45)
  • Can we trust Anthropic, or any AI company? (00:43:40)
  • How can Anthropic compete while paying the “safety tax”? (00:49:14)
  • What, if anything, could prompt Anthropic to halt development of AGI? (00:56:11)
  • Holden's retrospective on responsible scaling policies (00:59:01)
  • Overrated work (01:14:27)
  • Concrete shovel-ready projects Holden is excited about (01:16:37)
  • Great things to do in technical AI safety (01:20:48)
  • Great things to do on AI welfare and AI relationships (01:28:18)
  • Great things to do in biosecurity and pandemic preparedness (01:35:11)
  • How to choose where to work (01:35:57)
  • Overrated AI risk: Cyberattacks (01:41:56)
  • Overrated AI risk: Persuasion (01:51:37)
  • Why AI R&D is the main thing to worry about (01:55:36)
  • The case that AI-enabled R&D wouldn't speed things up much (02:07:15)
  • AI-enabled human power grabs (02:11:10)
  • Main benefits of getting AGI right (02:23:07)
  • The world is handling AGI about as badly as possible (02:29:07)
  • Learning from targeting companies for public criticism in farm animal welfare (02:31:39)
  • Will Anthropic actually make any difference? (02:40:51)
  • “Misaligned” vs “misaligned and power-seeking” (02:55:12)
  • Success without dignity: how we could win despite being stupid (03:00:58)
  • Holden sees less dignity but has more hope (03:08:30)
  • Should we expect misaligned power-seeking by default? (03:15:58)
  • Will reinforcement learning make everything worse? (03:23:45)
  • Should we push for marginal improvements or big paradigm shifts? (03:28:58)
  • Should safety-focused people cluster or spread out? (03:31:35)
  • Is Anthropic vocal enough about strong regulation? (03:35:56)
  • Is Holden biased because of his financial stake in Anthropic? (03:39:26)
  • Have we learned clever governance structures don't work? (03:43:51)
  • Is Holden scared of AI bioweapons? (03:46:12)
  • Holden thinks AI companions are bad news (03:49:47)
  • Are AI companies too hawkish on China? (03:56:39)
  • The frontier of infosec: confidentiality vs integrity (04:00:51)
  • How often does AI work backfire? (04:03:38)
  • Is AI clearly more impactful to work in? (04:18:26)
  • What's the role of earning to give? (04:24:54)

This episode was recorded on July 25 and 28, 2025.

Video editing: Simon Monsour, Luke Monsour, Dominic Armstrong, and Milo McGuire
Audio engineering: Milo McGuire, Simon Monsour, and Dominic Armstrong
Music: CORBIT
Coordination, transcriptions, and web: Katy Moore
