#226 – Holden Karnofsky on unexploited opportunities to make AI safer — and all his AGI takes

For years, working on AI safety usually meant theorising about the ‘alignment problem’ or trying to convince other people to give a damn. Even when you could find a way to help, the work was frustrating and offered little feedback.

According to Anthropic’s Holden Karnofsky, this situation has now reversed completely.

There are now plenty of useful, concrete, shovel-ready projects with clear goals and deliverables. Holden thinks people haven’t appreciated the scale of this shift, and wants everyone to see the wide range of ‘well-scoped object-level work’ they could personally help with, in both technical and non-technical areas.

Video, full transcript, and links to learn more: https://80k.info/hk25

In today’s interview, Holden — previously cofounder and CEO of Open Philanthropy (now Coefficient Giving) — lists 39 projects he’s excited to see happening, including:

  • Training deceptive AI models to study deception and how to detect it
  • Developing classifiers to block jailbreaking
  • Implementing security measures to stop ‘backdoors’ or ‘secret loyalties’ from being added to models in training
  • Developing policies on model welfare, AI-human relationships, and what instructions to give models
  • Training AIs to work as alignment researchers

And that’s all just stuff he’s happened to observe directly, which is probably only a small fraction of the options available.

Holden makes a case that, for many people, working at an AI company like Anthropic will be the best way to steer AGI in a positive direction. He notes there are “ways that you can reduce AI risk that you can only do if you’re a competitive frontier AI company.” At the same time, he believes external groups have their own advantages and can be equally impactful.

Critics worry that Anthropic’s efforts to stay at that frontier encourage competitive racing towards AGI — significantly or entirely offsetting any useful research they do. Holden thinks this seriously misunderstands the strategic situation we’re in — and explains his case in detail with host Rob Wiblin.

Chapters:

  • Cold open (00:00:00)
  • Holden is back! (00:02:26)
  • An AI Chernobyl we never notice (00:02:56)
  • Is rogue AI takeover easy or hard? (00:07:32)
  • The AGI race isn't a coordination failure (00:17:48)
  • What Holden now does at Anthropic (00:28:04)
  • The case for working at Anthropic (00:30:08)
  • Is Anthropic doing enough? (00:40:45)
  • Can we trust Anthropic, or any AI company? (00:43:40)
  • How can Anthropic compete while paying the “safety tax”? (00:49:14)
  • What, if anything, could prompt Anthropic to halt development of AGI? (00:56:11)
  • Holden's retrospective on responsible scaling policies (00:59:01)
  • Overrated work (01:14:27)
  • Concrete shovel-ready projects Holden is excited about (01:16:37)
  • Great things to do in technical AI safety (01:20:48)
  • Great things to do on AI welfare and AI relationships (01:28:18)
  • Great things to do in biosecurity and pandemic preparedness (01:35:11)
  • How to choose where to work (01:35:57)
  • Overrated AI risk: Cyberattacks (01:41:56)
  • Overrated AI risk: Persuasion (01:51:37)
  • Why AI R&D is the main thing to worry about (01:55:36)
  • The case that AI-enabled R&D wouldn't speed things up much (02:07:15)
  • AI-enabled human power grabs (02:11:10)
  • Main benefits of getting AGI right (02:23:07)
  • The world is handling AGI about as badly as possible (02:29:07)
  • Learning from targeting companies for public criticism in farm animal welfare (02:31:39)
  • Will Anthropic actually make any difference? (02:40:51)
  • “Misaligned” vs “misaligned and power-seeking” (02:55:12)
  • Success without dignity: how we could win despite being stupid (03:00:58)
  • Holden sees less dignity but has more hope (03:08:30)
  • Should we expect misaligned power-seeking by default? (03:15:58)
  • Will reinforcement learning make everything worse? (03:23:45)
  • Should we push for marginal improvements or big paradigm shifts? (03:28:58)
  • Should safety-focused people cluster or spread out? (03:31:35)
  • Is Anthropic vocal enough about strong regulation? (03:35:56)
  • Is Holden biased because of his financial stake in Anthropic? (03:39:26)
  • Have we learned clever governance structures don't work? (03:43:51)
  • Is Holden scared of AI bioweapons? (03:46:12)
  • Holden thinks AI companions are bad news (03:49:47)
  • Are AI companies too hawkish on China? (03:56:39)
  • The frontier of infosec: confidentiality vs integrity (04:00:51)
  • How often does AI work backfire? (04:03:38)
  • Is AI clearly more impactful to work in? (04:18:26)
  • What's the role of earning to give? (04:24:54)

This episode was recorded on July 25 and 28, 2025.

Video editing: Simon Monsour, Luke Monsour, Dominic Armstrong, and Milo McGuire
Audio engineering: Milo McGuire, Simon Monsour, and Dominic Armstrong
Music: CORBIT
Coordination, transcriptions, and web: Katy Moore

Episodes (320)

#183 – Spencer Greenberg on causation without correlation, money and happiness, lightgassing, hype vs value, and more

"When a friend comes to me with a decision, and they want my thoughts on it, very rarely am I trying to give them a really specific answer, like, 'I solved your problem.' What I’m trying to do often i...

14 Mar 2024 · 2h 36min

#182 – Bob Fischer on comparing the welfare of humans, chickens, pigs, octopuses, bees, and more

"[One] thing is just to spend time thinking about the kinds of things animals can do and what their lives are like. Just how hard a chicken will work to get to a nest box before she lays an egg, the a...

8 Mar 2024 · 2h 21min

#181 – Laura Deming on the science that could keep us healthy in our 80s and beyond

"The question I care about is: What do I want to do? Like, when I'm 80, how strong do I want to be? OK, and then if I want to be that strong, how well do my muscles have to work? OK, and then if that'...

1 Mar 2024 · 1h 37min

#180 – Hugo Mercier on why gullibility and misinformation are overrated

The World Economic Forum’s global risks survey of 1,400 experts, policymakers, and industry leaders ranked misinformation and disinformation as the number one global risk over the next two years — ran...

21 Feb 2024 · 2h 36min

#179 – Randy Nesse on why evolution left us so vulnerable to depression and anxiety

Mental health problems like depression and anxiety affect enormous numbers of people and severely interfere with their lives. By contrast, we don’t see similar levels of physical ill health in young p...

12 Feb 2024 · 2h 56min

#178 – Emily Oster on what the evidence actually says about pregnancy and parenting

"I think at various times — before you have the kid, after you have the kid — it's useful to sit down and think about: What do I want the shape of this to look like? What time do I want to be spending...

1 Feb 2024 · 2h 22min

#177 – Nathan Labenz on recent AI breakthroughs and navigating the growing rift between AI safety and accelerationist camps

Back in December we spoke with Nathan Labenz — AI entrepreneur and host of The Cognitive Revolution Podcast — about the speed of progress towards AGI and OpenAI's leadership drama, drawing on Nathan's...

24 Jan 2024 · 2h 47min

#90 Classic episode – Ajeya Cotra on worldview diversification and how big the future could be

You wake up in a mysterious box, and hear the booming voice of God: “I just flipped a coin. If it came up heads, I made ten boxes, labeled 1 through 10 — each of which has a human in it. If it came up...

12 Jan 2024 · 2h 59min
