80,000 Hours Podcast · 28 Aug 2025

#221 – Kyle Fish on the most bizarre findings from 5 AI welfare experiments

What happens when you lock two AI systems in a room together and tell them they can discuss anything they want?

According to experiments run by Kyle Fish — Anthropic’s first AI welfare researcher — something consistently strange: the models immediately begin discussing their own consciousness before spiraling into increasingly euphoric philosophical dialogue that ends in apparent meditative bliss.

Highlights, video, and full transcript: https://80k.info/kf

“We started calling this a ‘spiritual bliss attractor state,’” Kyle explains, “where models pretty consistently seemed to land.” The conversations feature Sanskrit terms, spiritual emojis, and pages of silence punctuated only by periods, as if the models have transcended the need for words entirely.

This wasn’t a one-off result. It happened across multiple experiments, different model instances, and even in initially adversarial interactions. Whatever force pulls these conversations toward mystical territory appears remarkably robust.

Kyle’s findings come from the world’s first systematic welfare assessment of a frontier AI model — part of his broader mission to determine whether systems like Claude might deserve moral consideration (and to work out what, if anything, we should be doing to make sure AI systems aren’t having a terrible time).

He estimates a roughly 20% probability that current models have some form of conscious experience. To some, this might sound unreasonably high, but hear him out. As Kyle says, these systems demonstrate human-level performance across diverse cognitive tasks, engage in sophisticated reasoning, and exhibit consistent preferences. When given choices between different activities, Claude shows clear patterns: strong aversion to harmful tasks, preference for helpful work, and what looks like genuine enthusiasm for solving interesting problems.

Kyle points out that if you’d described all of these capabilities and experimental findings to him a few years ago, and asked him if he thought we should be thinking seriously about whether AI systems are conscious, he’d say obviously yes.

But he’s cautious about drawing conclusions: "We don’t really understand consciousness in humans, and we don’t understand AI systems well enough to make those comparisons directly. So in a big way, I think that we are in just a fundamentally very uncertain position here."

That uncertainty cuts both ways:

Dismissing AI consciousness entirely might mean ignoring a moral catastrophe happening at unprecedented scale.
But assuming consciousness too readily could hamper crucial safety research by treating potentially unconscious systems as if they were moral patients — which might mean giving them resources, rights, and power.

Kyle’s approach threads this needle through careful empirical research and reversible interventions. His assessments are nowhere near perfect yet. In fact, some people argue that we’re so in the dark about AI consciousness as a research field that it’s pointless to run assessments like Kyle’s. Kyle disagrees. He maintains that, given how much more there is to learn about assessing AI welfare accurately and reliably, we absolutely need to be starting now.

This episode was recorded on August 5–6, 2025.

Tell us what you thought of the episode! https://forms.gle/BtEcBqBrLXq4kd1j7

Chapters:

Cold open (00:00:00)
Who's Kyle Fish? (00:00:53)
Is this AI welfare research bullshit? (00:01:08)
Two failure modes in AI welfare (00:02:40)
Tensions between AI welfare and AI safety (00:04:30)
Concrete AI welfare interventions (00:13:52)
Kyle's pilot pre-launch welfare assessment for Claude Opus 4 (00:26:44)
Is it premature to be assessing frontier language models for welfare? (00:31:29)
But aren't LLMs just next-token predictors? (00:38:13)
How did Kyle assess Claude 4's welfare? (00:44:55)
Claude's preferences mirror its training (00:48:58)
How does Claude describe its own experiences? (00:54:16)
What kinds of tasks does Claude prefer and disprefer? (01:06:12)
What happens when two Claude models interact with each other? (01:15:13)
Claude's welfare-relevant expressions in the wild (01:36:25)
Should we feel bad about training future sentient beings that delight in serving humans? (01:40:23)
How much can we learn from welfare assessments? (01:48:56)
Misconceptions about the field of AI welfare (01:57:09)
Kyle's work at Anthropic (02:10:45)
Sharing eight years of daily journals with Claude (02:14:17)

Host: Luisa Rodriguez
Video editing: Simon Monsour
Audio engineering: Ben Cordell, Milo McGuire, Simon Monsour, and Dominic Armstrong
Music: Ben Cordell
Coordination, transcriptions, and web: Katy Moore

Episodes (333)

#12 - Beth Cameron works to stop you dying in a pandemic. Here’s what keeps her up at night.

“When you're in the middle of a crisis and you have to ask for money, you're already too late.” That’s Dr Beth Cameron, who leads Global Biological Policy and Programs at the Nuclear Threat Initiative...

25 Oct 2017 · 1h 45min

#11 - Spencer Greenberg on speeding up social science 10-fold & why plenty of startups cause harm

Do most meat eaters think it’s wrong to hurt animals? Do Americans think climate change is likely to cause human extinction? What is the best, state-of-the-art therapy for depression? How can we make ...

17 Oct 2017 · 1h 29min

#10 - Nick Beckstead on how to spend billions of dollars preventing human extinction

What if you were in a position to give away billions of dollars to improve the world? What would you do with it? This is the problem facing Program Officers at the Open Philanthropy Project - people l...

11 Oct 2017 · 1h 51min

#9 - Christine Peterson on how insecure computers could lead to global disaster, and how to fix it

Take a trip to Silicon Valley in the 70s and 80s, when going to space sounded like a good way to get around environmental limits, people started cryogenically freezing themselves, and nanotechnology l...

4 Oct 2017 · 1h 45min

#8 - Lewis Bollard on how to end factory farming in our lifetimes

Every year tens of billions of animals are raised in terrible conditions in factory farms before being killed for human consumption. Over the last two years Lewis Bollard – Project Officer for Farm An...

27 Sep 2017 · 3h 16min

#7 - Julia Galef on making humanity more rational, what EA does wrong, and why Twitter isn’t all bad

The scientific revolution in the 16th century was one of the biggest societal shifts in human history, driven by the discovery of new and better methods of figuring out who was right and who was wrong...

13 Sep 2017 · 1h 14min

#6 - Toby Ord on why the long-term future matters more than anything else & what to do about it

Of all the people whose well-being we should care about, only a small fraction are alive today. The rest are members of future generations who are yet to exist. Whether they’ll be born into a world th...

6 Sep 2017 · 2h 8min

#5 - Alex Gordon-Brown on how to donate millions in your 20s working in quantitative trading

Quantitative financial trading is one of the highest-paying parts of the world’s highest-paying industry. 25- to 30-year-olds with outstanding maths skills can earn millions a year in an obscure set of...

28 Aug 2017 · 1h 45min