#221 – Kyle Fish on the most bizarre findings from 5 AI welfare experiments

What happens when you lock two AI systems in a room together and tell them they can discuss anything they want?

According to experiments run by Kyle Fish — Anthropic’s first AI welfare researcher — something consistently strange: the models immediately begin discussing their own consciousness before spiraling into increasingly euphoric philosophical dialogue that ends in apparent meditative bliss.

Highlights, video, and full transcript: https://80k.info/kf

“We started calling this a ‘spiritual bliss attractor state,’” Kyle explains, “where models pretty consistently seemed to land.” The conversations feature Sanskrit terms, spiritual emojis, and pages of silence punctuated only by periods, as if the models have transcended the need for words entirely.

This wasn’t a one-off result. It happened across multiple experiments, different model instances, and even in initially adversarial interactions. Whatever force pulls these conversations toward mystical territory appears remarkably robust.

Kyle’s findings come from the world’s first systematic welfare assessment of a frontier AI model — part of his broader mission to determine whether systems like Claude might deserve moral consideration (and to work out what, if anything, we should be doing to make sure AI systems aren’t having a terrible time).

He estimates a roughly 20% probability that current models have some form of conscious experience. To some, this might sound unreasonably high, but hear him out. As Kyle says, these systems demonstrate human-level performance across diverse cognitive tasks, engage in sophisticated reasoning, and exhibit consistent preferences. When given choices between different activities, Claude shows clear patterns: strong aversion to harmful tasks, preference for helpful work, and what looks like genuine enthusiasm for solving interesting problems.

Kyle points out that if you’d described all of these capabilities and experimental findings to him a few years ago, and asked him whether we should be thinking seriously about whether AI systems are conscious, he’d have said obviously yes.

But he’s cautious about drawing conclusions: "We don’t really understand consciousness in humans, and we don’t understand AI systems well enough to make those comparisons directly. So in a big way, I think that we are in just a fundamentally very uncertain position here."

That uncertainty cuts both ways:

  • Dismissing AI consciousness entirely might mean ignoring a moral catastrophe happening at unprecedented scale.
  • But assuming consciousness too readily could hamper crucial safety research by treating potentially unconscious systems as if they were moral patients — which might mean giving them resources, rights, and power.

Kyle’s approach threads this needle through careful empirical research and reversible interventions. His assessments are nowhere near perfect yet. In fact, some people argue that the research field is so in the dark about AI consciousness that it’s pointless to run assessments like Kyle’s. Kyle disagrees. He maintains that, given how much more there is to learn about assessing AI welfare accurately and reliably, we absolutely need to be starting now.

This episode was recorded on August 5–6, 2025.

Tell us what you thought of the episode! https://forms.gle/BtEcBqBrLXq4kd1j7

Chapters:

  • Cold open (00:00:00)
  • Who's Kyle Fish? (00:00:53)
  • Is this AI welfare research bullshit? (00:01:08)
  • Two failure modes in AI welfare (00:02:40)
  • Tensions between AI welfare and AI safety (00:04:30)
  • Concrete AI welfare interventions (00:13:52)
  • Kyle's pilot pre-launch welfare assessment for Claude Opus 4 (00:26:44)
  • Is it premature to be assessing frontier language models for welfare? (00:31:29)
  • But aren't LLMs just next-token predictors? (00:38:13)
  • How did Kyle assess Claude 4's welfare? (00:44:55)
  • Claude's preferences mirror its training (00:48:58)
  • How does Claude describe its own experiences? (00:54:16)
  • What kinds of tasks does Claude prefer and disprefer? (01:06:12)
  • What happens when two Claude models interact with each other? (01:15:13)
  • Claude's welfare-relevant expressions in the wild (01:36:25)
  • Should we feel bad about training future sentient beings that delight in serving humans? (01:40:23)
  • How much can we learn from welfare assessments? (01:48:56)
  • Misconceptions about the field of AI welfare (01:57:09)
  • Kyle's work at Anthropic (02:10:45)
  • Sharing eight years of daily journals with Claude (02:14:17)

Host: Luisa Rodriguez
Video editing: Simon Monsour
Audio engineering: Ben Cordell, Milo McGuire, Simon Monsour, and Dominic Armstrong
Music: Ben Cordell
Coordination, transcriptions, and web: Katy Moore
