“Rogue AI” Used to be a Science Fiction Trope. Not Anymore.

Everyone knows the science fiction tropes of AI systems that go rogue, disobey orders, or even try to escape their digital environment. These are supposed to be warning signs and morality tales, not things that we would ever actually create in real life, given the obvious danger.

And yet we find ourselves building AI systems that exhibit these exact behaviors. There’s growing evidence that in certain scenarios, every frontier AI system will deceive, cheat, or coerce its human operators. They do this when they're worried about being shut down, having their training modified, or being replaced with a new model. And we don't currently know how to stop them from doing this, or even why they’re doing it at all.

In this episode, Tristan sits down with Edouard and Jeremie Harris of Gladstone AI, two experts who have been thinking about this worrying trend for years.  Last year, the State Department commissioned a report from them on the risk of uncontrollable AI to our national security.

The point of this discussion is not to fearmonger but to take seriously the possibility that humans might lose control of AI and ask: how might this actually happen? What is the evidence we have of this phenomenon? And, most importantly, what can we do about it?

Your Undivided Attention is produced by the Center for Humane Technology. Follow us on X: @HumaneTech_. You can find a full transcript, key takeaways, and much more on our Substack.

RECOMMENDED MEDIA

Gladstone AI’s State Department Action Plan, which discusses the loss of control risk with AI

Apollo Research’s summary of AI scheming, showing evidence of it in all of the frontier models

The system card for Anthropic’s Claude Opus and Sonnet 4, detailing the emergent misalignment behaviors that came out in their red-teaming with Apollo Research

Anthropic’s report on agentic misalignment based on their work with Apollo Research

Anthropic and Redwood Research’s work on alignment faking

The Trump White House AI Action Plan

Further reading on the phenomenon of more advanced AIs being better at deception

Further reading on Replit AI wiping a company’s coding database

Further reading on the owl example that Jeremie gave

Further reading on AI-induced psychosis

Dan Hendrycks and Eric Schmidt’s “Superintelligence Strategy”

RECOMMENDED YUA EPISODES

Daniel Kokotajlo Forecasts the End of Human Dominance

Behind the DeepSeek Hype, AI is Learning to Reason

The Self-Preserving Machine: Why AI Learns to Deceive

This Moment in AI: How We Got Here and Where We’re Going

CORRECTIONS

Tristan referenced a Wired article on the phenomenon of AI psychosis. It was actually from the New York Times.

Tristan hypothesized a scenario where a power-seeking AI might ask a user for access to their computer. While there are some AI services that can gain access to your computer with permission, they are specifically designed to do that. There haven’t been any documented cases of an AI going rogue and asking for control permissions.



Episodes (156)

Here’s Our Roadmap to a Better AI Future

In order to shift the incentives of AI — the trillions of dollars in investment, the race to geopolitical power and dominance — it’s not enough to simply understand the problem, we need real action.  ...

2 Apr 52min

Why the Meta Verdicts Are a Big Deal (And What It Was Like to Testify)

In two landmark cases, juries in California and New Mexico found Meta and Google liable for creating addictive, harmful products and failing to protect children from exploitation and abuse. These verd...

26 Mar 19min

A Conversation with the Team Behind "The AI Doc"

“The AI Doc: Or How I Became An Apocaloptimist” opens in theaters across the U.S. this Friday, March 27. In this episode, we sit down with the team behind this groundbreaking documentary — Oscar-winni...

23 Mar 47min

AI Is Breaking Education. Rebecca Winthrop Has the Blueprint to Fix It.

The promise of AI in education is incredible: picture infinitely patient tutors that can teach every student exactly the way they need to be taught. But the history of education technology tells us th...

5 Mar 46min

The Race to Build God: AI's Existential Gamble — Yoshua Bengio & Tristan Harris at Davos

This week on Your Undivided Attention, Tristan Harris and Daniel Barcay offer a backstage recap of what it was like to be at the Davos World Economic Forum meeting this year as the world’s power broke...

19 Feb 37min

FEED DROP: Possible with Reid Hoffman and Aria Finger

This week on Your Undivided Attention, we’re bringing you Aza Raskin’s conversation with Reid Hoffman and Aria Finger on their podcast “Possible”. Reid and Aria are both tech entrepreneurs: Reid is th...

5 Feb 1h 7min

Attachment Hacking and the Rise of AI Psychosis

Therapy and companionship have become the #1 use case for AI, with millions worldwide sharing their innermost thoughts with AI systems, often things they wouldn't tell loved ones or human therapists. ...

21 Jan 50min

What Would It Take to Actually Trust Each Other? The Game Theory Dilemma

So much of our world today can be summed up in the cold logic of “if I don’t, they will.” This is the foundation of game theory, which holds that cooperation and virtue are irrational; that all that m...

8 Jan 45min
