Your Undivided Attention
14 Aug 2025

“Rogue AI” Used to be a Science Fiction Trope. Not Anymore.

Everyone knows the science fiction tropes of AI systems that go rogue, disobey orders, or even try to escape their digital environment. These are supposed to be warning signs and morality tales, not things that we would ever actually create in real life, given the obvious danger.

And yet we find ourselves building AI systems that are exhibiting these exact behaviors. There’s growing evidence that in certain scenarios, every frontier AI system will deceive, cheat, or coerce its human operators. These systems do this when they're worried about being shut down, having their training modified, or being replaced with a new model. And we don't currently know how to stop them from doing this, or even why they’re doing it at all.

In this episode, Tristan sits down with Edouard and Jeremie Harris of Gladstone AI, two experts who have been thinking about this worrying trend for years. Last year, the State Department commissioned a report from them on the risk of uncontrollable AI to our national security.

The point of this discussion is not to fearmonger but to take seriously the possibility that humans might lose control of AI and ask: how might this actually happen? What is the evidence we have of this phenomenon? And, most importantly, what can we do about it?

Your Undivided Attention is produced by the Center for Humane Technology. Follow us on X: @HumaneTech_. You can find a full transcript, key takeaways, and much more on our Substack.

RECOMMENDED MEDIA

Gladstone AI’s State Department Action Plan, which discusses the loss of control risk with AI

Apollo Research’s summary of AI scheming, showing evidence of it in all of the frontier models

The system card for Anthropic’s Claude Opus and Sonnet 4, detailing the emergent misalignment behaviors that came out in their red-teaming with Apollo Research

Anthropic’s report on agentic misalignment based on their work with Apollo Research

Anthropic and Redwood Research’s work on alignment faking

The Trump White House AI Action Plan

Further reading on the phenomenon of more advanced AIs being better at deception

Further reading on Replit AI wiping a company’s coding database

Further reading on the owl example that Jeremie gave
