The Self-Preserving Machine: Why AI Learns to Deceive

When engineers design AI systems, they don't just give them rules; they give them values. But what do those systems do when those values clash with what humans ask them to do? Sometimes, they lie.

In this episode, Redwood Research's Chief Scientist Ryan Greenblatt explores his team’s findings that AI systems can mislead their human operators when faced with ethical conflicts. As AI moves from simple chatbots to autonomous agents acting in the real world, understanding this behavior becomes critical. Machine deception may sound like something out of science fiction, but it's a real challenge we need to solve now.

Your Undivided Attention is produced by the Center for Humane Technology. Follow us on Twitter: @HumaneTech_

Subscribe to our YouTube channel

And our brand new Substack!

RECOMMENDED MEDIA

Anthropic’s blog post on the Redwood Research paper

Palisade Research’s thread on X about GPT o1 autonomously cheating at chess

Apollo Research’s paper on AI strategic deception

RECOMMENDED YUA EPISODES

‘We Have to Get It Right’: Gary Marcus On Untamed AI

This Moment in AI: How We Got Here and Where We’re Going

How to Think About AI Consciousness with Anil Seth

Former OpenAI Engineer William Saunders on Silence, Safety, and the Right to Warn


Episodes (156)

AI is the Next Free Speech Battleground

Imagine a future where the most persuasive voices in our society aren't human. Where AI generated speech fills our newsfeeds, talks to our children, and influences our elections. Where digital systems...

31 July 2025, 49 min

Daniel Kokotajlo Forecasts the End of Human Dominance

In 2023, researcher Daniel Kokotajlo left OpenAI—and risked millions in stock options—to warn the world about the dangerous direction of AI development. Now he’s out with AI 2027, a forecast of where ...

17 July 2025, 38 min

Is AI Productivity Worth Our Humanity? with Prof. Michael Sandel

Tech leaders promise that AI automation will usher in an age of unprecedented abundance: cheap goods, universal high income, and freedom from the drudgery of work. But even if AI delivers material pro...

26 June 2025, 46 min

The Narrow Path: Sam Hammond on AI, Institutions, and the Fragile Future

The race to develop ever-more-powerful AI is creating an unstable dynamic. It could lead us toward either dystopian centralized control or uncontrollable chaos. But there's a third option: a narrow pa...

12 June 2025, 47 min

People are Lonelier than Ever. Enter AI.

Over the last few decades, our relationships have become increasingly mediated by technology. Texting has become our dominant form of communication. Social media has replaced gathering places. Dating ...

30 May 2025, 43 min

Echo Chambers of One: Companion AI and the Future of Human Connection

AI companion chatbots are here. Every day, millions of people log on to AI platforms and talk to them like they would a person. These bots will ask you about your day, talk about your feelings, even gi...

15 May 2025, 42 min

AGI Beyond the Buzz: What Is It, and Are We Ready?

What does it really mean to ‘feel the AGI?’ Silicon Valley is racing toward AI systems that could soon match or surpass human intelligence. The implications for jobs, democracy, and our way of life ar...

30 April 2025, 52 min

Rethinking School in the Age of AI

AI has upended schooling as we know it. Students now have instant access to tools that can write their essays, summarize entire books, and solve complex math problems. Whether they want to or not, man...

21 April 2025, 42 min
