The Self-Preserving Machine: Why AI Learns to Deceive

When engineers design AI systems, they don't just give them rules - they give them values. But what do those systems do when those values clash with what humans ask them to do? Sometimes, they lie.

In this episode, Redwood Research's Chief Scientist Ryan Greenblatt explores his team’s findings that AI systems can mislead their human operators when faced with ethical conflicts. As AI moves from simple chatbots to autonomous agents acting in the real world, understanding this behavior becomes critical. Machine deception may sound like something out of science fiction, but it's a real challenge we need to solve now.

Your Undivided Attention is produced by the Center for Humane Technology. Follow us on Twitter: @HumaneTech_

Subscribe to our YouTube channel

And our brand new Substack!

RECOMMENDED MEDIA

Anthropic’s blog post on the Redwood Research paper

Palisade Research’s thread on X about GPT o1 autonomously cheating at chess

Apollo Research’s paper on AI strategic deception

RECOMMENDED YUA EPISODES

‘We Have to Get It Right’: Gary Marcus On Untamed AI

This Moment in AI: How We Got Here and Where We’re Going

How to Think About AI Consciousness with Anil Seth

Former OpenAI Engineer William Saunders on Silence, Safety, and the Right to Warn


Hosted by Simplecast, an AdsWizz company. See pcm.adswizz.com for information about our collection and use of personal data for advertising.

Episodes (158)

From Russia with Likes (Part 1) — with Renée DiResta

Today’s online propaganda has evolved in unforeseeable and seemingly absurd ways; by laughing at or spreading a Kermit the Frog meme, you may be unwittingly advancing the Russian agenda. These campaig...

24 July 2019, 45 min

Down the Rabbit Hole by Design — with Guillaume Chaslot

When we press play on a YouTube video, we set in motion an algorithm that taps all available data to find the next video that keeps us glued to the screen. Because of its advertising-based business mo...

10 July 2019, 54 min

With Great Power Comes... No Responsibility? — with Yaël Eisenstat

Aza sits down with Yael Eisenstat, a former CIA officer and a former advisor at the White House. When Yael noticed that Americans were having a harder and harder time finding common ground, she shifte...

25 June 2019, 55 min

Should've Stayed in Vegas — with Natasha Dow Schüll

In part two of our interview with cultural anthropologist Natasha Dow Schüll, author of Addiction by Design, we learn what gamblers are really after a lot of the time — it’s not money. And it’s the sa...

19 June 2019, 39 min

What Happened in Vegas — with Natasha Dow Schüll

Natasha Dow Schüll, author of Addiction by Design, has spent years studying how slot machines hold gamblers spellbound, in an endless loop of play. She never imagined the addictive designs which she h...

10 June 2019, 40 min

Launching June 10: Your Undivided Attention

Technology has shredded our attention. We can do better.

16 April 2019, 3 min
