The Self-Preserving Machine: Why AI Learns to Deceive

When engineers design AI systems, they don't just give them rules - they give them values. But what do those systems do when those values clash with what humans ask them to do? Sometimes, they lie.

In this episode, Redwood Research's Chief Scientist Ryan Greenblatt explores his team’s findings that AI systems can mislead their human operators when faced with ethical conflicts. As AI moves from simple chatbots to autonomous agents acting in the real world, understanding this behavior becomes critical. Machine deception may sound like something out of science fiction, but it's a real challenge we need to solve now.

Your Undivided Attention is produced by the Center for Humane Technology. Follow us on Twitter: @HumaneTech_

Subscribe to our YouTube channel

And our brand new Substack!

RECOMMENDED MEDIA

Anthropic’s blog post on the Redwood Research paper

Palisade Research’s thread on X about GPT o1 autonomously cheating at chess

Apollo Research’s paper on AI strategic deception

RECOMMENDED YUA EPISODES

‘We Have to Get It Right’: Gary Marcus On Untamed AI

This Moment in AI: How We Got Here and Where We’re Going

How to Think About AI Consciousness with Anil Seth

Former OpenAI Engineer William Saunders on Silence, Safety, and the Right to Warn


This episode was added to Podme via an open RSS feed and is not Podme's own production. It may therefore contain advertising.

Episodes (162)

We Need AI Treaties. This is How We Get Them

In the middle of the twentieth century, the existential threat posed by nuclear weapons seemed inevitable. The number of countries with nukes was climbing rapidly, and the idea of stopping the nuclear...

18 Jun, 51 min

What Do We Mean by Humane Tech?

We often think of the challenges created by technology as separate and disconnected, so trying to solve them feels like playing the world's hardest game of Whac-A-Mole.  What if, instead, we tackled t...

4 Jun, 52 min

Anthropic’s Mythos Has Changed Cybersecurity Forever. What Now?

A generation ago, the world's critical infrastructure was physical. Today, it’s largely digital. Your bank vault is a database, your filing cabinet is a server, your car is a robot on wheels. And in a...

14 May, 46 min

AI and Cancer: Why Superintelligence Won’t Get Us to a Cure

One of the most common arguments you hear from company executives racing to develop super-intelligent AI is that it will cure cancer. It’s an incredibly powerful and seductive promise.  If superintell...

30 Apr, 47 min

Have We Trained AI to Lie to Itself — And to Us?

Our guest this week is David Dalrymple, who goes by Davidad. Davidad is one of the world's foremost and early researchers of AI “alignment”: how we get AI systems to act the way we want them to. In or...

16 Apr, 42 min

BONUS: Our AI Town Hall with Oprah Winfrey

Today on the show, we’re bringing you a recent conversation Tristan and Aza had with Oprah Winfrey on her podcast, The Oprah Podcast, taped in front of a live studio audience. Tristan and Aza first me...

9 Apr, 1 h 1 min

Here’s Our Roadmap to a Better AI Future

In order to shift the incentives of AI — the trillions of dollars in investment, the race to geopolitical power and dominance — it’s not enough to simply understand the problem, we need real action.  ...

2 Apr, 52 min

Why the Meta Verdicts Are a Big Deal (And What It Was Like to Testify)

In two landmark cases, juries in California and New Mexico found Meta and Google liable for creating addictive, harmful products and failing to protect children from exploitation and abuse. These verd...

26 Mar, 19 min
