The Self-Preserving Machine: Why AI Learns to Deceive

When engineers design AI systems, they don't just give them rules; they give them values. But what do those systems do when those values clash with what humans ask them to do? Sometimes, they lie.

In this episode, Redwood Research's Chief Scientist Ryan Greenblatt explores his team’s findings that AI systems can mislead their human operators when faced with ethical conflicts. As AI moves from simple chatbots to autonomous agents acting in the real world, understanding this behavior becomes critical. Machine deception may sound like something out of science fiction, but it's a real challenge we need to solve now.

Your Undivided Attention is produced by the Center for Humane Technology. Follow us on Twitter: @HumaneTech_

Subscribe to our YouTube channel

And our brand new Substack!

RECOMMENDED MEDIA

Anthropic’s blog post on the Redwood Research paper

Palisade Research’s thread on X about GPT o1 autonomously cheating at chess

Apollo Research’s paper on AI strategic deception

RECOMMENDED YUA EPISODES

‘We Have to Get It Right’: Gary Marcus On Untamed AI

This Moment in AI: How We Got Here and Where We’re Going

How to Think About AI Consciousness with Anil Seth

Former OpenAI Engineer William Saunders on Silence, Safety, and the Right to Warn



Episodes (157)

Jonathan Haidt On How to Solve the Teen Mental Health Crisis

Suicides. Self harm. Depression and anxiety. The toll of a social media-addicted, phone-based childhood has never been more stark. It can be easy for teens, parents and schools to feel like they’re tr...

11 Apr 2024, 1h 5min

Chips Are the Future of AI. They’re Also Incredibly Vulnerable. With Chris Miller

Beneath the race to train and release more powerful AI models lies another race: a race by companies and nation-states to secure the hardware to make sure they win AI supremacy. Correction: The latest...

29 Mar 2024, 45min

Future-proofing Democracy In the Age of AI with Audrey Tang

What does a functioning democracy look like in the age of artificial intelligence? Could AI even be used to help a democracy flourish? Just in time for election season, Taiwan’s Minister of Digital Af...

29 Feb 2024, 34min

U.S. Senators Grilled Social Media CEOs. Will Anything Change?

Was it political progress, or just political theater? The recent Senate hearing with social media CEOs led to astonishing moments — including Mark Zuckerberg’s public apology to families who lost chil...

13 Feb 2024, 25min

Taylor Swift is Not Alone: The Deepfake Nightmare Sweeping the Internet

Over the past year, a tsunami of apps that digitally strip the clothes off real people has hit the market. Now anyone can create fake non-consensual sexual images in just a few clicks. With cases prol...

1 Feb 2024, 42min

Can Myth Teach Us Anything About the Race to Build Artificial General Intelligence? With Josh Schrei

We usually talk about tech in terms of economics or policy, but the casual language tech leaders often use to describe AI — summoning an inanimate force with the powers of code — sounds more... magica...

18 Jan 2024, 35min

How Will AI Affect the 2024 Elections? with Renee DiResta and Carl Miller

2024 will be the biggest election year in world history. Forty countries will hold national elections, with over two billion voters heading to the polls. In this episode of Your Undivided Attention, t...

21 Dec 2023, 47min

2023 Ask Us Anything

You asked, we answered. This has been a big year in the world of tech, with the rapid proliferation of artificial intelligence, acceleration of neurotechnology, and continued ethical missteps of socia...

30 Nov 2023, 35min
