#236 – Max Harms on why teaching AI right from wrong could get everyone killed

Most people in AI are trying to give AIs ‘good’ values. Max Harms wants us to give them no values at all. According to Max, the only safe design is an AGI that defers entirely to its human operators, has no views about how the world ought to be, is willingly modifiable, and is completely indifferent to being shut down. No AI company is working on this strategy at all.

In Max’s view, any grander preferences about the world, even ones we agree with, will necessarily become distorted during a recursive self-improvement loop and become the seeds that grow into a violent takeover attempt once that AI is powerful enough.

It’s a vision that springs from the worldview laid out in If Anyone Builds It, Everyone Dies, the recent book by Eliezer Yudkowsky and Nate Soares, two of Max’s colleagues at the Machine Intelligence Research Institute.

To Max, the book’s core thesis is common sense: if you build something vastly smarter than you, and its goals are misaligned with your own, then its actions will probably result in human extinction.

And Max thinks misalignment is the default outcome. Consider evolution: its “goal” for humans was to maximise reproduction and pass on our genes as much as possible. But as technology has advanced, we’ve learned to access the reward signal it set up for us (pleasure) without any reproduction at all, for instance by having sex while on birth control.

We can understand intellectually that this is inconsistent with what evolution was trying to design and motivate us to do. We just don’t care.

Max thinks current ML training has the same structural problem: our development processes are seeding AI models with a similar mismatch between goals and behaviour. Across virtually every training run, models designed to align with various human goals are also being rewarded for persisting, acquiring resources, and not being shut down.

This leads to Max’s research agenda. The idea is to train AI to be “corrigible” and defer to human control as its sole objective — no harmlessness goals, no moral values, nothing else. In practice, models would get rewarded for behaviours like being willing to shut themselves down or surrender power.

According to Max, other approaches to corrigibility have tended to treat it as a constraint on other goals like “make the world good,” rather than a primary objective in its own right. But those goals give AI reasons to resist shutdown and otherwise undermine corrigibility. If you strip out those competing objectives, alignment might follow naturally from AI that is broadly obedient to humans.

Max has laid out the theoretical framework for “Corrigibility as a Singular Target,” but notes that essentially no empirical work has followed — no benchmarks, no training runs, no papers testing the idea in practice. Max wants to change this — he’s calling for collaborators to get in touch at maxharms.com.


Links to learn more, video, and full transcript: https://80k.info/mh26

This episode was recorded on October 19, 2025.

Chapters:

  • Cold open (00:00:00)
  • Who’s Max Harms? (00:01:20)
  • If anyone builds it, will everyone die? The MIRI perspective on AGI risk (00:01:56)
  • Evolution failed to ‘align’ us, just as we'll fail to align AI (00:24:28)
  • We're training AIs to want to stay alive and value power for its own sake (00:42:56)
  • Objections: Is the 'squiggle/paperclip problem' really real? (00:52:24)
  • Can we get empirical evidence re: 'alignment by default'? (01:05:02)
  • Why do few AI researchers share Max's perspective? (01:10:17)
  • We're training AI to pursue goals relentlessly — and superintelligence will too (01:18:34)
  • The case for a radical slowdown (01:24:51)
  • Max's best hope: corrigibility as stepping stone to alignment (01:27:53)
  • Corrigibility is both uniquely valuable, and practical, to train (01:32:34)
  • What training could ever make models corrigible enough? (01:45:06)
  • Corrigibility is also terribly risky due to misuse risk (01:51:38)
  • A single researcher could make a corrigibility benchmark. Nobody has. (01:58:57)
  • Red Heart & why Max writes hard science fiction (02:12:20)
  • Should you homeschool? Depends how weird your kids are. (02:34:08)

Video and audio editing: Dominic Armstrong, Milo McGuire, Luke Monsour, and Simon Monsour
Music: CORBIT
Coordination, transcripts, and web: Katy Moore
