#80 – Stuart Russell on why our approach to AI is broken and how to fix it

Stuart Russell, Professor at UC Berkeley and co-author of the most popular AI textbook, thinks the way we approach machine learning today is fundamentally flawed.

In his new book, Human Compatible, he outlines the 'standard model' of AI development, in which intelligence is measured as the ability to achieve some definite, completely known objective that we've stated explicitly. This is so obvious it hardly seems like a design choice at all, but it is one.

Unfortunately, there's a big problem with this approach: it's incredibly hard to say exactly what you want. AI today lacks common sense and simply does whatever we've asked it to, even when the goal isn't what we really want, or the methods it chooses are ones we would never accept.

We already see AIs misbehaving for this reason. Stuart points to the example of YouTube's recommender algorithm, which reportedly nudged users towards extreme political views because that made it easier to keep them on the site. This isn't something we wanted, but it helped achieve the algorithm's objective: maximise viewing time.

Like King Midas, who asked that everything he touched turn to gold but ended up unable to eat, we get too much of what we've asked for.

Links to learn more, summary and full transcript.

This 'alignment' problem will become more and more severe as machine learning is embedded in more and more places: recommending news to us, operating power grids, deciding prison sentences, performing surgery, and fighting wars. If we're ever to hand over much of the economy to thinking machines, we can't count on ourselves to correctly say exactly what we want the AI to do every time.

Stuart isn't just dissatisfied with the current model, though; he has a specific solution. According to him, we need to redesign AI around three principles:

1. The AI system's objective is to achieve what humans want.
2. But the system isn't sure what we want.
3. And it figures out what we want by observing our behaviour.

Stuart thinks this design architecture, if implemented, would be a big step towards reliably beneficial AI.

For instance, a machine built on these principles would be happy to be turned off if that's what its owner thought was best, while one built on the standard model would resist being turned off, since being deactivated prevents it from achieving its goal. As Stuart says, "you can't fetch the coffee if you're dead."
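
That incentive flip can be checked with a toy calculation. Below is a minimal sketch of the off-switch intuition, assuming an illustrative belief distribution and payoff structure rather than Russell's formal model: the robot can act (payoff u, the owner's true utility, which the robot doesn't know), switch itself off (payoff 0), or defer to the owner, who blocks the action exactly when u < 0.

```python
import random

# Toy "off-switch" calculation: a sketch of the intuition only, with an
# assumed belief distribution and payoffs (not Russell's formal model).
# The robot doesn't know the utility u its owner assigns to its planned
# action; it only holds a belief over u, here a standard normal.
random.seed(0)
belief = [random.gauss(0, 1) for _ in range(100_000)]

# Expected payoff of each option under that belief:
act = sum(belief) / len(belief)                       # act anyway: E[u]
switch_off = 0.0                                      # shut down: 0
defer = sum(max(u, 0) for u in belief) / len(belief)  # owner blocks iff u < 0

print(f"act={act:.3f}  switch_off={switch_off:.3f}  defer={defer:.3f}")
# Prints roughly act=0.000, switch_off=0.000, defer=0.399.
# Deferring wins precisely because the robot is unsure what we want
# (principle 2); if it were certain of u, the advantage would vanish.
```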

These principles lend themselves to machines that are modest and cautious, and that check in with us when they aren't confident they're truly achieving what we want.

We've made progress toward putting these principles into practice, but the remaining engineering problems are substantial. Among other things, the resulting AIs need to be able to interpret what people really mean based on the context of a situation. And they need to work out whether we've rejected an option because we considered it and decided it was a bad idea, or because we simply haven't thought about it at all.
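
One common way to frame the inference part of this problem is Bayesian updating over a preference parameter from observed choices. The sketch below assumes a Boltzmann-rational choice model; the options, prior grid, and observed choices are hypothetical, picked for illustration.

```python
import math

# Minimal sketch of principle 3 (inferring what someone wants from their
# behaviour) under an assumed Boltzmann-rational choice model. The prior,
# grid, and observations below are illustrative, not from the book.
thetas = [i / 10 for i in range(-20, 21)]          # how much A beats B
posterior = {t: 1 / len(thetas) for t in thetas}   # uniform prior

def p_choose_a(theta: float) -> float:
    # Boltzmann-rational human: P(A) = e^theta / (e^theta + e^0)
    return 1 / (1 + math.exp(-theta))

for choice in ["A", "A", "B", "A", "A"]:           # hypothetical observations
    for t in thetas:
        likelihood = p_choose_a(t) if choice == "A" else 1 - p_choose_a(t)
        posterior[t] *= likelihood
    total = sum(posterior.values())                # renormalise
    posterior = {t: p / total for t, p in posterior.items()}

estimate = max(posterior, key=posterior.get)
print(f"most likely preference for A over B: {estimate:+.1f}")
# Crucially, the machine keeps a full posterior rather than a point goal:
# it stays unsure (principle 2), which is what makes checking in rational.
```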

Stuart thinks all of these problems are surmountable, if we put in the work. The harder problems may end up being social and political.

When each of us can have an AI of our own — one smarter than any person — how do we resolve conflicts between people and their AI agents? And if AIs end up doing most of the work that people do today, how can humans avoid becoming enfeebled, like lazy children tended to by machines, never intellectually developed enough to know what they really want?

Chapters:

  • Rob’s intro (00:00:00)
  • The interview begins (00:19:06)
  • Human Compatible: Artificial Intelligence and the Problem of Control (00:21:27)
  • Principles for Beneficial Machines (00:29:25)
  • AI moral rights (00:33:05)
  • Humble machines (00:39:35)
  • Learning to predict human preferences (00:45:55)
  • Animals and AI (00:49:33)
  • Enfeeblement problem (00:58:21)
  • Counterarguments (01:07:09)
  • Orthogonality thesis (01:24:25)
  • Intelligence explosion (01:29:15)
  • Policy ideas (01:38:39)
  • What most needs to be done (01:50:14)

Producer: Keiran Harris.
Audio mastering: Ben Cordell.
Transcriptions: Zakee Ulhaq.
