#197 – Nick Joseph on whether Anthropic's AI safety policy is up to the task

The three biggest AI companies — Anthropic, OpenAI, and DeepMind — have now all released policies designed to make their AI models less likely to go rogue or cause catastrophic damage as they approach, and eventually exceed, human capabilities. Are they good enough?

That’s what host Rob Wiblin tries to hash out in this interview (recorded May 30) with Nick Joseph, one of the original cofounders of Anthropic, its current head of training, and a big fan of Anthropic’s “responsible scaling policy” (or “RSP”). Anthropic is the most safety-focused of the AI companies, known for a culture that treats the risks of its work as deadly serious.

Links to learn more, highlights, video, and full transcript.

As Nick explains, these scaling policies commit companies to dig into what new dangerous things a model can do — after it’s trained, but before it’s in wide use. The companies then promise to put in place safeguards they think are sufficient to tackle those capabilities before availability is extended further. For instance, if a model could significantly help design a deadly bioweapon, then its weights need to be properly secured so they can’t be stolen by terrorists interested in using it that way.

As capabilities grow further — for example, if testing shows that a model could exfiltrate itself and spread autonomously in the wild — then new measures would need to be put in place to make that impossible, or demonstrate that such a goal can never arise.

Nick points out what he sees as the biggest virtues of the RSP approach, and then Rob pushes him on some of the best objections he’s found to RSPs being up to the task of keeping AI safe and beneficial. The two also discuss whether it's essential to eventually hand over operation of responsible scaling policies to external auditors or regulatory bodies, if those policies are going to be able to hold up against the intense commercial pressures that might end up arrayed against them.

In addition to all of that, Nick and Rob talk about:

  • What Nick thinks are the current bottlenecks in AI progress: people and time (rather than data or compute).
  • What it’s like working in AI safety research at the leading edge, and whether pushing forward capabilities (even in the name of safety) is a good idea.
  • What it’s like working at Anthropic, and how to get the skills needed to help with the safe development of AI.

And as a reminder, if you want to let us know your reaction to this interview, or send any other feedback, our inbox is always open at podcast@80000hours.org.

Chapters:

  • Cold open (00:00:00)
  • Rob’s intro (00:01:00)
  • The interview begins (00:03:44)
  • Scaling laws (00:04:12)
  • Bottlenecks to further progress in making AIs helpful (00:08:36)
  • Anthropic’s responsible scaling policies (00:14:21)
  • Pros and cons of the RSP approach for AI safety (00:34:09)
  • Alternatives to RSPs (00:46:44)
  • Is an internal audit really the best approach? (00:51:56)
  • Making promises about things that are currently technically impossible (01:07:54)
  • Nick’s biggest reservations about the RSP approach (01:16:05)
  • Communicating “acceptable” risk (01:19:27)
  • Should Anthropic’s RSP have wider safety buffers? (01:26:13)
  • Other impacts on society and future work on RSPs (01:34:01)
  • Working at Anthropic (01:36:28)
  • Engineering vs research (01:41:04)
  • AI safety roles at Anthropic (01:48:31)
  • Should concerned people be willing to take capabilities roles? (01:58:20)
  • Recent safety work at Anthropic (02:10:05)
  • Anthropic culture (02:14:35)
  • Overrated and underrated AI applications (02:22:06)
  • Rob’s outro (02:26:36)

Producer and editor: Keiran Harris
Audio engineering by Ben Cordell, Milo McGuire, Simon Monsour, and Dominic Armstrong
Video engineering: Simon Monsour
Transcriptions: Katy Moore

Episodes (325)

#35 - Tara Mac Aulay on the audacity to fix the world without asking permission

"You don't need permission. You don't need to be allowed to do something that's not in your job description. If you think that it's gonna make your company or your organization more successful and mor...

21 June 2018, 1h 22min

Rob Wiblin on the art/science of a high impact career

Today's episode is a cross-post of an interview I did with The Jolly Swagmen Podcast which came out this week. I recommend regular listeners skip to 24 minutes in to avoid hearing things they already ...

8 June 2018, 1h 31min

#34 - We use the worst voting system that exists. Here's how Aaron Hamlin is going to fix it.

In 1991 Edwin Edwards won the Louisiana gubernatorial election. In 2001, he was found guilty of racketeering and received a 10 year invitation to Federal prison. The strange thing about that election?...

1 June 2018, 2h 18min

#33 - Anders Sandberg on what if we ended ageing, solar flares & the annual risk of nuclear war

Joseph Stalin had a life-extension program dedicated to making himself immortal. What if he had succeeded?  According to our last guest, Bryan Caplan, there’s an 80% chance that Stalin would still be ...

29 May 2018, 1h 24min

#32 - Bryan Caplan on whether his Case Against Education holds up, totalitarianism, & open borders

Bryan Caplan’s claim in *The Case Against Education* is striking: education doesn’t teach people much, we use little of what we learn, and college is mostly about trying to seem smarter than other peo...

22 May 2018, 2h 25min

#31 - Allan Dafoe on defusing the political & economic risks posed by existing AI capabilities

The debate around the impacts of artificial intelligence often centres on ‘superintelligence’ - a general intellect that is much smarter than the best humans, in practically every field. But according...

18 May 2018, 48min

#30 - Eva Vivalt on how little social science findings generalize from one study to another

If we have a study on the impact of a social program in a particular place and time, how confident can we be that we’ll get a similar result if we study the same program again somewhere else? Dr Eva V...

15 May 2018, 2h 1min

#29 - Anders Sandberg on 3 new resolutions for the Fermi paradox & how to colonise the universe

Part 2 out now: #33 - Dr Anders Sandberg on what if we ended ageing, solar flares & the annual risk of nuclear war. The universe is so vast, yet we don’t see any alien civilizations. If they exist, whe...

8 May 2018, 1h 21min
