#197 – Nick Joseph on whether Anthropic's AI safety policy is up to the task

The three biggest AI companies — Anthropic, OpenAI, and DeepMind — have now all released policies designed to make their AI models less likely to go rogue or cause catastrophic damage as they approach, and eventually exceed, human capabilities. Are they good enough?

That’s what host Rob Wiblin tries to hash out in this interview (recorded May 30) with Nick Joseph, one of Anthropic’s original cofounders, its current head of training, and a big fan of Anthropic’s “responsible scaling policy” (or “RSP”). Anthropic is the most safety-focused of the AI companies, known for a culture that treats the risks of its work as deadly serious.

Links to learn more, highlights, video, and full transcript.

As Nick explains, these scaling policies commit companies to dig into what new dangerous things a model can do — after it’s trained, but before it’s in wide use. The companies then promise to put in place safeguards they think are sufficient to tackle those capabilities before availability is extended further. For instance, if a model could significantly help design a deadly bioweapon, then its weights need to be properly secured so they can’t be stolen by terrorists interested in using it that way.

As capabilities grow further — for example, if testing shows that a model could exfiltrate itself and spread autonomously in the wild — then new measures would need to be put in place to make that impossible, or demonstrate that such a goal can never arise.

Nick points out what he sees as the biggest virtues of the RSP approach, and then Rob pushes him on some of the best objections he’s found to RSPs being up to the task of keeping AI safe and beneficial. The two also discuss whether it's essential to eventually hand over operation of responsible scaling policies to external auditors or regulatory bodies, if those policies are going to be able to hold up against the intense commercial pressures that might end up arrayed against them.

In addition to all of that, Nick and Rob talk about:

  • What Nick thinks are the current bottlenecks in AI progress: people and time (rather than data or compute).
  • What it’s like working in AI safety research at the leading edge, and whether pushing forward capabilities (even in the name of safety) is a good idea.
  • What it’s like working at Anthropic, and how to get the skills needed to help with the safe development of AI.

And as a reminder, if you want to let us know your reaction to this interview, or send any other feedback, our inbox is always open at podcast@80000hours.org.

Chapters:

  • Cold open (00:00:00)
  • Rob’s intro (00:01:00)
  • The interview begins (00:03:44)
  • Scaling laws (00:04:12)
  • Bottlenecks to further progress in making AIs helpful (00:08:36)
  • Anthropic’s responsible scaling policies (00:14:21)
  • Pros and cons of the RSP approach for AI safety (00:34:09)
  • Alternatives to RSPs (00:46:44)
  • Is an internal audit really the best approach? (00:51:56)
  • Making promises about things that are currently technically impossible (01:07:54)
  • Nick’s biggest reservations about the RSP approach (01:16:05)
  • Communicating “acceptable” risk (01:19:27)
  • Should Anthropic’s RSP have wider safety buffers? (01:26:13)
  • Other impacts on society and future work on RSPs (01:34:01)
  • Working at Anthropic (01:36:28)
  • Engineering vs research (01:41:04)
  • AI safety roles at Anthropic (01:48:31)
  • Should concerned people be willing to take capabilities roles? (01:58:20)
  • Recent safety work at Anthropic (02:10:05)
  • Anthropic culture (02:14:35)
  • Overrated and underrated AI applications (02:22:06)
  • Rob’s outro (02:26:36)

Producer and editor: Keiran Harris
Audio engineering by Ben Cordell, Milo McGuire, Simon Monsour, and Dominic Armstrong
Video engineering: Simon Monsour
Transcriptions: Katy Moore

