#197 – Nick Joseph on whether Anthropic's AI safety policy is up to the task

The three biggest AI companies — Anthropic, OpenAI, and DeepMind — have now all released policies designed to make their AI models less likely to go rogue or cause catastrophic damage as they approach, and eventually exceed, human capabilities. Are they good enough?

That’s what host Rob Wiblin tries to hash out in this interview (recorded May 30) with Nick Joseph — one of the original cofounders of Anthropic, its current head of training, and a big fan of Anthropic’s “responsible scaling policy” (or “RSP”). Anthropic is the most safety-focused of the AI companies, known for a culture that treats the risks of its work as deadly serious.

Links to learn more, highlights, video, and full transcript.

As Nick explains, these scaling policies commit companies to dig into what new dangerous things a model can do — after it’s trained, but before it’s in wide use. The companies then promise to put in place safeguards they think are sufficient to tackle those capabilities before availability is extended further. For instance, if a model could significantly help design a deadly bioweapon, then its weights need to be properly secured so they can’t be stolen by terrorists interested in using it that way.

As capabilities grow further — for example, if testing shows that a model could exfiltrate itself and spread autonomously in the wild — then new measures would need to be put in place to make that impossible, or demonstrate that such a goal can never arise.
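
To make that gating mechanism concrete, here is a minimal sketch of how a responsible scaling policy might turn post-training evaluation results into deployment decisions. Every eval name, score, and threshold below is hypothetical, chosen only to illustrate the evaluate-then-gate logic Nick describes; it is not Anthropic's actual policy or code.

```python
# A minimal sketch of RSP-style capability gating. All eval names,
# thresholds, and safeguard mappings here are hypothetical; this is
# not Anthropic's actual policy or code.

from dataclasses import dataclass


@dataclass
class EvalResult:
    name: str         # e.g. "bioweapon_uplift" or "autonomous_replication"
    score: float      # higher means more dangerous capability observed
    threshold: float  # score at which stronger safeguards become mandatory


# Hypothetical mapping from a triggered capability to the safeguard
# that must be in place before availability is extended further.
SAFEGUARDS = {
    "bioweapon_uplift": "secure model weights against theft",
    "autonomous_replication": "contain self-exfiltration or rule it out",
}


def deployment_blockers(results: list[EvalResult]) -> list[str]:
    """Return the safeguards still required before wider deployment."""
    return [
        SAFEGUARDS[r.name]
        for r in results
        if r.score >= r.threshold and r.name in SAFEGUARDS
    ]


# Example: post-training evals show the model crossed the bio-uplift
# threshold but not the autonomy one, so weight security must be
# upgraded before release proceeds.
results = [
    EvalResult("bioweapon_uplift", score=0.8, threshold=0.5),
    EvalResult("autonomous_replication", score=0.1, threshold=0.5),
]

blockers = deployment_blockers(results)
if blockers:
    print("Hold wider deployment until:", blockers)
```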

Nick points out what he sees as the biggest virtues of the RSP approach, and then Rob pushes him on some of the best objections he’s found to RSPs being up to the task of keeping AI safe and beneficial. The two also discuss whether it’s essential to eventually hand over operation of responsible scaling policies to external auditors or regulatory bodies if those policies are going to hold up against the intense commercial pressures that might end up arrayed against them.

In addition to all of that, Nick and Rob talk about:

  • What Nick thinks are the current bottlenecks in AI progress: people and time (rather than data or compute).
  • What it’s like working in AI safety research at the leading edge, and whether pushing forward capabilities (even in the name of safety) is a good idea.
  • What it’s like working at Anthropic, and how to get the skills needed to help with the safe development of AI.

And as a reminder, if you want to let us know your reaction to this interview, or send any other feedback, our inbox is always open at podcast@80000hours.org.

Chapters:

  • Cold open (00:00:00)
  • Rob’s intro (00:01:00)
  • The interview begins (00:03:44)
  • Scaling laws (00:04:12)
  • Bottlenecks to further progress in making AIs helpful (00:08:36)
  • Anthropic’s responsible scaling policies (00:14:21)
  • Pros and cons of the RSP approach for AI safety (00:34:09)
  • Alternatives to RSPs (00:46:44)
  • Is an internal audit really the best approach? (00:51:56)
  • Making promises about things that are currently technically impossible (01:07:54)
  • Nick’s biggest reservations about the RSP approach (01:16:05)
  • Communicating “acceptable” risk (01:19:27)
  • Should Anthropic’s RSP have wider safety buffers? (01:26:13)
  • Other impacts on society and future work on RSPs (01:34:01)
  • Working at Anthropic (01:36:28)
  • Engineering vs research (01:41:04)
  • AI safety roles at Anthropic (01:48:31)
  • Should concerned people be willing to take capabilities roles? (01:58:20)
  • Recent safety work at Anthropic (02:10:05)
  • Anthropic culture (02:14:35)
  • Overrated and underrated AI applications (02:22:06)
  • Rob’s outro (02:26:36)

Producer and editor: Keiran Harris
Audio engineering: Ben Cordell, Milo McGuire, Simon Monsour, and Dominic Armstrong
Video engineering: Simon Monsour
Transcriptions: Katy Moore

Episodes (325)

#180 – Hugo Mercier on why gullibility and misinformation are overrated

The World Economic Forum’s global risks survey of 1,400 experts, policymakers, and industry leaders ranked misinformation and disinformation as the number one global risk over the next two years — ran...

21 Feb 2024 · 2h 36min

#179 – Randy Nesse on why evolution left us so vulnerable to depression and anxiety

Mental health problems like depression and anxiety affect enormous numbers of people and severely interfere with their lives. By contrast, we don’t see similar levels of physical ill health in young p...

12 Feb 2024 · 2h 56min

#178 – Emily Oster on what the evidence actually says about pregnancy and parenting

"I think at various times — before you have the kid, after you have the kid — it's useful to sit down and think about: What do I want the shape of this to look like? What time do I want to be spending...

1 Feb 2024 · 2h 22min

#177 – Nathan Labenz on recent AI breakthroughs and navigating the growing rift between AI safety and accelerationist camps

Back in December we spoke with Nathan Labenz — AI entrepreneur and host of The Cognitive Revolution Podcast — about the speed of progress towards AGI and OpenAI's leadership drama, drawing on Nathan's...

24 Jan 2024 · 2h 47min

#90 Classic episode – Ajeya Cotra on worldview diversification and how big the future could be

You wake up in a mysterious box, and hear the booming voice of God: “I just flipped a coin. If it came up heads, I made ten boxes, labeled 1 through 10 — each of which has a human in it. If it came up...

12 Jan 2024 · 2h 59min

#112 Classic episode – Carl Shulman on the common-sense case for existential risk work and its practical implications

Preventing the apocalypse may sound like an idiosyncratic activity, and it sometimes is justified on exotic grounds, such as the potential for humanity to become a galaxy-spanning civilisation. But the...

8 Jan 2024 · 3h 50min

#111 Classic episode – Mushtaq Khan on using institutional economics to predict effective government reforms

If you’re living in the Niger Delta in Nigeria, your best bet at a high-paying career is probably ‘artisanal refining’ — or, in plain language, stealing oil from pipelines. The resulting oil spills dam...

4 Jan 2024 · 3h 22min

2023 Mega-highlights Extravaganza

Happy new year! We've got a different kind of holiday release for you today. Rather than a 'classic episode,' we've put together one of our favourite highlights from each episode of the show that came...

31 Dec 2023 · 1h 53min
