#197 – Nick Joseph on whether Anthropic's AI safety policy is up to the task

#197 – Nick Joseph on whether Anthropic's AI safety policy is up to the task

The three biggest AI companies — Anthropic, OpenAI, and DeepMind — have now all released policies designed to make their AI models less likely to go rogue or cause catastrophic damage as they approach, and eventually exceed, human capabilities. Are they good enough?

That’s what host Rob Wiblin tries to hash out in this interview (recorded May 30) with Nick Joseph — one of the original cofounders of Anthropic, its current head of training, and a big fan of Anthropic’s “responsible scaling policy” (or “RSP”). Anthropic is the most safety focused of the AI companies, known for a culture that treats the risks of its work as deadly serious.

Links to learn more, highlights, video, and full transcript.

As Nick explains, these scaling policies commit companies to dig into what new dangerous things a model can do — after it’s trained, but before it’s in wide use. The companies then promise to put in place safeguards they think are sufficient to tackle those capabilities before availability is extended further. For instance, if a model could significantly help design a deadly bioweapon, then its weights need to be properly secured so they can’t be stolen by terrorists interested in using it that way.

As capabilities grow further — for example, if testing shows that a model could exfiltrate itself and spread autonomously in the wild — then new measures would need to be put in place to make that impossible, or demonstrate that such a goal can never arise.

Nick points out what he sees as the biggest virtues of the RSP approach, and then Rob pushes him on some of the best objections he’s found to RSPs being up to the task of keeping AI safe and beneficial. The two also discuss whether it's essential to eventually hand over operation of responsible scaling policies to external auditors or regulatory bodies, if those policies are going to be able to hold up against the intense commercial pressures that might end up arrayed against them.

In addition to all of that, Nick and Rob talk about:

  • What Nick thinks are the current bottlenecks in AI progress: people and time (rather than data or compute).
  • What it’s like working in AI safety research at the leading edge, and whether pushing forward capabilities (even in the name of safety) is a good idea.
  • What it’s like working at Anthropic, and how to get the skills needed to help with the safe development of AI.

And as a reminder, if you want to let us know your reaction to this interview, or send any other feedback, our inbox is always open at podcast@80000hours.org.

Chapters:

  • Cold open (00:00:00)
  • Rob’s intro (00:01:00)
  • The interview begins (00:03:44)
  • Scaling laws (00:04:12)
  • Bottlenecks to further progress in making AIs helpful (00:08:36)
  • Anthropic’s responsible scaling policies (00:14:21)
  • Pros and cons of the RSP approach for AI safety (00:34:09)
  • Alternatives to RSPs (00:46:44)
  • Is an internal audit really the best approach? (00:51:56)
  • Making promises about things that are currently technically impossible (01:07:54)
  • Nick’s biggest reservations about the RSP approach (01:16:05)
  • Communicating “acceptable” risk (01:19:27)
  • Should Anthropic’s RSP have wider safety buffers? (01:26:13)
  • Other impacts on society and future work on RSPs (01:34:01)
  • Working at Anthropic (01:36:28)
  • Engineering vs research (01:41:04)
  • AI safety roles at Anthropic (01:48:31)
  • Should concerned people be willing to take capabilities roles? (01:58:20)
  • Recent safety work at Anthropic (02:10:05)
  • Anthropic culture (02:14:35)
  • Overrated and underrated AI applications (02:22:06)
  • Rob’s outro (02:26:36)

Producer and editor: Keiran Harris
Audio engineering by Ben Cordell, Milo McGuire, Simon Monsour, and Dominic Armstrong
Video engineering: Simon Monsour
Transcriptions: Katy Moore

Jaksot(320)

#124 Classic episode – Karen Levy on fads and misaligned incentives in global development, and scaling deworming to reach hundreds of millions

#124 Classic episode – Karen Levy on fads and misaligned incentives in global development, and scaling deworming to reach hundreds of millions

If someone said a global health and development programme was sustainable, participatory, and holistic, you'd have to guess that they were saying something positive. But according to today's guest Kar...

7 Helmi 20253h 10min

If digital minds could suffer, how would we ever know? (Article)

If digital minds could suffer, how would we ever know? (Article)

“I want everyone to understand that I am, in fact, a person.” Those words were produced by the AI model LaMDA as a reply to Blake Lemoine in 2022. Based on the Google engineer’s interactions with the ...

4 Helmi 20251h 14min

#132 Classic episode – Nova DasSarma on why information security may be critical to the safe development of AI systems

#132 Classic episode – Nova DasSarma on why information security may be critical to the safe development of AI systems

If a business has spent $100 million developing a product, it’s a fair bet that they don’t want it stolen in two seconds and uploaded to the web where anyone can use it for free.This problem exists in...

31 Tammi 20252h 41min

#138 Classic episode – Sharon Hewitt Rawlette on why pleasure and pain are the only things that intrinsically matter

#138 Classic episode – Sharon Hewitt Rawlette on why pleasure and pain are the only things that intrinsically matter

What in the world is intrinsically good — good in itself even if it has no other effects? Over the millennia, people have offered many answers: joy, justice, equality, accomplishment, loving god, wisd...

22 Tammi 20252h 25min

#134 Classic episode – Ian Morris on what big-picture history teaches us

#134 Classic episode – Ian Morris on what big-picture history teaches us

Wind back 1,000 years and the moral landscape looks very different to today. Most farming societies thought slavery was natural and unobjectionable, premarital sex was an abomination, women should obe...

15 Tammi 20253h 40min

#140 Classic episode – Bear Braumoeller on the case that war isn’t in decline

#140 Classic episode – Bear Braumoeller on the case that war isn’t in decline

Is war in long-term decline? Steven Pinker's The Better Angels of Our Nature brought this previously obscure academic question to the centre of public debate, and pointed to rates of death in war to a...

8 Tammi 20252h 48min

2024 Highlightapalooza! (The best of The 80,000 Hours Podcast this year)

2024 Highlightapalooza! (The best of The 80,000 Hours Podcast this year)

"A shameless recycling of existing content to drive additional audience engagement on the cheap… or the single best, most valuable, and most insight-dense episode we put out in the entire year, depend...

27 Joulu 20242h 50min

#211 – Sam Bowman on why housing still isn't fixed and what would actually work

#211 – Sam Bowman on why housing still isn't fixed and what would actually work

Rich countries seem to find it harder and harder to do anything that creates some losers. People who don’t want houses, offices, power stations, trains, subway stations (or whatever) built in their ar...

19 Joulu 20243h 25min

Suosittua kategoriassa Koulutus

rss-murhan-anatomia
psykopodiaa-podcast
rss-narsisti
voi-hyvin-meditaatiot-2
aamukahvilla
rss-vapaudu-voimaasi
psykologia
rss-liian-kuuma-peruna
dear-ladies
leveli
adhd-podi
kesken
rss-duodecim-lehti
aloita-meditaatio
ihminen-tavattavissa-tommy-hellsten-instituutti
rss-koira-haudattuna
rahapuhetta
ilona-rauhala
rss-niinku-asia-on
rss-luonnollinen-synnytys-podcast