#197 – Nick Joseph on whether Anthropic's AI safety policy is up to the task

The three biggest AI companies — Anthropic, OpenAI, and DeepMind — have now all released policies designed to make their AI models less likely to go rogue or cause catastrophic damage as they approach, and eventually exceed, human capabilities. Are they good enough?

That’s what host Rob Wiblin tries to hash out in this interview (recorded May 30) with Nick Joseph, one of the original cofounders of Anthropic, its current head of training, and a big fan of Anthropic’s “responsible scaling policy” (or “RSP”). Anthropic is the most safety-focused of the AI companies, known for a culture that treats the risks of its work as deadly serious.

Links to learn more, highlights, video, and full transcript.

As Nick explains, these scaling policies commit companies to dig into what new dangerous things a model can do — after it’s trained, but before it’s in wide use. The companies then promise to put in place safeguards they think are sufficient to tackle those capabilities before availability is extended further. For instance, if a model could significantly help design a deadly bioweapon, then its weights need to be properly secured so they can’t be stolen by terrorists interested in using it that way.

As capabilities grow further (for example, if testing shows that a model could exfiltrate itself and spread autonomously in the wild), new measures would need to be put in place to make that impossible, or to demonstrate that such a goal can never arise.

Nick points out what he sees as the biggest virtues of the RSP approach, and then Rob pushes him on some of the best objections he’s found to RSPs being up to the task of keeping AI safe and beneficial. The two also discuss whether it's essential to eventually hand over operation of responsible scaling policies to external auditors or regulatory bodies, if those policies are going to be able to hold up against the intense commercial pressures that might end up arrayed against them.

In addition to all of that, Nick and Rob talk about:

  • What Nick thinks are the current bottlenecks in AI progress: people and time (rather than data or compute).
  • What it’s like working in AI safety research at the leading edge, and whether pushing forward capabilities (even in the name of safety) is a good idea.
  • What it’s like working at Anthropic, and how to get the skills needed to help with the safe development of AI.

And as a reminder, if you want to let us know your reaction to this interview, or send any other feedback, our inbox is always open at podcast@80000hours.org.

Chapters:

  • Cold open (00:00:00)
  • Rob’s intro (00:01:00)
  • The interview begins (00:03:44)
  • Scaling laws (00:04:12)
  • Bottlenecks to further progress in making AIs helpful (00:08:36)
  • Anthropic’s responsible scaling policies (00:14:21)
  • Pros and cons of the RSP approach for AI safety (00:34:09)
  • Alternatives to RSPs (00:46:44)
  • Is an internal audit really the best approach? (00:51:56)
  • Making promises about things that are currently technically impossible (01:07:54)
  • Nick’s biggest reservations about the RSP approach (01:16:05)
  • Communicating “acceptable” risk (01:19:27)
  • Should Anthropic’s RSP have wider safety buffers? (01:26:13)
  • Other impacts on society and future work on RSPs (01:34:01)
  • Working at Anthropic (01:36:28)
  • Engineering vs research (01:41:04)
  • AI safety roles at Anthropic (01:48:31)
  • Should concerned people be willing to take capabilities roles? (01:58:20)
  • Recent safety work at Anthropic (02:10:05)
  • Anthropic culture (02:14:35)
  • Overrated and underrated AI applications (02:22:06)
  • Rob’s outro (02:26:36)

Producer and editor: Keiran Harris
Audio engineering by Ben Cordell, Milo McGuire, Simon Monsour, and Dominic Armstrong
Video engineering: Simon Monsour
Transcriptions: Katy Moore
