#197 – Nick Joseph on whether Anthropic's AI safety policy is up to the task

#197 – Nick Joseph on whether Anthropic's AI safety policy is up to the task

The three biggest AI companies — Anthropic, OpenAI, and DeepMind — have now all released policies designed to make their AI models less likely to go rogue or cause catastrophic damage as they approach, and eventually exceed, human capabilities. Are they good enough?

That’s what host Rob Wiblin tries to hash out in this interview (recorded May 30) with Nick Joseph — one of the original cofounders of Anthropic, its current head of training, and a big fan of Anthropic’s “responsible scaling policy” (or “RSP”). Anthropic is the most safety focused of the AI companies, known for a culture that treats the risks of its work as deadly serious.

Links to learn more, highlights, video, and full transcript.

As Nick explains, these scaling policies commit companies to dig into what new dangerous things a model can do — after it’s trained, but before it’s in wide use. The companies then promise to put in place safeguards they think are sufficient to tackle those capabilities before availability is extended further. For instance, if a model could significantly help design a deadly bioweapon, then its weights need to be properly secured so they can’t be stolen by terrorists interested in using it that way.

As capabilities grow further — for example, if testing shows that a model could exfiltrate itself and spread autonomously in the wild — then new measures would need to be put in place to make that impossible, or demonstrate that such a goal can never arise.

Nick points out what he sees as the biggest virtues of the RSP approach, and then Rob pushes him on some of the best objections he’s found to RSPs being up to the task of keeping AI safe and beneficial. The two also discuss whether it's essential to eventually hand over operation of responsible scaling policies to external auditors or regulatory bodies, if those policies are going to be able to hold up against the intense commercial pressures that might end up arrayed against them.

In addition to all of that, Nick and Rob talk about:

  • What Nick thinks are the current bottlenecks in AI progress: people and time (rather than data or compute).
  • What it’s like working in AI safety research at the leading edge, and whether pushing forward capabilities (even in the name of safety) is a good idea.
  • What it’s like working at Anthropic, and how to get the skills needed to help with the safe development of AI.

And as a reminder, if you want to let us know your reaction to this interview, or send any other feedback, our inbox is always open at podcast@80000hours.org.

Chapters:

  • Cold open (00:00:00)
  • Rob’s intro (00:01:00)
  • The interview begins (00:03:44)
  • Scaling laws (00:04:12)
  • Bottlenecks to further progress in making AIs helpful (00:08:36)
  • Anthropic’s responsible scaling policies (00:14:21)
  • Pros and cons of the RSP approach for AI safety (00:34:09)
  • Alternatives to RSPs (00:46:44)
  • Is an internal audit really the best approach? (00:51:56)
  • Making promises about things that are currently technically impossible (01:07:54)
  • Nick’s biggest reservations about the RSP approach (01:16:05)
  • Communicating “acceptable” risk (01:19:27)
  • Should Anthropic’s RSP have wider safety buffers? (01:26:13)
  • Other impacts on society and future work on RSPs (01:34:01)
  • Working at Anthropic (01:36:28)
  • Engineering vs research (01:41:04)
  • AI safety roles at Anthropic (01:48:31)
  • Should concerned people be willing to take capabilities roles? (01:58:20)
  • Recent safety work at Anthropic (02:10:05)
  • Anthropic culture (02:14:35)
  • Overrated and underrated AI applications (02:22:06)
  • Rob’s outro (02:26:36)

Producer and editor: Keiran Harris
Audio engineering by Ben Cordell, Milo McGuire, Simon Monsour, and Dominic Armstrong
Video engineering: Simon Monsour
Transcriptions: Katy Moore

Jaksot(332)

#176 – Nathan Labenz on the final push for AGI, understanding OpenAI's leadership drama, and red-teaming frontier models

#176 – Nathan Labenz on the final push for AGI, understanding OpenAI's leadership drama, and red-teaming frontier models

OpenAI says its mission is to build AGI — an AI system that is better than human beings at everything. Should the world trust them to do that safely?That’s the central theme of today’s episode with Na...

22 Joulu 20233h 46min

#175 – Lucia Coulter on preventing lead poisoning for $1.66 per child

#175 – Lucia Coulter on preventing lead poisoning for $1.66 per child

Lead is one of the most poisonous things going. A single sugar sachet of lead, spread over a park the size of an American football field, is enough to give a child that regularly plays there lead pois...

14 Joulu 20232h 14min

#174 – Nita Farahany on the neurotechnology already being used to convict criminals and manipulate workers

#174 – Nita Farahany on the neurotechnology already being used to convict criminals and manipulate workers

"It will change everything: it will change our workplaces, it will change our interactions with the government, it will change our interactions with each other. It will make all of us unwitting neurom...

7 Joulu 20232h

#173 – Jeff Sebo on digital minds, and how to avoid sleepwalking into a major moral catastrophe

#173 – Jeff Sebo on digital minds, and how to avoid sleepwalking into a major moral catastrophe

"We do have a tendency to anthropomorphise nonhumans — which means attributing human characteristics to them, even when they lack those characteristics. But we also have a tendency towards anthropoden...

22 Marras 20232h 38min

#172 – Bryan Caplan on why you should stop reading the news

#172 – Bryan Caplan on why you should stop reading the news

Is following important political and international news a civic duty — or is it our civic duty to avoid it?It's common to think that 'staying informed' and checking the headlines every day is just wha...

17 Marras 20232h 23min

#171 – Alison Young on how top labs have jeopardised public health with repeated biosafety failures

#171 – Alison Young on how top labs have jeopardised public health with repeated biosafety failures

"Rare events can still cause catastrophic accidents. The concern that has been raised by experts going back over time, is that really, the more of these experiments, the more labs, the more opportunit...

9 Marras 20231h 46min

#170 – Santosh Harish on how air pollution is responsible for ~12% of global deaths — and how to get that number down

#170 – Santosh Harish on how air pollution is responsible for ~12% of global deaths — and how to get that number down

"One [outrageous example of air pollution] is municipal waste burning that happens in many cities in the Global South. Basically, this is waste that gets collected from people's homes, and instead of ...

1 Marras 20232h 57min

#169 – Paul Niehaus on whether cash transfers cause economic growth, and keeping theft to acceptable levels

#169 – Paul Niehaus on whether cash transfers cause economic growth, and keeping theft to acceptable levels

"One of our earliest supporters and a dear friend of mine, Mark Lampert, once said to me, “The way I think about it is, imagine that this money were already in the hands of people living in poverty. I...

26 Loka 20231h 47min

Suosittua kategoriassa Koulutus

rss-murhan-anatomia
psykopodiaa-podcast
voi-hyvin-meditaatiot-2
adhd-podi
rss-tietoinen-yhteys-podcast-2
rss-valo-minussa-2
psykologia
rss-liian-kuuma-peruna
rss-rahamania
kesken
rss-niinku-asia-on
rss-arkea-ja-aurinkoa-podcast-espanjasta
rahapuhetta
rss-uskonto-on-tylsaa
rss-vapaudu-voimaasi
jari-sarasvuo-podcast
ihminen-tavattavissa-tommy-hellsten-instituutti
rss-hereilla
aamukahvilla
dear-ladies