#197 – Nick Joseph on whether Anthropic's AI safety policy is up to the task

The three biggest AI companies — Anthropic, OpenAI, and DeepMind — have now all released policies designed to make their AI models less likely to go rogue or cause catastrophic damage as they approach, and eventually exceed, human capabilities. Are they good enough?

That’s what host Rob Wiblin tries to hash out in this interview (recorded May 30) with Nick Joseph — one of the original cofounders of Anthropic, its current head of training, and a big fan of Anthropic’s “responsible scaling policy” (or “RSP”). Anthropic is the most safety-focused of the AI companies, known for a culture that treats the risks of its work as deadly serious.

Links to learn more, highlights, video, and full transcript.

As Nick explains, these scaling policies commit companies to dig into what new dangerous things a model can do — after it’s trained, but before it’s in wide use. The companies then promise to put in place safeguards they think are sufficient to tackle those capabilities before availability is extended further. For instance, if a model could significantly help design a deadly bioweapon, then its weights need to be properly secured so they can’t be stolen by terrorists interested in using it that way.

As capabilities grow further — for example, if testing shows that a model could exfiltrate itself and spread autonomously in the wild — then new measures would need to be put in place to make that impossible, or demonstrate that such a goal can never arise.

Nick points out what he sees as the biggest virtues of the RSP approach, and then Rob pushes him on some of the best objections he’s found to RSPs being up to the task of keeping AI safe and beneficial. The two also discuss whether it's essential to eventually hand over operation of responsible scaling policies to external auditors or regulatory bodies, if those policies are going to be able to hold up against the intense commercial pressures that might end up arrayed against them.

In addition to all of that, Nick and Rob talk about:

  • What Nick thinks are the current bottlenecks in AI progress: people and time (rather than data or compute).
  • What it’s like working in AI safety research at the leading edge, and whether pushing forward capabilities (even in the name of safety) is a good idea.
  • What it’s like working at Anthropic, and how to get the skills needed to help with the safe development of AI.

And as a reminder, if you want to let us know your reaction to this interview, or send any other feedback, our inbox is always open at podcast@80000hours.org.

Chapters:

  • Cold open (00:00:00)
  • Rob’s intro (00:01:00)
  • The interview begins (00:03:44)
  • Scaling laws (00:04:12)
  • Bottlenecks to further progress in making AIs helpful (00:08:36)
  • Anthropic’s responsible scaling policies (00:14:21)
  • Pros and cons of the RSP approach for AI safety (00:34:09)
  • Alternatives to RSPs (00:46:44)
  • Is an internal audit really the best approach? (00:51:56)
  • Making promises about things that are currently technically impossible (01:07:54)
  • Nick’s biggest reservations about the RSP approach (01:16:05)
  • Communicating “acceptable” risk (01:19:27)
  • Should Anthropic’s RSP have wider safety buffers? (01:26:13)
  • Other impacts on society and future work on RSPs (01:34:01)
  • Working at Anthropic (01:36:28)
  • Engineering vs research (01:41:04)
  • AI safety roles at Anthropic (01:48:31)
  • Should concerned people be willing to take capabilities roles? (01:58:20)
  • Recent safety work at Anthropic (02:10:05)
  • Anthropic culture (02:14:35)
  • Overrated and underrated AI applications (02:22:06)
  • Rob’s outro (02:26:36)

Producer and editor: Keiran Harris
Audio engineering by Ben Cordell, Milo McGuire, Simon Monsour, and Dominic Armstrong
Video engineering: Simon Monsour
Transcriptions: Katy Moore

Episodes (325)

#124 – Karen Levy on fads and misaligned incentives in global development, and scaling deworming to reach hundreds of millions

If someone said a global health and development programme was sustainable, participatory, and holistic, you'd have to guess that they were saying something positive. But according to today's guest Kar...

21 Mar 2022 · 3h 9min

#123 – Samuel Charap on why Putin invaded Ukraine, the risk of escalation, and how to prevent disaster

Russia's invasion of Ukraine is devastating the lives of Ukrainians, and so long as it continues there's a risk that the conflict could escalate to include other countries or the use of nuclear weapon...

14 Mar 2022 · 59min

#122 – Michelle Hutchinson & Habiba Islam on balancing competing priorities and other themes from our 1-on-1 careers advising

One of 80,000 Hours' main services is our free one-on-one careers advising, which we provide to around 1,000 people a year. Today we speak to two of our advisors, who have each spoken to hundreds of p...

9 Mar 2022 · 1h 36min

Introducing 80k After Hours

Today we're launching a new podcast called 80k After Hours. Like this show it’ll mostly still explore the best ways to do good — and some episodes will be even more laser-focused on careers than mos...

1 Mar 2022 · 13min

#121 – Matthew Yglesias on avoiding the pundit's fallacy and how much military intervention can be used for good

If you read polls saying that the public supports a carbon tax, should you believe them? According to today's guest — journalist and blogger Matthew Yglesias — it's complicated, but probably not. Link...

16 Feb 2022 · 3h 4min

#120 – Audrey Tang on what we can learn from Taiwan’s experiments with how to do democracy

In 2014 Taiwan was rocked by mass protests against a proposed trade agreement with China that was about to be agreed without the usual Parliamentary hearings. Students invaded and took over the Parlia...

2 Feb 2022 · 2h 5min

#43 Classic episode - Daniel Ellsberg on the institutional insanity that maintains nuclear doomsday machines

Rebroadcast: this episode was originally released in September 2018. In Stanley Kubrick’s iconic film Dr. Strangelove, the American president is informed that the Soviet Union has created a secret dete...

18 Jan 2022 · 2h 35min

#35 Classic episode - Tara Mac Aulay on the audacity to fix the world without asking permission

Rebroadcast: this episode was originally released in June 2018. How broken is the world? How inefficient is a typical organisation? Looking at Tara Mac Aulay’s life, the answer seems to be ‘very’. A...

10 Jan 2022 · 1h 23min
