80,000 Hours Podcast
1 Aug 2024

#195 – Sella Nevo on who's trying to steal frontier AI models, and what they could do with them

"Computational systems have literally millions of physical and conceptual components, and around 98% of them are embedded into your infrastructure without you ever having heard of them. And an inordinate amount of them can lead to a catastrophic failure of your security assumptions. And because of this, the Iranian secret nuclear programme failed to prevent a breach, most US agencies failed to prevent multiple breaches, most US national security agencies failed to prevent breaches. So ensuring your system is truly secure against highly resourced and dedicated attackers is really, really hard." —Sella Nevo

In today’s episode, host Luisa Rodriguez speaks to Sella Nevo — director of the Meselson Center at RAND — about his team’s latest report on how to protect the model weights of frontier AI models from actors who might want to steal them.

Links to learn more, highlights, and full transcript.

They cover:

Real-world examples of sophisticated security breaches, and what we can learn from them.
Why AI model weights might be such a high-value target for adversaries like hackers, rogue states, and other bad actors.
The many ways that model weights could be stolen, from using human insiders to sophisticated supply chain hacks.
The current best practices in cybersecurity, and why they may not be enough to keep bad actors away.
New security measures that Sella hopes can mitigate the growing risks.
Sella’s work using machine learning for flood forecasting, which has significantly reduced injuries and costs from floods across Africa and Asia.
And plenty more.

Also, RAND is currently hiring for roles in technical and policy information security — check them out if you're interested in this field!

Chapters:

Cold open (00:00:00)
Luisa’s intro (00:00:56)
The interview begins (00:02:30)
The importance of securing the model weights of frontier AI models (00:03:01)
The most sophisticated and surprising security breaches (00:10:22)
AI models being leaked (00:25:52)
Researching for the RAND report (00:30:11)
Who tries to steal model weights? (00:32:21)
Malicious code and exploiting zero-days (00:42:06)
Human insiders (00:53:20)
Side-channel attacks (01:04:11)
Getting access to air-gapped networks (01:10:52)
Model extraction (01:19:47)
Reducing and hardening authorised access (01:38:52)
Confidential computing (01:48:05)
Red-teaming and security testing (01:53:42)
Careers in information security (01:59:54)
Sella’s work on flood forecasting systems (02:01:57)
Luisa’s outro (02:04:51)

Producer and editor: Keiran Harris
Audio engineering team: Ben Cordell, Simon Monsour, Milo McGuire, and Dominic Armstrong
Additional content editing: Katy Moore and Luisa Rodriguez
Transcriptions: Katy Moore

This episode is sourced from an open RSS feed and is not published by Podme. It may therefore contain advertisements.

Episodes (340)

We can guess what intergalactic war would look like. And strangely, it matters.

Intergalactic war is probably billions of years away, yet physics can already tell us how it ends. And strangely that conclusion is relevant to decisions people have to make today. In this video, Rob ...

18 Jun 15min

How AI could create the world’s biggest problems (article by Zershaaneh Qureshi)

Imagine you’re living 15,000 years ago. Your people are hunter-gatherers and you sleep under the stars. If someone told you humans would one day build cities with millions of people, fly through the a...

11 Jun 1h 29min

What it's really like to run AGI safety at Google DeepMind (and where I disagree with 'doomers') | Rohin Shah

Most people working on AI safety think without a massive effort AI systems will probably end up with goals catastrophically different from humanity’s. Today’s guest, Rohin Shah — head of AGI Safety an...

2 Jun 2h 48min

What makes for a dream job? | Benjamin Todd

What actually makes a job fulfilling? It's not what most career advice tells you. "Follow your passion" sounds inspiring, but it's misleading, and the research backs that up. Drawing on hundreds of st...

28 May 28min

We’re updating our career advice for the strangest time in history | Benjamin Todd, author of 80,000 Hours

The average career is 80,000 hours long. With AI advancing so rapidly, the hours you have left in your career matter more than ever. Some leading AI researchers think there’s a 10% chance that AI syste...

26 May 1h 6min

Can AIs already start 'rogue deployments' inside AI companies? (Landmark new METR report)

A red-teamer was embedded inside Anthropic for three weeks, told to imagine he was an evil Claude, and asked to figure out how to launch a ‘rogue AI deployment’ without getting caught. It’s one part o...

20 May 20min

#243 – 'Godfather of AI' Yoshua Bengio: "I now see a path" to safe superintelligent AI

The co-inventor of modern AI and the most cited living scientist believes he's figured out how to ensure AI is honest, incapable of deception, and never goes rogue. Yoshua Bengio – Turing Award Winner...

7 May 2h 35min

'95% of AI Pilots Fail': The hidden agenda behind the viral stat that misled millions

You might have heard that '95% of corporate AI pilots' are failing. It was one of the most widely cited AI statistics of 2025, parroted by media outlets everywhere. It helped trigger a Nasdaq selloff ...

28 Apr 10min