80,000 Hours Podcast · 31 January 2025

#132 Classic episode – Nova DasSarma on why information security may be critical to the safe development of AI systems

If a business has spent $100 million developing a product, it’s a fair bet that they don’t want it stolen in two seconds and uploaded to the web where anyone can use it for free.

This problem exists in extreme form for AI companies. These days, the electricity and equipment required to train cutting-edge machine learning models that generate uncanny human text and images can cost tens or hundreds of millions of dollars. But once trained, such models may be only a few gigabytes in size and run just fine on ordinary laptops.

Today’s guest, the computer scientist and polymath Nova DasSarma, works on computer and information security as part of the security team at the AI company Anthropic. One of her jobs is to stop hackers from exfiltrating Anthropic’s incredibly expensive intellectual property, as recently happened to Nvidia.

Rebroadcast: this episode was originally released in June 2022.

Links to learn more, highlights, and full transcript.

As she explains, given models’ small size, the need to store such models on internet-connected servers, and the poor state of computer security in general, this is a serious challenge.

The worries aren’t purely commercial though. This problem looms especially large for the growing number of people who expect that in coming decades we’ll develop so-called artificial ‘general’ intelligence systems that can learn and apply a wide range of skills all at once, and thereby have a transformative effect on society.

If aligned with the goals of their owners, such general AI models could operate like a team of super-skilled assistants, going out and doing whatever wonderful (or malicious) things are asked of them. This might represent a huge leap forward for humanity, though the transition to a very different new economy and power structure would have to be handled delicately.

If unaligned with the goals of their owners or humanity as a whole, such broadly capable models would naturally ‘go rogue,’ breaking their way into additional computer systems to grab more computing power — all the better to pursue their goals and make sure they can’t be shut off.

As Nova explains, in either case, we don’t want such models disseminated all over the world before we’ve confirmed they are deeply safe and law-abiding, and have figured out how to integrate them peacefully into society. In the first scenario, premature mass deployment would be risky and destabilising. In the second scenario, it could be catastrophic — perhaps even leading to human extinction if such general AI systems turn out to be able to self-improve rapidly rather than slowly, something we can only speculate on at this point.

If highly capable general AI systems are coming in the next 10 or 20 years, Nova may be flying below the radar with one of the most important jobs in the world.

We’ll soon need the ability to ‘sandbox’ (i.e. contain) models with a wide range of superhuman capabilities, including the ability to learn new skills, for a period of careful testing and limited deployment — preventing the model from breaking out, and criminals from breaking in. Nova and her colleagues are trying to figure out how to do this, but as this episode reveals, even the state of the art is nowhere near good enough.

Chapters:

Cold open (00:00:00)
Rob's intro (00:00:52)
The interview begins (00:02:44)
Why computer security matters for AI safety (00:07:39)
State of the art in information security (00:17:21)
The hack of Nvidia (00:26:50)
The most secure systems that exist (00:36:27)
Formal verification (00:48:03)
How organisations can protect against hacks (00:54:18)
Is ML making security better or worse? (00:58:11)
Motivated 14-year-old hackers (01:01:08)
Disincentivising actors from attacking in the first place (01:05:48)
Hofvarpnir Studios (01:12:40)
Capabilities vs safety (01:19:47)
Interesting design choices with big ML models (01:28:44)
Nova’s work and how she got into it (01:45:21)
Anthropic and career advice (02:05:52)
$600M Ethereum hack (02:18:37)
Personal computer security advice (02:23:06)
LastPass (02:31:04)
Stuxnet (02:38:07)
Rob's outro (02:40:18)

Producer: Keiran Harris
Audio mastering: Ben Cordell and Beppe Rådvik
Transcriptions: Katy Moore

Episodes (334)

#170 – Santosh Harish on how air pollution is responsible for ~12% of global deaths — and how to get that number down

"One [outrageous example of air pollution] is municipal waste burning that happens in many cities in the Global South. Basically, this is waste that gets collected from people's homes, and instead of ...

1 November 2023 · 2h 57min

#169 – Paul Niehaus on whether cash transfers cause economic growth, and keeping theft to acceptable levels

"One of our earliest supporters and a dear friend of mine, Mark Lampert, once said to me, “The way I think about it is, imagine that this money were already in the hands of people living in poverty. I...

26 October 2023 · 1h 47min

#168 – Ian Morris on whether deep history says we're heading for an intelligence explosion

"If we carry on looking at these industrialised economies, not thinking about what it is they're actually doing and what the potential of this is, you can make an argument that, yes, rates of growth a...

23 October 2023 · 2h 43min

#167 – Seren Kell on the research gaps holding back alternative proteins from mass adoption

"There have been literally thousands of years of breeding and living with animals to optimise these kinds of problems. But because we're just so early on with alternative proteins and there's so much ...

18 October 2023 · 1h 54min

#166 – Tantum Collins on what he’s learned as an AI policy insider at the White House, DeepMind and elsewhere

"If you and I and 100 other people were on the first ship that was going to go settle Mars, and were going to build a human civilisation, and we have to decide what that government looks like, and we ...

12 October 2023 · 3h 8min

#165 – Anders Sandberg on war in space, whether civilisations age, and the best things possible in our universe

"Now, the really interesting question is: How much is there an attacker-versus-defender advantage in this kind of advanced future? Right now, if somebody's sitting on Mars and you're going to war agai...

6 October 2023 · 2h 48min

#164 – Kevin Esvelt on cults that want to kill everyone, stealth vs wildfire pandemics, and how he felt inventing gene drives

"Imagine a fast-spreading respiratory HIV. It sweeps around the world. Almost nobody has symptoms. Nobody notices until years later, when the first people who are infected begin to succumb. They might...

2 October 2023 · 3h 3min

Great power conflict (Article)

Today’s release is a reading of our Great power conflict problem profile, written and narrated by Stephen Clare. If you want to check out the links, footnotes and figures in today’s article, you can fi...

22 September 2023 · 1h 19min