#107 – Chris Olah on what the hell is going on inside neural networks

Big machine learning models can identify plant species better than any human, write passable essays, beat you at a game of Starcraft 2, figure out how a photo of Tobey Maguire and the word 'spider' are related, solve the 60-year-old 'protein folding problem', diagnose some diseases, play romantic matchmaker, write solid computer code, and offer questionable legal advice.

Humanity made these amazing and ever-improving tools. So how do our creations work? In short: we don't know.

Today's guest, Chris Olah, finds this both absurd and unacceptable. Over the last ten years he has been a leader in the effort to unravel what's really going on inside these black boxes. As part of that effort he helped create the famous DeepDream visualisations at Google Brain, reverse engineered the CLIP image classifier at OpenAI, and is now continuing his work at Anthropic, a new $100 million research company that tries to "co-develop the latest safety techniques alongside scaling of large ML models".

Links to learn more, summary and full transcript.

Despite Chris having a huge fan base thanks to his ML explainers and tweets, today's episode is the first long interview he has ever given. It features his personal take on what we've learned so far about what ML algorithms are doing, and what's next for this research agenda at Anthropic.

His decade of work has borne substantial fruit: an approach for looking inside the mess of connections in a neural network and reading back out what functional role each piece is serving. Among other things, Chris and his team found that every visual classifier seems to converge on a number of simple common elements in its early layers, elements so fundamental that they may exist in our own visual cortex in some form.
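To give a concrete flavour of this style of work, here is a minimal sketch of 'feature visualisation', one of the basic techniques behind it: start from random noise and optimise the input image so that a single chosen unit inside a pretrained vision model fires as strongly as possible. The model, layer, and channel below are illustrative assumptions for the sketch, not details taken from the episode.

```python
# Minimal feature-visualisation sketch (activation maximisation).
# Assumes PyTorch and torchvision; the layer and channel are arbitrary examples.
import torch
import torchvision.models as models

model = models.inception_v3(weights="DEFAULT").eval()
for p in model.parameters():
    p.requires_grad_(False)  # we only optimise the input image

captured = {}
def save_activation(module, inputs, output):
    captured["act"] = output

# Hook an early mixed block and pick a channel to visualise.
dict(model.named_modules())["Mixed_5b"].register_forward_hook(save_activation)
channel = 42

image = torch.rand(1, 3, 299, 299, requires_grad=True)
optimiser = torch.optim.Adam([image], lr=0.05)

for _ in range(256):
    optimiser.zero_grad()
    model(image)                                # populates captured["act"]
    loss = -captured["act"][0, channel].mean()  # maximise the channel's mean activation
    loss.backward()
    optimiser.step()
    with torch.no_grad():
        image.clamp_(0, 1)                      # keep pixels in a valid range
```

In practice, published feature-visualisation work adds regularisation and image transformations so the optimised images look natural rather than adversarial, but the core idea, asking "what input makes this piece of the network respond?", is the starting point for mapping individual pieces to functional roles.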

They also found networks developing 'multimodal neurons' that would trigger in response to the presence of high-level concepts like 'romance', across both images and text, mimicking the famous 'Halle Berry neuron' from human neuroscience.

While reverse engineering how a mind works would make any top-ten list of the most valuable knowledge to pursue for its own sake, Chris's work is also of urgent practical importance. Machine learning models are already being deployed in medicine, business, the military, and the justice system, in ever more powerful roles. The competitive pressure to put them into action as soon as they can turn a profit is great, and only getting greater.

But if we don't know what these machines are doing, we can't be confident they'll continue to work the way we want as circumstances change. Before we hand an algorithm the proverbial nuclear codes, we should demand more assurance than "well, it's always worked fine so far".

By peering inside neural networks and figuring out how to 'read their minds', we can potentially foresee future failures and prevent them before they happen. Artificial neural networks may even be a better way to study how our own minds work: unlike with a human brain, we can see everything that's happening inside them, and since evolution and 'gradient descent' have been posed similar challenges, there's every reason to think they often converge on similar solutions.

Among other things, Rob and Chris cover:

• Why Chris thinks it's necessary to work with the largest models
• What fundamental lessons we've learned about how neural networks (and perhaps humans) think
• How interpretability research might help make AI safer to deploy, and Chris's response to sceptics
• Why there's such a fuss about 'scaling laws' and what they say about future AI progress (a brief note on their basic form follows this list)
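For readers new to the term: 'scaling laws' refers to the empirical observation that a model's test loss falls off as a remarkably smooth power law as parameters, data, or compute are scaled up. As a rough illustration (the exact constants vary by setting and paper), the relationship typically looks like:

```latex
L(x) \approx \left(\frac{x_c}{x}\right)^{\alpha_x},
\qquad \alpha_x \approx 0.05\text{--}0.1,
```

where x is whichever resource is being scaled and x_c is a fitted constant. The fact that this trend holds across many orders of magnitude is what makes it tempting to extrapolate to future AI progress.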

Get this episode by subscribing to our podcast on the world’s most pressing problems and how to solve them: type 80,000 Hours into your podcasting app.

Producer: Keiran Harris
Audio mastering: Ben Cordell
Transcriptions: Sofia Davis-Fogel

Episodes (320)

Every AI Company's Safety Plan is 'Use AI to Make AI Safe'. Is That Crazy? | Ajeya Cotra

Every major AI company has the same safety plan: when AI gets crazy powerful and really dangerous, they’ll use the AI itself to figure out how to make AI safe and beneficial. It sounds circular, almos...

17 Feb 2h 54min

What the hell happened with AGI timelines in 2025?

In early 2025, after OpenAI put out the first-ever reasoning models — o1 and o3 — short timelines to transformative artificial general intelligence swept the AI world. But then, in the second half of ...

10 Feb 25min

#179 Classic episode – Randy Nesse on why evolution left us so vulnerable to depression and anxiety

Mental health problems like depression and anxiety affect enormous numbers of people and severely interfere with their lives. By contrast, we don’t see similar levels of physical ill health in young p...

3 Feb 2h 51min

#234 – David Duvenaud on why 'aligned AI' would still kill democracy

Democracy might be a brief historical blip. That’s the unsettling thesis of a recent paper, which argues AI that can do all the work a human can do inevitably leads to the “gradual disempowerment” of ...

27 Jan 2h 31min

#145 Classic episode – Christopher Brown on why slavery abolition wasn't inevitable

In many ways, humanity seems to have become more humane and inclusive over time. While there’s still a lot of progress to be made, campaigns to give people of different genders, races, sexualities, et...

20 Jan 2h 56min

#233 – James Smith on how to prevent a mirror life catastrophe

When James Smith first heard about mirror bacteria, he was sceptical. But within two weeks, he’d dropped everything to work on it full time, considering it the worst biothreat that he’d seen described...

13 Jan 2h 9min

#144 Classic episode – Athena Aktipis on why cancer is a fundamental universal phenomena

What’s the opposite of cancer? If you answered “cure,” “antidote,” or “antivenom” — you’ve obviously been reading the antonym section at www.merriam-webster.com/thesaurus/cancer. But today’s guest Athe...

9 Jan 3h 30min

#142 Classic episode – John McWhorter on why the optimal number of languages might be one, and other provocative claims about language

John McWhorter is a linguistics professor at Columbia University specialising in research on creole languages. He's also a content-producing machine, never afraid to give his frank opinion on anything...

6 Jan 1h 35min
