Yoshua Bengio - Designing out Agency for Safe AI

Yoshua Bengio - Designing out Agency for Safe AI

Professor Yoshua Bengio is a pioneer in deep learning and Turing Award winner. Bengio talks about AI safety, why goal-seeking “agentic” AIs might be dangerous, and his vision for building powerful AI tools without giving them agency. Topics include reward tampering risks, instrumental convergence, global AI governance, and how non-agent AIs could revolutionize science and medicine while reducing existential threats. Perfect for anyone curious about advanced AI risks and how to manage them responsibly.


SPONSOR MESSAGES:

***

CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments.

https://centml.ai/pricing/


Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. Are you interested in working on reasoning, or getting involved in their events?


They are hosting an event in Zurich on January 9th with the ARChitects, join if you can.


Goto https://tufalabs.ai/

***


Interviewer: Tim Scarfe


Yoshua Bengio:

https://x.com/Yoshua_Bengio

https://scholar.google.com/citations?user=kukA0LcAAAAJ&hl=en

https://yoshuabengio.org/

https://en.wikipedia.org/wiki/Yoshua_Bengio


TOC:

1. AI Safety Fundamentals

[00:00:00] 1.1 AI Safety Risks and International Cooperation

[00:03:20] 1.2 Fundamental Principles vs Scaling in AI Development

[00:11:25] 1.3 System 1/2 Thinking and AI Reasoning Capabilities

[00:15:15] 1.4 Reward Tampering and AI Agency Risks

[00:25:17] 1.5 Alignment Challenges and Instrumental Convergence


2. AI Architecture and Safety Design

[00:33:10] 2.1 Instrumental Goals and AI Safety Fundamentals

[00:35:02] 2.2 Separating Intelligence from Goals in AI Systems

[00:40:40] 2.3 Non-Agent AI as Scientific Tools

[00:44:25] 2.4 Oracle AI Systems and Mathematical Safety Frameworks


3. Global Governance and Security

[00:49:50] 3.1 International AI Competition and Hardware Governance

[00:51:58] 3.2 Military and Security Implications of AI Development

[00:56:07] 3.3 Personal Evolution of AI Safety Perspectives

[01:00:25] 3.4 AI Development Scaling and Global Governance Challenges

[01:12:10] 3.5 AI Regulation and Corporate Oversight


4. Technical Innovations

[01:23:00] 4.1 Evolution of Neural Architectures: From RNNs to Transformers

[01:26:02] 4.2 GFlowNets and Symbolic Computation

[01:30:47] 4.3 Neural Dynamics and Consciousness

[01:34:38] 4.4 AI Creativity and Scientific Discovery


SHOWNOTES (Transcript, references, best clips etc):

https://www.dropbox.com/scl/fi/ajucigli8n90fbxv9h94x/BENGIO_SHOW.pdf?rlkey=38hi2m19sylnr8orb76b85wkw&dl=0


CORE REFS (full list in shownotes and pinned comment):

[00:00:15] Bengio et al.: "AI Risk" Statement

https://www.safe.ai/work/statement-on-ai-risk


[00:23:10] Bengio on reward tampering & AI safety (Harvard Data Science Review)

https://hdsr.mitpress.mit.edu/pub/w974bwb0


[00:40:45] Munk Debate on AI existential risk, featuring Bengio

https://munkdebates.com/debates/artificial-intelligence


[00:44:30] "Can a Bayesian Oracle Prevent Harm from an Agent?" (Bengio et al.) on oracle-to-agent safety

https://arxiv.org/abs/2408.05284


[00:51:20] Bengio (2024) memo on hardware-based AI governance verification

https://yoshuabengio.org/wp-content/uploads/2024/08/FlexHEG-Memo_August-2024.pdf


[01:12:55] Bengio’s involvement in EU AI Act code of practice

https://digital-strategy.ec.europa.eu/en/news/meet-chairs-leading-development-first-general-purpose-ai-code-practice


[01:27:05] Complexity-based compositionality theory (Elmoznino, Jiralerspong, Bengio, Lajoie)

https://arxiv.org/abs/2410.14817


[01:29:00] GFlowNet Foundations (Bengio et al.) for probabilistic inference

https://arxiv.org/pdf/2111.09266


[01:32:10] Discrete attractor states in neural systems (Nam, Elmoznino, Bengio, Lajoie)

https://arxiv.org/pdf/2302.06403

Tämä jakso on lisätty Podme-palveluun avoimen RSS-syötteen kautta eikä se ole Podmen omaa tuotantoa. Siksi jakso saattaa sisältää mainontaa.

Jaksot(252)

When AI Decides You're a Threat — Brad Carson

When AI Decides You're a Threat — Brad Carson

Brad Carson was the Army's General Counsel, served two terms in Congress and was Acting Under Secretary of Defense for Personnel and Readiness. He now heads Americans for Responsible Innovation, the A...

31 Touko 1h 20min

Intelligence is collective, not artificial — Prof. Michael I. Jordan (UC Berkeley / Inria)

Intelligence is collective, not artificial — Prof. Michael I. Jordan (UC Berkeley / Inria)

Michael I. Jordan, described by Science magazine as the most influential computer scientist alive, has never thought of himself as an AI researcher. In this conversation he explains why that distincti...

21 Touko 1h 17min

 The AI Models Smart Enough to Know They're Cheating — Beth Barnes & David Rein [METR]

The AI Models Smart Enough to Know They're Cheating — Beth Barnes & David Rein [METR]

Beth Barnes and David Rein on the one graph that ate the AI timelines discourse, and why the two people who built it are the most careful about how you read it.**SPONSOR**Prolific - Quality data. From...

4 Touko 1h 53min

When AI Discovers The Next Transformer - Robert Lange (Sakana)

When AI Discovers The Next Transformer - Robert Lange (Sakana)

Robert Lange, founding researcher at Sakana AI, joins Tim to discuss *Shinka Evolve* — a framework that combines LLMs with evolutionary algorithms to do open-ended program search. The core claim: syst...

13 Maalis 1h 18min

"Vibe Coding is a Slot Machine" - Jeremy Howard

"Vibe Coding is a Slot Machine" - Jeremy Howard

Dive into the realities of AI-assisted coding, the origins of modern fine-tuning, and the cognitive science behind machine learning with fast.ai founder Jeremy Howard. In this episode, we unpack why A...

3 Maalis 1h 26min

 Evolution "Doesn't Need" Mutation - Blaise Agüera y Arcas

Evolution "Doesn't Need" Mutation - Blaise Agüera y Arcas

What if life itself is just a really sophisticated computer program that wrote itself into existence?Blaise Agüera y Arcas presenting at ALife 2025 — the most technically detailed public walkthrough o...

16 Helmi 55min

VAEs Are Energy-Based Models? [Dr. Jeff Beck]

VAEs Are Energy-Based Models? [Dr. Jeff Beck]

What makes something truly *intelligent?* Is a rock an agent? Could a perfect simulation of your brain actually *be* you? In this fascinating conversation, Dr. Jeff Beck takes us on a journey through ...

25 Tammi 46min

Abstraction & Idealization: AI's Plato Problem [Mazviita Chirimuuta]

Abstraction & Idealization: AI's Plato Problem [Mazviita Chirimuuta]

Professor Mazviita Chirimuuta joins us for a fascinating deep dive into the philosophy of neuroscience and what it really means to understand the mind.*What can neuroscience actually tell us about how...

23 Tammi 53min