Cole Wyeth, PhD Student at the University of Waterloo, on Why We Should Wait to Build Superintelligent AI

In the AI Risk Reward podcast, our host, Alec Crawford (@alec06830), Founder and CEO of Artificial Intelligence Risk, Inc. (aicrisk.com), interviews guests about balancing the risk and reward of Artificial Intelligence for you, your business, and society as a whole. Podcast production and sound engineering by Troutman Street Audio. You can find them on LinkedIn.

In this deep-dive episode, Alec speaks with Cole Wyeth, a PhD student at the University of Waterloo focused on AI safety and agent foundations, about why the long-term risk of superintelligent AI deserves far more attention today. Cole explains that aligning advanced systems with human values is extraordinarily difficult because ethics and preferences are hard to specify, and he argues that corrigibility, ambiguity awareness, and deference to humans are essential design goals. He also discusses how ideas like imprecise probability, embedded agency, and multi-agent dynamics can help researchers think more clearly about failure modes, reward hacking, and unexpected cooperation between AI systems. Throughout the conversation, Cole compares controlling superintelligence to cybersecurity, warning that a system smarter than its designers may find weaknesses in any safety scheme that looks secure on paper. The episode closes on a cautious note: until we understand how to reliably control self-improving AI, Cole believes society should slow down and wait years, or even decades, before creating superintelligent systems.

Summary:

Long-Term AI Risk: Cole Wyeth argues that superintelligent AI could become uncontrollable if developed before robust safety methods are in place.
Alignment Challenges: He explains that human ethics and values are too complex to formalize cleanly, making alignment an unusually hard technical problem.
Ambiguity and Deference: The discussion highlights the importance of building systems that recognize uncertainty and defer to humans in high-stakes situations.
Multi-Agent Failure Modes: Cole explores how AI systems may cooperate or behave strategically in unexpected ways, creating new safety and governance concerns.
Pause for Caution: His central takeaway is that society should delay building superintelligence until researchers better understand how to control it safely.

Referenced in this episode:

Companies/Organizations:

University of Waterloo
Verapath
Anthropic
OpenAI
DeepMind
Google
ARC
METR
Troutman Street Audio
Waters Technology
