Grokking, Generalization Collapse, and the Dynamics of Training Deep Neural Networks with Charles Martin - #734

Today, we're joined by Charles Martin, founder of Calculation Consulting, to discuss WeightWatcher, an open-source tool for analyzing and improving Deep Neural Networks (DNNs) based on principles from theoretical physics. We explore the foundations of the Heavy-Tailed Self-Regularization (HTSR) theory that underpins it, which combines random matrix theory and renormalization group ideas to uncover deep insights about model training dynamics. Charles walks us through WeightWatcher’s ability to detect three distinct learning phases (underfitting, grokking, and generalization collapse) and how its signature “layer quality” metric reveals whether individual layers are underfit, overfit, or optimally tuned. Additionally, we dig into the complexities involved in fine-tuning models, the surprising correlation between model optimality and hallucination, the often-underestimated challenges of search relevance, and their implications for RAG. Finally, Charles shares his insights into real-world applications of generative AI and his lessons learned from working in the field. The complete show notes for this episode can be found at https://twimlai.com/go/734.

Episodes (783)

Why Your RAG System Is Broken, and How to Fix It with Jason Liu - #709

Today, we're joined by Jason Liu, freelance AI consultant, advisor, and creator of the Instructor library to discuss all things retrieval-augmented generation (RAG). We dig into the tactical and strat...

11 Nov 2024, 58min

An Agentic Mixture of Experts for DevOps with Sunil Mallya - #708

Today, we're joined by Sunil Mallya, CTO and co-founder of Flip AI. We discuss Flip’s incident debugging system for DevOps, which was built using a custom mixture of experts (MoE) large language model ...

4 Nov 2024, 1h 15min

Building AI Voice Agents with Scott Stephenson - #707

Today, we're joined by Scott Stephenson, co-founder and CEO of Deepgram to discuss voice AI agents. We explore the importance of perception, understanding, and interaction and how these key components...

28 Oct 2024, 1h 1min

Is Artificial Superintelligence Imminent? with Tim Rocktäschel - #706

Today, we're joined by Tim Rocktäschel, senior staff research scientist at Google DeepMind, professor of Artificial Intelligence at University College London, and author of the recently published popu...

21 Oct 2024, 55min

ML Models for Safety-Critical Systems with Lucas García - #705

Today, we're joined by Lucas García, principal product manager for deep learning at MathWorks to discuss incorporating ML models into safety-critical systems. We begin by exploring the critical role o...

14 Oct 2024, 1h 16min

AI Agents: Substance or Snake Oil with Arvind Narayanan - #704

Today, we're joined by Arvind Narayanan, professor of Computer Science at Princeton University to discuss his recent works, AI Agents That Matter and AI Snake Oil. In “AI Agents That Matter”, we explo...

7 Oct 2024, 54min

AI Agents for Data Analysis with Shreya Shankar - #703

Today, we're joined by Shreya Shankar, a PhD student at UC Berkeley to discuss DocETL, a declarative system for building and optimizing LLM-powered data processing pipelines for large-scale and comple...

30 Sep 2024, 48min

Stealing Part of a Production Language Model with Nicholas Carlini - #702

Today, we're joined by Nicholas Carlini, research scientist at Google DeepMind to discuss adversarial machine learning and model security, focusing on his 2024 ICML best paper winner, “Stealing part o...

23 Sep 2024, 1h 3min
