Grokking, Generalization Collapse, and the Dynamics of Training Deep Neural Networks with Charles Martin - #734

Grokking, Generalization Collapse, and the Dynamics of Training Deep Neural Networks with Charles Martin - #734

Today, we're joined by Charles Martin, founder of Calculation Consulting, to discuss Weight Watcher, an open-source tool for analyzing and improving Deep Neural Networks (DNNs) based on principles from theoretical physics. We explore the foundations of the Heavy-Tailed Self-Regularization (HTSR) theory that underpins it, which combines random matrix theory and renormalization group ideas to uncover deep insights about model training dynamics. Charles walks us through WeightWatcher’s ability to detect three distinct learning phases—underfitting, grokking, and generalization collapse—and how its signature “layer quality” metric reveals whether individual layers are underfit, overfit, or optimally tuned. Additionally, we dig into the complexities involved in fine-tuning models, the surprising correlation between model optimality and hallucination, the often-underestimated challenges of search relevance, and their implications for RAG. Finally, Charles shares his insights into real-world applications of generative AI and his lessons learned from working in the field. The complete show notes for this episode can be found at https://twimlai.com/go/734.

Episoder(782)

Automated Design of Agentic Systems with Shengran Hu - #700

Automated Design of Agentic Systems with Shengran Hu - #700

Today, we're joined by Shengran Hu, a PhD student at the University of British Columbia, to discuss Automated Design of Agentic Systems (ADAS), an approach focused on automatically creating agentic sy...

2 Sep 202459min

The EU AI Act and Mitigating Bias in Automated Decisioning with Peter van der Putten - #699

The EU AI Act and Mitigating Bias in Automated Decisioning with Peter van der Putten - #699

Today, we're joined by Peter van der Putten, director of the AI Lab at Pega and assistant professor of AI at Leiden University. We discuss the newly adopted European AI Act and the challenges of apply...

27 Aug 202445min

The Building Blocks of Agentic Systems with Harrison Chase - #698

The Building Blocks of Agentic Systems with Harrison Chase - #698

Today, we're joined by Harrison Chase, co-founder and CEO of LangChain to discuss LLM frameworks, agentic systems, RAG, evaluation, and more. We dig into the elements of a modern LLM framework, includ...

19 Aug 202459min

Simplifying On-Device AI for Developers with Siddhika Nevrekar - #697

Simplifying On-Device AI for Developers with Siddhika Nevrekar - #697

Today, we're joined by Siddhika Nevrekar, AI Hub head at Qualcomm Technologies, to discuss on-device AI and how to make it easier for developers to take advantage of device capabilities. We unpack the...

12 Aug 202446min

Genie: Generative Interactive Environments with Ashley Edwards - #696

Genie: Generative Interactive Environments with Ashley Edwards - #696

Today, we're joined by Ashley Edwards, a member of technical staff at Runway, to discuss Genie: Generative Interactive Environments, a system for creating ‘playable’ video environments for training de...

5 Aug 202446min

Bridging the Sim2real Gap in Robotics with Marius Memmel - #695

Bridging the Sim2real Gap in Robotics with Marius Memmel - #695

Today, we're joined by Marius Memmel, a PhD student at the University of Washington, to discuss his research on sim-to-real transfer approaches for developing autonomous robotic agents in unstructured...

30 Jul 202457min

Building Real-World LLM Products with Fine-Tuning and More with Hamel Husain - #694

Building Real-World LLM Products with Fine-Tuning and More with Hamel Husain - #694

Today, we're joined by Hamel Husain, founder of Parlance Labs, to discuss the ins and outs of building real-world products using large language models (LLMs). We kick things off discussing novel appli...

23 Jul 20241h 20min

Mamba, Mamba-2 and Post-Transformer Architectures for Generative AI with Albert Gu - #693

Mamba, Mamba-2 and Post-Transformer Architectures for Generative AI with Albert Gu - #693

Today, we're joined by Albert Gu, assistant professor at Carnegie Mellon University, to discuss his research on post-transformer architectures for multi-modal foundation models, with a focus on state-...

17 Jul 202457min

Populært innen Politikk og nyheter

giver-og-gjengen-vg
aftenpodden
aftenpodden-usa
forklart
lydartikler-fra-aftenposten
popradet
fotballpodden-2
stopp-verden
dine-penger-pengeradet
det-store-bildet
rss-gukild-johaug
nokon-ma-ga
hanna-de-heldige
rss-ness
e24-podden
aftenbla-bla
i-retten
frokostshowet-pa-p5
rss-dannet-uten-piano
grasoner-den-nye-kalde-krigen