Grokking, Generalization Collapse, and the Dynamics of Training Deep Neural Networks with Charles Martin - #734

Grokking, Generalization Collapse, and the Dynamics of Training Deep Neural Networks with Charles Martin - #734

Today, we're joined by Charles Martin, founder of Calculation Consulting, to discuss Weight Watcher, an open-source tool for analyzing and improving Deep Neural Networks (DNNs) based on principles from theoretical physics. We explore the foundations of the Heavy-Tailed Self-Regularization (HTSR) theory that underpins it, which combines random matrix theory and renormalization group ideas to uncover deep insights about model training dynamics. Charles walks us through WeightWatcher’s ability to detect three distinct learning phases—underfitting, grokking, and generalization collapse—and how its signature “layer quality” metric reveals whether individual layers are underfit, overfit, or optimally tuned. Additionally, we dig into the complexities involved in fine-tuning models, the surprising correlation between model optimality and hallucination, the often-underestimated challenges of search relevance, and their implications for RAG. Finally, Charles shares his insights into real-world applications of generative AI and his lessons learned from working in the field. The complete show notes for this episode can be found at https://twimlai.com/go/734.

Tämä jakso on lisätty Podme-palveluun avoimen RSS-syötteen kautta eikä se ole Podmen omaa tuotantoa. Siksi jakso saattaa sisältää mainontaa.

Jaksot(786)

Relational Foundation Models for Enterprise Data with Jure Leskovec - #768

Relational Foundation Models for Enterprise Data with Jure Leskovec - #768

In this episode, Jure Leskovec, co-founder and chief scientist at Kumo and professor of computer science at Stanford, joins us to explore two fronts of his work: AI for science and relational deep lea...

21 Touko 1h 6min

How to Find the Agent Failures Your Evals Miss with Scott Clark - #767

How to Find the Agent Failures Your Evals Miss with Scott Clark - #767

In this episode, Scott Clark, co-founder and CEO of Distributional, joins us to explore how teams can reliably operate and improve complex LLM systems and agents in production. Scott introduces a Masl...

7 Touko 53min

How to Engineer AI Inference Systems with Philip Kiely - #766

How to Engineer AI Inference Systems with Philip Kiely - #766

In this episode, Philip Kiely, head of AI education at Baseten, joins us to unpack the fast-evolving discipline of inference engineering. We explore why inference has become the stickiest and most cri...

30 Huhti 54min

How Capital One Delivers Multi-Agent Systems with Rashmi Shetty - #765

How Capital One Delivers Multi-Agent Systems with Rashmi Shetty - #765

In this episode, Rashmi Shetty, senior director of enterprise generative AI platform at Capital One, joins us to explore how the company is designing, deploying, and scaling multi-agent systems in a h...

16 Huhti 54min

The Race to Production-Grade Diffusion LLMs with Stefano Ermon - #764

The Race to Production-Grade Diffusion LLMs with Stefano Ermon - #764

Today, we're joined by Stefano Ermon, associate professor at Stanford University and CEO of Inception Labs to discuss diffusion language models. We dig into how diffusion approaches—traditionally used...

26 Maalis 1h 3min

Agent Swarms and Knowledge Graphs for Autonomous Software Development with Siddhant Pardeshi - #763

Agent Swarms and Knowledge Graphs for Autonomous Software Development with Siddhant Pardeshi - #763

In this episode, Sid Pardeshi, co-founder and CTO of Blitzy, joins us to discuss building autonomous development systems able to deliver production-ready software at enterprise scale. Sid contrasts AI...

10 Maalis 1h 16min

AI Trends 2026: OpenClaw Agents, Reasoning LLMs, and More with Sebastian Raschka - #762

AI Trends 2026: OpenClaw Agents, Reasoning LLMs, and More with Sebastian Raschka - #762

In this episode, Sebastian Raschka, independent LLM researcher and author, joins us to break down how the LLM landscape has changed over the past year and what is likely to matter most in 2026. We dis...

26 Helmi 1h 18min

The Evolution of Reasoning in Small Language Models with Yejin Choi - #761

The Evolution of Reasoning in Small Language Models with Yejin Choi - #761

Today, we're joined by Yejin Choi, professor and senior fellow at Stanford University in the Computer Science Department and the Institute for Human-Centered AI (HAI). In this conversation, we explore...

29 Tammi 1h 6min

Suosittua kategoriassa Politiikka ja uutiset

uutiscast
aikalisa
politiikan-puskaradio
rss-ootsa-kuullut-tasta
ootsa-kuullut-tasta-2
rss-vaalirankkurit-podcast
otetaan-yhdet
rss-podme-livebox
rss-asiastudio
the-ulkopolitist
tervo-halme
et-sa-noin-voi-sanoo-esittaa
rss-polikulaari-pitka-kiekko-ja-muut-ts-podcastit
rss-girls-finish-f1rst
rss-ulkopoditiikkaa
rss-pinnalla
linda-maria
viisupodi
rss-kaikki-uusiksi
rss-vain-talouselamaa