
Generative Benchmarking with Kelly Hong - #728
In this episode, Kelly Hong, a researcher at Chroma, joins us to discuss "Generative Benchmarking," a novel approach to evaluating retrieval systems, like RAG applications, using synthetic data. Kelly...
23 Huhti 202554min

Exploring the Biology of LLMs with Circuit Tracing with Emmanuel Ameisen - #727
In this episode, Emmanuel Ameisen, a research engineer at Anthropic, returns to discuss two recent papers: "Circuit Tracing: Revealing Language Model Computational Graphs" and "On the Biology of a Lar...
14 Huhti 20251h 34min

Teaching LLMs to Self-Reflect with Reinforcement Learning with Maohao Shen - #726
Today, we're joined by Maohao Shen, PhD student at MIT to discuss his paper, “Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search.” We dig into...
8 Huhti 202551min

Waymo's Foundation Model for Autonomous Driving with Drago Anguelov - #725
Today, we're joined by Drago Anguelov, head of AI foundations at Waymo, for a deep dive into the role of foundation models in autonomous driving. Drago shares how Waymo is leveraging large-scale machi...
31 Maalis 20251h 9min

Dynamic Token Merging for Efficient Byte-level Language Models with Julie Kallini - #724
Today, we're joined by Julie Kallini, PhD student at Stanford University to discuss her recent papers, “MrT5: Dynamic Token Merging for Efficient Byte-level Language Models” and “Mission: Impossible L...
24 Maalis 202550min

Scaling Up Test-Time Compute with Latent Reasoning with Jonas Geiping - #723
Today, we're joined by Jonas Geiping, research group leader at Ellis Institute and the Max Planck Institute for Intelligent Systems to discuss his recent paper, “Scaling up Test-Time Compute with Late...
17 Maalis 202558min

Imagine while Reasoning in Space: Multimodal Visualization-of-Thought with Chengzu Li - #722
Today, we're joined by Chengzu Li, PhD student at the University of Cambridge to discuss his recent paper, “Imagine while Reasoning in Space: Multimodal Visualization-of-Thought.” We explore the motiv...
10 Maalis 202542min

Inside s1: An o1-Style Reasoning Model That Cost Under $50 to Train with Niklas Muennighoff - #721
Today, we're joined by Niklas Muennighoff, a PhD student at Stanford University, to discuss his paper, “S1: Simple Test-Time Scaling.” We explore the motivations behind S1, as well as how it compares ...
3 Maalis 202549min






















