Generative Benchmarking with Kelly Hong - #728
In this episode, Kelly Hong, a researcher at Chroma, joins us to discuss "Generative Benchmarking," a novel approach to evaluating retrieval systems, like RAG applications, using synthetic data. Kelly explains how traditional benchmarks like MTEB fail to represent real-world query patterns and how embedding models that perform well on public benchmarks often underperform in production. The conversation explores the two-step process of Generative Benchmarking: filtering documents to focus on relevant content and generating queries that mimic actual user behavior. Kelly shares insights from applying this approach to Weights & Biases' technical support bot, revealing how domain-specific evaluation provides more accurate assessments of embedding model performance. We also discuss the importance of aligning LLM judges with human preferences, the impact of chunking strategies on retrieval effectiveness, and how production queries differ from benchmark queries in ambiguity and style. Throughout the episode, Kelly emphasizes the need for systematic evaluation approaches that go beyond "vibe checks" to help developers build more effective RAG applications. The complete show notes for this episode can be found at https://twimlai.com/go/728.

Episodes (779)

Generating Ground-Level Images From Overhead Imagery Using GANs with Yi Zhu - TWiML Talk #172

Today we’re joined by Yi Zhu, a PhD candidate at UC Merced focused on geospatial image analysis. In our conversation, Yi and I take a look at his recent paper “What Is It Like Down There? Generating D...

13 Aug 2018, 38 min

Vision Systems for Planetary Landers and Drones with Larry Matthies - TWiML Talk #171

Today we’re joined by Larry Matthies, Sr. Research Scientist and head of computer vision in the mobility and robotics division at JPL. In our conversation, we discuss two talks he gave at CVPR a few w...

9 Aug 2018, 43 min

Learning Semantically Meaningful and Actionable Representations with Ashutosh Saxena - TWiML Talk #170

In this episode I'm joined by Ashutosh Saxena, a veteran of Andrew Ng’s Stanford Machine Learning Group, and co-founder and CEO of Caspar.ai. Ashutosh and I discuss his RoboBrain project, a computatio...

6 Aug 2018, 45 min

AI Innovation for Clinical Decision Support with Joe Connor - TWiML Talk #169

In this episode I speak with Joe Connor, Founder of Experto Crede. In our conversation, we explore his experiences bringing AI-powered healthcare projects to market in collaboration with the UK Natio...

2 Aug 2018, 42 min

Dynamic Visual Localization and Segmentation with Laura Leal-Taixé - TWiML Talk #168

In this episode I'm joined by Laura Leal-Taixé, Professor at the Technical University of Munich where she leads the Dynamic Vision and Learning Group. In our conversation, we discuss several of her r...

30 Jul 2018, 44 min

Conversational AI for the Intelligent Workplace with Gillian McCann - TWiML Talk #167

In this episode I'm joined by Gillian McCann, Head of Cloud Engineering and AI at Workgrid Software. In our conversation, which focuses on Workgrid’s use of cloud-based AI services, Gillian details so...

26 Jul 2018, 36 min

Computer Vision and Intelligent Agents for Wildlife Conservation with Jason Holmberg - TWiML Talk #166

In this episode, I'm joined by Jason Holmberg, Executive Director and Director of Engineering at WildMe. Jason and I discuss WildMe's pair of open source, computer-vision-based conservation projects, W...

22 Jul 2018, 48 min

Pragmatic Deep Learning for Medical Imagery with Prashant Warier - TWiML Talk #165

In this episode I'm joined by Prashant Warier, CEO and Co-Founder of Qure.ai. We discuss the company’s work building products for interpreting head CT scans and chest x-rays. We look at knowledge gain...

19 Jul 2018, 36 min
