Generative Benchmarking with Kelly Hong - #728
In this episode, Kelly Hong, a researcher at Chroma, joins us to discuss "Generative Benchmarking," a novel approach to evaluating retrieval systems, like RAG applications, using synthetic data. Kelly explains how traditional benchmarks like MTEB fail to represent real-world query patterns and how embedding models that perform well on public benchmarks often underperform in production. The conversation explores the two-step process of Generative Benchmarking: filtering documents to focus on relevant content and generating queries that mimic actual user behavior. Kelly shares insights from applying this approach to Weights & Biases' technical support bot, revealing how domain-specific evaluation provides more accurate assessments of embedding model performance. We also discuss the importance of aligning LLM judges with human preferences, the impact of chunking strategies on retrieval effectiveness, and how production queries differ from benchmark queries in ambiguity and style. Throughout the episode, Kelly emphasizes the need for systematic evaluation approaches that go beyond "vibe checks" to help developers build more effective RAG applications. The complete show notes for this episode can be found at https://twimlai.com/go/728.

Episodes (779)

Taskonomy: Disentangling Transfer Learning for Perception (CVPR 2018 Best Paper Winner) with Amir Zamir - TWiML Talk #164

In this episode, I'm joined by Amir Zamir, postdoctoral researcher at both Stanford and UC Berkeley, fresh off winning the 2018 CVPR Best Paper Award for co-authoring "Taskonomy: Disentan...

16 July 2018, 47 min

Predicting Metabolic Pathway Dynamics w/ Machine Learning with Zak Costello - TWiML Talk #163

In today’s episode I’m joined by Zak Costello, post-doctoral fellow at the Joint BioEnergy Institute to discuss his recent paper, “A machine learning approach to predict metabolic pathway dynamics fro...

11 July 2018, 39 min

Machine Learning to Discover Physics and Engineering Principles with Nathan Kutz - TWiML Talk #162

In this episode, I’m joined by Nathan Kutz, Professor of applied mathematics, electrical engineering and physics at the University of Washington to discuss his research into the use of machine learnin...

9 July 2018, 43 min

Automating Complex Internal Processes w/ AI with Alexander Chukovski - TWiML Talk #161

In this episode, I'm joined by Alexander Chukovski, Director of Data Services at Experteer, a career platform based in Munich, Germany. In our conversation, we explore Alex’s journey to implement machine l...

5 July 2018, 39 min

Designing Better Sequence Models with RNNs with Adji Bousso Dieng - TWiML Talk #160

In this episode, I'm joined by Adji Bousso Dieng, PhD Student in the Department of Statistics at Columbia University to discuss two of her recent papers, “Noisin: Unbiased Regularization for Recurrent...

2 July 2018, 38 min

Love Love: AI and ML in Tennis with Stephanie Kovalchik - TWiML Talk #159

In the final show in our AI in Sports series, I’m joined by Stephanie Kovalchik, Research Fellow at Victoria University and Senior Sports Scientist at Tennis Australia. In our conversation we discuss...

29 June 2018, 46 min

Growth Hacking Sports w/ Machine Learning with Noah Gift - TWiML Talk #158

In this episode of our AI in Sports series I'm joined by Noah Gift, Founder and Consulting CTO at Pragmatic Labs and professor at UC Davis. Noah and I discuss some of his recent work in using social m...

28 June 2018, 50 min

Fine-Grained Player Prediction in Sports with Jennifer Hobbs - TWiML Talk #157

In this episode of our AI in Sports series, I'm joined by Jennifer Hobbs, Senior Data Scientist at STATS, a collector and distributor of sports data, to discuss the STATS data pipeline and how they co...

27 June 2018, 42 min
