Generative Benchmarking with Kelly Hong - #728
In this episode, Kelly Hong, a researcher at Chroma, joins us to discuss "Generative Benchmarking," a novel approach to evaluating retrieval systems, like RAG applications, using synthetic data. Kelly explains how traditional benchmarks like MTEB fail to represent real-world query patterns and how embedding models that perform well on public benchmarks often underperform in production. The conversation explores the two-step process of Generative Benchmarking: filtering documents to focus on relevant content and generating queries that mimic actual user behavior. Kelly shares insights from applying this approach to Weights & Biases' technical support bot, revealing how domain-specific evaluation provides more accurate assessments of embedding model performance. We also discuss the importance of aligning LLM judges with human preferences, the impact of chunking strategies on retrieval effectiveness, and how production queries differ from benchmark queries in ambiguity and style. Throughout the episode, Kelly emphasizes the need for systematic evaluation approaches that go beyond "vibe checks" to help developers build more effective RAG applications. The complete show notes for this episode can be found at https://twimlai.com/go/728.

Episodes (779)

Learning State Representations with Yael Niv - TWiML Talk #92

This week on the podcast we’re featuring a series of conversations from the NIPS conference in Long Beach, California. I attended a bunch of talks and learned a ton, organized an impromptu roundtable ...

22 Dec 2017, 47 min

Philosophy of Intelligence with Matthew Crosby - TWiML Talk #91

This week on the podcast we’re featuring a series of conversations from the NIPS conference in Long Beach, California. I attended a bunch of talks and learned a ton, organized an impromptu roundtable ...

21 Dec 2017, 29 min

Geometric Deep Learning with Joan Bruna & Michael Bronstein - TWiML Talk #90

This week on the podcast we’re featuring a series of conversations from the NIPS conference in Long Beach, California. I attended a bunch of talks and learned a ton, organized an impromptu roundtable ...

20 Dec 2017, 40 min

AI at the NASA Frontier Development Lab with Sara Jennings, Timothy Seabrook and Andres Rodriguez

This week on the podcast we’re featuring a series of conversations from the NIPS conference in Long Beach, California. I attended a bunch of talks and learned a ton, organized an impromptu roundtable ...

19 Dec 2017, 36 min

Using Deep Learning and Google Street View to Estimate Demographics with Timnit Gebru

This week on the podcast we’re featuring a series of conversations from the NIPS conference in Long Beach, California. I attended a bunch of talks and learned a ton, organized an impromptu roundtable ...

19 Dec 2017, 32 min

Integrative Learning for Robotic Systems with Aaron Ames - TWiML Talk #87

This week on the podcast we’re featuring a series of conversations from the AWS re:Invent conference in Las Vegas. I had a great time at this event getting caught up on the latest and greatest machine...

15 Dec 2017, 47 min

Visual Recognition in the Cloud for Law Enforcement with Chris Adzima - TWiML Talk #86

This week on the podcast we’re featuring a series of conversations from the AWS re:Invent conference in Las Vegas. I had a great time at this event getting caught up on the latest and greatest machine...

14 Dec 2017, 35 min

Embodied Visual Learning with Kristen Grauman - TWiML Talk #85

This week on the podcast we’re featuring a series of conversations from the AWS re:Invent conference in Las Vegas. I had a great time at this event getting caught up on the latest and greatest machine...

13 Dec 2017, 39 min
