Generative Benchmarking with Kelly Hong - #728

The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence), 23 Apr 2025

In this episode, Kelly Hong, a researcher at Chroma, joins us to discuss "Generative Benchmarking," a novel approach to evaluating retrieval systems, like RAG applications, using synthetic data. Kelly explains how traditional benchmarks like MTEB fail to represent real-world query patterns and how embedding models that perform well on public benchmarks often underperform in production. The conversation explores the two-step process of Generative Benchmarking: filtering documents to focus on relevant content and generating queries that mimic actual user behavior. Kelly shares insights from applying this approach to Weights & Biases' technical support bot, revealing how domain-specific evaluation provides more accurate assessments of embedding model performance. We also discuss the importance of aligning LLM judges with human preferences, the impact of chunking strategies on retrieval effectiveness, and how production queries differ from benchmark queries in ambiguity and style. Throughout the episode, Kelly emphasizes the need for systematic evaluation approaches that go beyond "vibe checks" to help developers build more effective RAG applications. The complete show notes for this episode can be found at https://twimlai.com/go/728.

Episodes (779)

Real-Time Machine Learning in the Database with Nikita Shamgunov - TWiML Talk #84

This week on the podcast we’re featuring a series of conversations from the AWS re:Invent conference in Las Vegas. I had a great time at this event getting caught up on the latest and greatest machine...

12 Dec 2017, 39 min

re:Invent Roundup Roundtable - TWiML Talk #83

This week on the podcast we’re featuring a series of conversations from the AWS re:Invent conference in Las Vegas. I had a great time at this event getting caught up on the latest and greatest machine...

11 Dec 2017, 1 h 6 min

Driving Customer Loyalty with Predictive and Conversational AI with Sherif Mityas - TWiML Talk #82

This week on the podcast we’re running a series of shows consisting of conversations with some of the impressive speakers from an event called the AI Summit in New York City. The theme of the conferen...

8 Dec 2017, 36 min

Innovation Factories for AI in Financial Services with Thierry Derungs - TWiML Talk #81

This week on the podcast we’re running a series of shows consisting of conversations with some of the impressive speakers from an event called the AI Summit in New York City. The theme of the conferen...

7 Dec 2017, 40 min

Block-Sparse Kernels for Deep Neural Networks with Durk Kingma - TWiML Talk #80

The show is part of a series that I’m really excited about, in part because I’ve been working to bring them to you for quite a while now. The focus of the series is a sampling of the interesting work ...

7 Dec 2017, 44 min

AI for Customer Service and Marketing at Aeromexico with Brian Gross - TWiML Talk #79

This week on the podcast we’re running a series of shows consisting of conversations with some of the impressive speakers from an event called the AI Summit in New York City. The theme of the conferen...

6 Dec 2017, 29 min

Scaling AI for the Enterprise with Mazin Gilbert - TWiML Talk #78

This week on the podcast we’re running a series of shows consisting of conversations with some of the impressive speakers from an event called the AI Summit in New York City. The theme of the conferen...

5 Dec 2017, 49 min

Scalable Distributed Deep Learning with Hillery Hunter - TWiML Talk #77

This week on the podcast we’re running a series of shows consisting of conversations with some of the impressive speakers from an event called the AI Summit in New York City. The theme of the conferen...

4 Dec 2017, 38 min
