Generative Benchmarking with Kelly Hong - #728

In this episode, Kelly Hong, a researcher at Chroma, joins us to discuss "Generative Benchmarking," a novel approach to evaluating retrieval systems, like RAG applications, using synthetic data. Kelly explains how traditional benchmarks like MTEB fail to represent real-world query patterns and how embedding models that perform well on public benchmarks often underperform in production. The conversation explores the two-step process of Generative Benchmarking: filtering documents to focus on relevant content and generating queries that mimic actual user behavior. Kelly shares insights from applying this approach to Weights & Biases' technical support bot, revealing how domain-specific evaluation provides more accurate assessments of embedding model performance. We also discuss the importance of aligning LLM judges with human preferences, the impact of chunking strategies on retrieval effectiveness, and how production queries differ from benchmark queries in ambiguity and style. Throughout the episode, Kelly emphasizes the need for systematic evaluation approaches that go beyond "vibe checks" to help developers build more effective RAG applications. The complete show notes for this episode can be found at https://twimlai.com/go/728.
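
The two-step process described above (filter documents, then generate queries from them) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Chroma's implementation. `is_relevant`, `generate_query`, and `embed` are hypothetical callables standing in for an LLM document filter, an LLM query generator, and the embedding model under test. Recall@k is assumed here as the scoring metric: it checks whether the document a query was generated from appears in the top k results when searching the full corpus.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def recall_at_k(query_vecs: List[Vector], doc_vecs: Dict[str, Vector],
                targets: List[str], k: int) -> float:
    """Fraction of queries whose source document ranks in the top k by cosine similarity."""
    if not query_vecs:
        return 0.0
    hits = 0
    for q, target in zip(query_vecs, targets):
        ranked = sorted(doc_vecs, key=lambda d: cosine(q, doc_vecs[d]), reverse=True)
        if target in ranked[:k]:
            hits += 1
    return hits / len(query_vecs)


def generative_benchmark(docs: Dict[str, str],
                         is_relevant: Callable[[str], bool],    # hypothetical LLM filter
                         generate_query: Callable[[str], str],  # hypothetical LLM query generator
                         embed: Callable[[str], Vector],        # embedding model under test
                         k: int = 5) -> float:
    # Step 1: keep only documents a real user might plausibly ask about.
    kept = {doc_id: text for doc_id, text in docs.items() if is_relevant(text)}
    # Step 2: generate one query per kept document, remembering its source document.
    targets = list(kept)
    queries = [generate_query(kept[d]) for d in targets]
    # Score: embed the full corpus and every query, then check whether retrieval
    # ranks each query's source document in the top k.
    doc_vecs = {d: embed(t) for d, t in docs.items()}
    query_vecs = [embed(q) for q in queries]
    return recall_at_k(query_vecs, doc_vecs, targets, k)
```

Because each synthetic query carries the ID of the document it was generated from, this kind of setup needs no human relevance labels to score retrieval, and swapping in a different `embed` function compares embedding models on the same generated query set.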

Episodes (779)

Checking in with the Master w/ Garry Kasparov - TWiML Talk #140

In this episode I’m joined by legendary chess champion, author, and fellow at the Oxford Martin School, Garry Kasparov. Garry and I sat down after his keynote at the Figure Eight Train AI conference i...

21 May 2018 · 32 min

Exploring AI-Generated Music with Taryn Southern - TWiML Talk #139

In this episode I’m joined by Taryn Southern, a singer, digital storyteller and YouTuber whose upcoming album I AM AI will be produced completely with AI-based tools. Taryn and I explore all aspects...

17 May 2018 · 33 min

Practical Deep Learning with Rachel Thomas - TWiML Talk #138

In this episode, I'm joined by Rachel Thomas, founder and researcher at Fast AI. If you’re not familiar with Fast AI, the company offers a series of courses including Practical Deep Learning for Coder...

14 May 2018 · 44 min

Kinds of Intelligence w/ Jose Hernandez-Orallo - TWiML Talk #137

In this episode, I'm joined by Jose Hernandez-Orallo, professor in the department of information systems and computing at Universitat Politècnica de València and fellow at the Leverhulme Centre for th...

10 May 2018 · 44 min

Taming arXiv with Natural Language Processing w/ John Bohannon - TWiML Talk #136

In this episode I'm joined by John Bohannon, Director of Science at AI startup Primer. As you all may know, a few weeks ago we released my interview with Google legend Jeff Dean, which, by the way, yo...

7 May 2018 · 54 min

Epsilon Software for Private Machine Learning with Chang Liu - TWiML Talk #135

In this episode, our final episode in the Differential Privacy series, I speak with Chang Liu, applied research scientist at Georgian Partners, a venture capital firm that invests in growth stage busi...

4 May 2018 · 46 min

Scalable Differential Privacy for Deep Learning with Nicolas Papernot - TWiML Talk #134

In this episode of our Differential Privacy series, I'm joined by Nicolas Papernot, Google PhD Fellow in Security and graduate student in the department of computer science at Penn State University. N...

3 May 2018 · 59 min

Differential Privacy at Bluecore with Zahi Karam - TWiML Talk #133

In this episode of our Differential Privacy series, I'm joined by Zahi Karam, Director of Data Science at Bluecore, whose retail marketing platform specializes in personalized email marketing. I sat d...

1 May 2018 · 38 min