Generative Benchmarking with Kelly Hong - #728
In this episode, Kelly Hong, a researcher at Chroma, joins us to discuss "Generative Benchmarking," a novel approach to evaluating retrieval systems, like RAG applications, using synthetic data. Kelly explains how traditional benchmarks like MTEB fail to represent real-world query patterns and how embedding models that perform well on public benchmarks often underperform in production. The conversation explores the two-step process of Generative Benchmarking: filtering documents to focus on relevant content and generating queries that mimic actual user behavior. Kelly shares insights from applying this approach to Weights & Biases' technical support bot, revealing how domain-specific evaluation provides more accurate assessments of embedding model performance. We also discuss the importance of aligning LLM judges with human preferences, the impact of chunking strategies on retrieval effectiveness, and how production queries differ from benchmark queries in ambiguity and style. Throughout the episode, Kelly emphasizes the need for systematic evaluation approaches that go beyond "vibe checks" to help developers build more effective RAG applications. The complete show notes for this episode can be found at https://twimlai.com/go/728.

Episodes (779)

Domain Knowledge in Machine Learning Models for Sustainability with Stefano Ermon - TWiML Talk #15

My guest this week is Stefano Ermon, Assistant Professor of Computer Science at Stanford University, and Fellow at Stanford’s Woods Institute for the Environment. Stefano and I met at the Re-Work Deep...

17 Mar 2017, 54min

Scaling Deep Learning: Systems Challenges & More with Shubho Sengupta — TWiML Talk #14

This week my guest is Shubho Sengupta, Research Scientist at Baidu. I had the pleasure of meeting Shubho at the Rework Deep Learning Summit earlier this year, where he delivered a presentation on Syst...

10 Mar 2017, 1h 12min

Understanding Deep Neural Nets with Dr. James McCaffrey - TWiML Talk #13

My guest this week is Dr. James McCaffrey, research engineer at Microsoft Research. James and I cover a ton of ground in this conversation, including recurrent neural nets (RNNs), convolutional neural...

3 Mar 2017, 1h 16min

Brendan Frey - Reprogramming the Human Genome with AI - TWiML Talk #12

My guest this week is Brendan Frey, Professor of Engineering and Medicine at the University of Toronto and Co-Founder and CEO of the startup Deep Genomics. Brendan and I met at the Re-Work Deep Learni...

24 Feb 2017, 1h

Hilary Mason - Building AI Products - TWiML Talk #11

My guest this time is Hilary Mason. Hilary was one of the first “famous” data scientists. I remember hearing her speak back in 2011 at the Strange Loop conference in St. Louis. At the time she was Chi...

25 Jan 2017, 17min

Francisco Webber - Statistics vs Semantics for Natural Language Processing - TWiML Talk #10

My guest this time is Francisco Webber, founder and General Manager of artificial intelligence startup Cortical.io. Francisco presented at the O’Reilly AI conference on an approach to natural language...

3 Dec 2016, 49min

Pascale Fung - Emotional AI: Teaching Computers Empathy - TWiML Talk #9

My guest this time is Pascale Fung, professor of electrical & computer engineering at Hong Kong University of Science and Technology. Pascale delivered a presentation at the recent O'Reilly AI confere...

8 Nov 2016, 34min

Diogo Almeida - Deep Learning: Modular in Theory, Inflexible in Practice - TWiML Talk #8

My guest this time is Diogo Almeida, senior data scientist at healthcare startup Enlitic. Diogo and I met at the O'Reilly AI conference, where he delivered a great presentation on in-the-trenches deep...

23 Oct 2016, 46min
