Generative Benchmarking with Kelly Hong - #728
In this episode, Kelly Hong, a researcher at Chroma, joins us to discuss "Generative Benchmarking," a novel approach to evaluating retrieval systems, like RAG applications, using synthetic data. Kelly explains how traditional benchmarks like MTEB fail to represent real-world query patterns and how embedding models that perform well on public benchmarks often underperform in production. The conversation explores the two-step process of Generative Benchmarking: filtering documents to focus on relevant content and generating queries that mimic actual user behavior. Kelly shares insights from applying this approach to Weights & Biases' technical support bot, revealing how domain-specific evaluation provides more accurate assessments of embedding model performance. We also discuss the importance of aligning LLM judges with human preferences, the impact of chunking strategies on retrieval effectiveness, and how production queries differ from benchmark queries in ambiguity and style. Throughout the episode, Kelly emphasizes the need for systematic evaluation approaches that go beyond "vibe checks" to help developers build more effective RAG applications. The complete show notes for this episode can be found at https://twimlai.com/go/728.

Episodes (783)

Experimental Creative Writing with the Vectorized Word - Allison Parrish - TWiML Talk #72

This week, we’ll be featuring a series of shows recorded from Strange Loop, a great developer-focused conference that takes place every year right in my backyard! The conference is a multi-disciplinar...

24 Nov 2017, 28 min

The Biological Path Towards Strong AI - Matthew Taylor - TWiML Talk #71

This week, we’ll be featuring a series of shows recorded from Strange Loop, a great developer-focused conference that takes place every year right in my backyard! The conference is a multi-disciplinar...

22 Nov 2017, 37 min

PyTorch: Fast Differentiable Dynamic Graphs in Python with Soumith Chintala - TWiML Talk #70

This week, we’ll be featuring a series of shows recorded from Strange Loop, a great developer-focused conference that takes place every year right in my backyard! The conference is a multi-disciplinar...

21 Nov 2017, 42 min

Accessible Machine Learning for the Enterprise Developer with Ryan Sevey & Jason Montgomery

This week, we’ll be featuring a series of shows recorded from Strange Loop, a great developer-focused conference that takes place every year right in my backyard! The conference is a multi-disciplinar...

20 Nov 2017, 45 min

Bridging the Gap Between Academic and Industry Careers with Ross Fadely - TWiML Talk #68

We close out our NYU Future Labs AI Summit interview series with Ross Fadely, a New York-based AI lead with Insight Data Science. Insight is an interesting company offering a free seven-week post-doct...

16 Nov 2017, 19 min

The Limitations of Human-in-the-Loop AI with Dennis Mortensen - TWiML Talk #67

We continue our NYU Future Labs AI Summit interview series with Dennis Mortensen, founder and CEO of X.ai, a company whose AI-based personal assistant Amy helps users with scheduling meetings. I caugh...

13 Nov 2017, 35 min

Nexus Lab Cohort 2 - Second Mind - TWiML Talk #66

The podcast you’re about to hear is the fourth of a series of shows recorded at the NYU Future Labs AI Summit last week in New York City. In this show, I speak with Kul Singh, CEO and Founder of Secon...

9 Nov 2017, 21 min

Nexus Lab Cohort 2 - Bite.ai - TWiML Talk #65

The podcast you’re about to hear is the second of a series of shows recorded at the NYU Future Labs AI Summit last week in New York City. In this episode, you’ll hear from Bite.ai, a startup founded by...

8 Nov 2017, 26 min
