Generative Benchmarking with Kelly Hong - #728
In this episode, Kelly Hong, a researcher at Chroma, joins us to discuss "Generative Benchmarking," a novel approach to evaluating retrieval systems, like RAG applications, using synthetic data. Kelly explains how traditional benchmarks like MTEB fail to represent real-world query patterns and how embedding models that perform well on public benchmarks often underperform in production. The conversation explores the two-step process of Generative Benchmarking: filtering documents to focus on relevant content and generating queries that mimic actual user behavior. Kelly shares insights from applying this approach to Weights & Biases' technical support bot, revealing how domain-specific evaluation provides more accurate assessments of embedding model performance. We also discuss the importance of aligning LLM judges with human preferences, the impact of chunking strategies on retrieval effectiveness, and how production queries differ from benchmark queries in ambiguity and style. Throughout the episode, Kelly emphasizes the need for systematic evaluation approaches that go beyond "vibe checks" to help developers build more effective RAG applications. The complete show notes for this episode can be found at https://twimlai.com/go/728.

Episodes (781)

Scalable Differential Privacy for Deep Learning with Nicolas Papernot - TWiML Talk #134

In this episode of our Differential Privacy series, I'm joined by Nicolas Papernot, Google PhD Fellow in Security and graduate student in the department of computer science at Penn State University. N...

3 May 2018, 59 min

Differential Privacy at Bluecore with Zahi Karam - TWiML Talk #133

In this episode of our Differential Privacy series, I'm joined by Zahi Karam, Director of Data Science at Bluecore, whose retail marketing platform specializes in personalized email marketing. I sat d...

1 May 2018, 38 min

Differential Privacy Theory & Practice with Aaron Roth - TWiML Talk #132

In the first episode of our Differential Privacy series, I'm joined by Aaron Roth, associate professor of computer science and information science at the University of Pennsylvania. Aaron is first and...

30 Apr 2018, 42 min

Optimal Transport and Machine Learning with Marco Cuturi - TWiML Talk #131

In this episode, I’m joined by Marco Cuturi, professor of statistics at Université Paris-Saclay. Marco and I spent some time discussing his work on Optimal Transport Theory at NIPS last year. In our d...

26 Apr 2018, 32 min

Collecting and Annotating Data for AI with Kiran Vajapey - TWiML Talk #130

In this episode, I’m joined by Kiran Vajapey, a human-computer interaction developer at Figure Eight. In this interview, Kiran shares some of what he has learned through his work developing applicat...

23 Apr 2018, 40 min

Autonomous Aerial Guidance, Navigation and Control Systems with Christopher Lum - TWiML Talk #129

In this episode, I'm joined by Christopher Lum, Research Assistant Professor in the University of Washington’s Department of Aeronautics and Astronautics. Chris also co-heads the University’s Auto...

19 Apr 2018, 52 min

Infrastructure for Autonomous Vehicles with Missy Cummings - TWiML Talk #128

In this episode, I’m joined by Missy Cummings, head of Duke University’s Humans and Autonomy Lab and professor in the department of mechanical engineering. In addition to being an accomplished researc...

16 Apr 2018, 43 min

Hyper-Personalizing the Customer Experience w/ AI with Rob Walker - TWiML Talk #127

In this episode, we're joined by Rob Walker, Vice President of decision management and analytics at Pegasystems, a leading provider of software for customer engagement and operational excellence. Rob ...

12 Apr 2018, 41 min
