Generative Benchmarking with Kelly Hong - #728
In this episode, Kelly Hong, a researcher at Chroma, joins us to discuss "Generative Benchmarking," a novel approach to evaluating retrieval systems, like RAG applications, using synthetic data. Kelly explains how traditional benchmarks like MTEB fail to represent real-world query patterns and how embedding models that perform well on public benchmarks often underperform in production. The conversation explores the two-step process of Generative Benchmarking: filtering documents to focus on relevant content and generating queries that mimic actual user behavior. Kelly shares insights from applying this approach to Weights & Biases' technical support bot, revealing how domain-specific evaluation provides more accurate assessments of embedding model performance. We also discuss the importance of aligning LLM judges with human preferences, the impact of chunking strategies on retrieval effectiveness, and how production queries differ from benchmark queries in ambiguity and style. Throughout the episode, Kelly emphasizes the need for systematic evaluation approaches that go beyond "vibe checks" to help developers build more effective RAG applications. The complete show notes for this episode can be found at https://twimlai.com/go/728.

Episodes (781)

Information Extraction from Natural Document Formats with David Rosenberg - TWiML Talk #126

In this episode, I’m joined by David Rosenberg, data scientist in the office of the CTO at financial publisher Bloomberg, to discuss his work on “Extracting Data from Tables and Charts in Natural Docu...

9 Apr 2018, 45 min

Human-in-the-Loop AI for Emergency Response & More w/ Robert Munro - TWiML Talk #125

In this episode, I chat with Rob Munro, CTO of the newly branded Figure Eight, formerly known as CrowdFlower. Figure Eight’s Human-in-the-Loop AI platform supports data science & machine learning team...

5 Apr 2018, 48 min

Systems and Software for Machine Learning at Scale with Jeff Dean - TWiML Talk #124

In this episode I’m joined by Jeff Dean, Google Senior Fellow and head of the company’s deep learning research team Google Brain, who I had a chance to sit down with last week at the Googleplex in Mou...

2 Apr 2018, 54 min

Semantic Segmentation of 3D Point Clouds with Lyne Tchapmi - TWiML Talk #123

In this episode I’m joined by Lyne Tchapmi, PhD student in the Stanford Computational Vision and Geometry Lab, to discuss her paper, “SEGCloud: Semantic Segmentation of 3D Point Clouds.” SEGCloud is a...

29 Mar 2018, 36 min

Predicting Cardiovascular Risk Factors from Eye Images with Ryan Poplin - TWiML Talk #122

In this episode, I'm joined by Google Research Scientist Ryan Poplin, who recently co-authored the paper “Prediction of cardiovascular risk factors from retinal fundus photographs via deep learning.” ...

26 Mar 2018, 42 min

Reproducibility and the Philosophy of Data with Clare Gollnick - TWiML Talk #121

In this episode, I'm joined by Clare Gollnick, CTO of Terbium Labs, to discuss her thoughts on the “reproducibility crisis” currently haunting the scientific landscape. For a little background, a “Nat...

22 Mar 2018, 38 min

Surveying the Connected Car Landscape with GK Senthil - TWiML Talk #120

In this episode, I’m joined by GK Senthil, director & chief product owner for innovation at Toyota Connected. GK and I spoke about some of the potential opportunities and challenges for smart cars. We...

19 Mar 2018, 30 min

Adversarial Attacks Against Reinforcement Learning Agents with Ian Goodfellow & Sandy Huang

In this episode, I’m joined by Ian Goodfellow, Staff Research Scientist at Google Brain, and Sandy Huang, PhD student in the EECS department at UC Berkeley, to discuss their work on the paper Adversari...

15 Mar 2018, 47 min
