Generative Benchmarking with Kelly Hong - #728
In this episode, Kelly Hong, a researcher at Chroma, joins us to discuss "Generative Benchmarking," a novel approach to evaluating retrieval systems, like RAG applications, using synthetic data. Kelly explains how traditional benchmarks like MTEB fail to represent real-world query patterns and how embedding models that perform well on public benchmarks often underperform in production. The conversation explores the two-step process of Generative Benchmarking: filtering documents to focus on relevant content and generating queries that mimic actual user behavior. Kelly shares insights from applying this approach to Weights & Biases' technical support bot, revealing how domain-specific evaluation provides more accurate assessments of embedding model performance. We also discuss the importance of aligning LLM judges with human preferences, the impact of chunking strategies on retrieval effectiveness, and how production queries differ from benchmark queries in ambiguity and style. Throughout the episode, Kelly emphasizes the need for systematic evaluation approaches that go beyond "vibe checks" to help developers build more effective RAG applications. The complete show notes for this episode can be found at https://twimlai.com/go/728.

Episodes (781)

Deep Robotic Learning with Sergey Levine - TWiML Talk #37

This week we continue our Industrial AI series with Sergey Levine, an Assistant Professor at UC Berkeley whose research focus is Deep Robotic Learning. Sergey is part of the same research team as a co...

24 Jul 2017, 46 min

Smart Buildings & IoT with Yodit Stanton - TWiML Talk #36

After a brief hiatus, the Industrial AI Series is making its triumphant return! Our guest this week is Yodit Stanton, a self-described Data Nerd, and the Founder & CEO of Opensensors.io. OpenSensors.i...

17 Jul 2017, 53 min

Intel Nervana Update + Productizing AI Research with Naveen Rao And Hanlin Tang - TWiML Talk #31

I talked about Intel’s acquisition of Nervana Systems on the podcast when it happened almost a year ago, so I was super excited to have an opportunity to sit down with Nervana co-founder Naveen Rao, w...

5 Jul 2017, 38 min

Expressive AI - Generated Music With Google's Performance RNN - Doug Eck - TWiML Talk #32

My guest for this second show in our O’Reilly AI series is Doug Eck of Google Brain. Doug did a keynote at the O’Reilly conference on Magenta, Google’s project for melding machine learning and the art...

5 Jul 2017, 46 min

The Power Of Probabilistic Programming with Ben Vigoda - TWiML Talk #33

My guest for this third episode in the O'Reilly AI series is Ben Vigoda. Ben is the founder and CEO of Gamalon, a DARPA-funded startup working on Bayesian Program Synthesis. We dive into what exactly ...

5 Jul 2017, 42 min

Video Object Detection At Scale with Reza Zadeh - TWiML Talk #34

My guest for the fourth show in the O'Reilly AI Series is Reza Zadeh. Reza is an adjunct professor of computational mathematics at Stanford University and founder and CEO of the startup Matroid. Reza ...

5 Jul 2017, 52 min

Enhancing Customer Experiences With Emotional AI with Rana El Kaliouby - TWiML Talk #35

My guest for this show is Rana el Kaliouby. Rana is co-founder and CEO of Affectiva. Affectiva, as Rana puts it, "is on a mission to humanize technology by bringing in artificial emotional intelligenc...

5 Jul 2017, 33 min

Natural Language Understanding for Amazon Alexa with Zornitsa Kozareva - TWiML Talk #30

Our guest this week is Zornitsa Kozareva, Manager of Machine Learning with Amazon Web Services Deep Learning, where she leads a group focused on natural language processing and dialogue systems for pr...

29 Jun 2017, 55 min
