Generative Benchmarking with Kelly Hong - #728

Generative Benchmarking with Kelly Hong - #728

In this episode, Kelly Hong, a researcher at Chroma, joins us to discuss "Generative Benchmarking," a novel approach to evaluating retrieval systems, like RAG applications, using synthetic data. Kelly explains how traditional benchmarks like MTEB fail to represent real-world query patterns and how embedding models that perform well on public benchmarks often underperform in production. The conversation explores the two-step process of Generative Benchmarking: filtering documents to focus on relevant content and generating queries that mimic actual user behavior. Kelly shares insights from applying this approach to Weights & Biases' technical support bot, revealing how domain-specific evaluation provides more accurate assessments of embedding model performance. We also discuss the importance of aligning LLM judges with human preferences, the impact of chunking strategies on retrieval effectiveness, and how production queries differ from benchmark queries in ambiguity and style. Throughout the episode, Kelly emphasizes the need for systematic evaluation approaches that go beyond "vibe checks" to help developers build more effective RAG applications. The complete show notes for this episode can be found at https://twimlai.com/go/728.

Jaksot(779)

From Particle Physics to Audio AI with Scott Stephenson - TWiML Talk #19

From Particle Physics to Audio AI with Scott Stephenson - TWiML Talk #19

This week my guest is Scott Stephenson. Scott is co-Founder & CEO of Deepgram, which has developed an AI-based platform for indexing and searching audio and video. Scott and I cover a ton of interesti...

14 Huhti 201756min

(5/5) AlphaVertex - Creating a Worldwide Financial Knowledge Graph - TWiML Talk #18

(5/5) AlphaVertex - Creating a Worldwide Financial Knowledge Graph - TWiML Talk #18

This week I'm on location at NYU/ffVC AI NexusLab startup accelerator, speaking with founders from the 5 companies in the program's inaugural batch. This interview is with AlphaVertex, a FinTech start...

7 Huhti 201726min

(4/5) Behold.ai - Increasing Efficiency of Healthcare Insurance Billing with NLP - TWiML Talk #18

(4/5) Behold.ai - Increasing Efficiency of Healthcare Insurance Billing with NLP - TWiML Talk #18

This week I'm on location at NYU/ffVC AI NexusLab startup accelerator, speaking with founders from the 5 companies in the program's inaugural batch. This interview is with Behold.ai, which uses comput...

7 Huhti 201716min

(3/5) Cambrian Intelligence - Using AI to Simplify the Programming of Robots - TWiML Talk #18

(3/5) Cambrian Intelligence - Using AI to Simplify the Programming of Robots - TWiML Talk #18

This week I'm on location at NYU/ffVC AI NexusLab startup accelerator, speaking with founders from the 5 companies in the program's inaugural batch. This interview is with Cambrian Intelligence, a com...

7 Huhti 201723min

(2/5) Klustera - Location-Based Intelligence for Smarter Marketing - TWiML Talk #18

(2/5) Klustera - Location-Based Intelligence for Smarter Marketing - TWiML Talk #18

This week I'm on location at NYU/ffVC AI NexusLab startup accelerator, speaking with founders from the 5 companies in the program's inaugural batch. This interview is with Klustera, a company applying...

7 Huhti 201722min

(1/5) HelloVera - AI-Powered Customer Support  - TWiML Talk #18

(1/5) HelloVera - AI-Powered Customer Support - TWiML Talk #18

This week I'm on location at NYU/ffVC AI NexusLab startup accelerator, speaking with founders from the 5 companies in the program's inaugural batch. This interview is with HelloVera, a company applyin...

7 Huhti 201725min

Interactive Machine Learning Systems with Alekh Agarwal - TWiML Talk #17

Interactive Machine Learning Systems with Alekh Agarwal - TWiML Talk #17

This week my guest is Alekh Agarwal. Alekh is a researcher with Microsoft Research whose research is focused on Interactive Machine Learning. In our discussion, Alekh and I discuss various aspects of ...

31 Maalis 201730min

Machine Learning in Cybersecurity with Evan Wright - TWiML Talk #16

Machine Learning in Cybersecurity with Evan Wright - TWiML Talk #16

This week my guest is Evan Wright, principal data scientist at cybersecurity startup Anomali. In my interview with Evan, he and I discussed about a number of topics surrounding the use of machine lear...

24 Maalis 20171h 4min

Suosittua kategoriassa Politiikka ja uutiset

aikalisa
rss-ootsa-kuullut-tasta
tervo-halme
ootsa-kuullut-tasta-2
politiikan-puskaradio
viisupodi
rss-podme-livebox
otetaan-yhdet
et-sa-noin-voi-sanoo-esittaa
rss-vaalirankkurit-podcast
rss-asiastudio
rss-tasta-on-kyse-ivan-puopolo-verkkouutiset
rss-tekkipodi
io-techin-tekniikkapodcast
linda-maria
the-ulkopolitist
rss-polikulaari-humanisti-vastaa-ja-muut-ts-podcastit
rss-kaikki-uusiksi
rss-hyvaa-huomenta-bryssel
rss-merja-mahkan-rahat