Generative Benchmarking with Kelly Hong - #728
In this episode, Kelly Hong, a researcher at Chroma, joins us to discuss "Generative Benchmarking," a novel approach to evaluating retrieval systems, like RAG applications, using synthetic data. Kelly explains how traditional benchmarks like MTEB fail to represent real-world query patterns and how embedding models that perform well on public benchmarks often underperform in production. The conversation explores the two-step process of Generative Benchmarking: filtering documents to focus on relevant content and generating queries that mimic actual user behavior. Kelly shares insights from applying this approach to Weights & Biases' technical support bot, revealing how domain-specific evaluation provides more accurate assessments of embedding model performance. We also discuss the importance of aligning LLM judges with human preferences, the impact of chunking strategies on retrieval effectiveness, and how production queries differ from benchmark queries in ambiguity and style. Throughout the episode, Kelly emphasizes the need for systematic evaluation approaches that go beyond "vibe checks" to help developers build more effective RAG applications. The complete show notes for this episode can be found at https://twimlai.com/go/728.

Avsnitt(779)

Knowledge Graphs and Expert Augmentation with Marisa Boston - TWiML Talk #204

Knowledge Graphs and Expert Augmentation with Marisa Boston - TWiML Talk #204

Today we’re joined by Marisa Boston, Director of Cognitive Technology in KPMG’s Cognitive Automation Lab. We caught up to discuss some of the ways that KPMG is using AI to build tools that help augmen...

29 Nov 201846min

ML/DL for Non-Stationary Time Series Analysis in Financial Markets and Beyond with Stuart Reid - TWiML Talk #203

ML/DL for Non-Stationary Time Series Analysis in Financial Markets and Beyond with Stuart Reid - TWiML Talk #203

Today, we’re joined by Stuart Reid, Chief Scientist at NMRQL Research. NMRQL is an investment management firm that uses ML algorithms to make adaptive, unbiased, scalable, and testable trading decisi...

26 Nov 201858min

Industrializing Machine Learning at Shell with Daniel Jeavons - TWiML Talk #202

Industrializing Machine Learning at Shell with Daniel Jeavons - TWiML Talk #202

In this episode of our AI Platforms series, we’re joined by Daniel Jeavons, General Manager of Data Science at Shell. In our conversation, we explore the evolution of analytics and data science at Sh...

21 Nov 201845min

Resurrecting a Recommendations Platform at Comcast with Leemay Nassery - TWiML Talk #201

Resurrecting a Recommendations Platform at Comcast with Leemay Nassery - TWiML Talk #201

In this episode of our AI Platforms series, we’re joined by Leemay Nassery, Senior Engineering Manager and head of the recommendations team at Comcast. In our conversation, Leemay and I discuss just h...

19 Nov 201847min

Productive Machine Learning at LinkedIn with Bee-Chung Chen - TWiML Talk #200

Productive Machine Learning at LinkedIn with Bee-Chung Chen - TWiML Talk #200

In this episode of our AI Platforms series, we’re joined by Bee-Chung Chen, Principal Staff Engineer and Applied Researcher at LinkedIn. Bee-Chung and I caught up to discuss LinkedIn’s internal AI aut...

15 Nov 201847min

Scaling Deep Learning on Kubernetes at OpenAI with Christopher Berner - TWiML Talk #199

Scaling Deep Learning on Kubernetes at OpenAI with Christopher Berner - TWiML Talk #199

In this episode of our AI Platforms series we’re joined by OpenAI’s Head of Infrastructure, Christopher Berner. In our conversation, we discuss the evolution of OpenAI’s deep learning platform, the co...

12 Nov 201849min

Bighead: Airbnb's Machine Learning Platform with Atul Kale - TWiML Talk #198

Bighead: Airbnb's Machine Learning Platform with Atul Kale - TWiML Talk #198

In this episode of our AI Platforms series, we’re joined by Atul Kale, Engineering Manager on the machine learning infrastructure team at Airbnb. In our conversation, we discuss Airbnb’s internal mac...

8 Nov 201849min

Facebook's FBLearner Platform with Aditya Kalro - TWiML Talk #197

Facebook's FBLearner Platform with Aditya Kalro - TWiML Talk #197

In the kickoff episode of our AI Platforms series, we’re joined by Aditya Kalro, Engineering Manager at Facebook, to discuss their internal machine learning platform FBLearner Flow. FBLearner Flow is ...

6 Nov 201838min

Populärt inom Politik & nyheter

aftonbladet-krim
motiv
rss-krimstad
p3-krim
fordomspodden
flashback-forever
rss-viva-fotboll
spar
svenska-fall
aftonbladet-daily
rss-sanning-konsekvens
rss-vad-fan-hande
svd-dokumentara-berattelser-2
olyckan-inifran
grans
rss-aftonbladet-krim
rss-frandfors-horna
dagens-eko
rss-krimreportrarna
svd-ledarredaktionen