Generative Benchmarking with Kelly Hong - #728

Generative Benchmarking with Kelly Hong - #728

In this episode, Kelly Hong, a researcher at Chroma, joins us to discuss "Generative Benchmarking," a novel approach to evaluating retrieval systems, like RAG applications, using synthetic data. Kelly explains how traditional benchmarks like MTEB fail to represent real-world query patterns and how embedding models that perform well on public benchmarks often underperform in production. The conversation explores the two-step process of Generative Benchmarking: filtering documents to focus on relevant content and generating queries that mimic actual user behavior. Kelly shares insights from applying this approach to Weights & Biases' technical support bot, revealing how domain-specific evaluation provides more accurate assessments of embedding model performance. We also discuss the importance of aligning LLM judges with human preferences, the impact of chunking strategies on retrieval effectiveness, and how production queries differ from benchmark queries in ambiguity and style. Throughout the episode, Kelly emphasizes the need for systematic evaluation approaches that go beyond "vibe checks" to help developers build more effective RAG applications. The complete show notes for this episode can be found at https://twimlai.com/go/728.

Jaksot(779)

Graph Analytic Systems with Zachary Hanif - TWiML Talk #188

Graph Analytic Systems with Zachary Hanif - TWiML Talk #188

In this, the final episode of our Strata Data Conference series, we’re joined by Zachary Hanif, Director of Machine Learning at Capital One’s Center for Machine Learning. We start our discussion wit...

8 Loka 201854min

Diversification in Recommender Systems with Ahsan Ashraf - TWiML Talk #187

Diversification in Recommender Systems with Ahsan Ashraf - TWiML Talk #187

In this episode of our Strata Data conference series, we’re joined by Ahsan Ashraf, data scientist at Pinterest. We discuss his presentation, “Diversification in recommender systems: Using topical var...

4 Loka 201844min

The Fastai v1 Deep Learning Framework with Jeremy Howard - TWiML Talk #186

The Fastai v1 Deep Learning Framework with Jeremy Howard - TWiML Talk #186

In today's episode we're presenting a special conversation with Jeremy Howard, founder and researcher at Fast.ai. This episode is being released today in conjunction with the company’s announcement of...

2 Loka 20181h 11min

Federated ML for Edge Applications with Justin Norman - TWiML Talk #185

Federated ML for Edge Applications with Justin Norman - TWiML Talk #185

In this episode we’re joined by Justin Norman, Director of Research and Data Science Services at Cloudera Fast Forward Labs. In my chat with Justin we start with an update on the company before diving...

27 Syys 201847min

Exploring Dark Energy & Star Formation w/ ML with Viviana Acquaviva - TWiML Talk #184

Exploring Dark Energy & Star Formation w/ ML with Viviana Acquaviva - TWiML Talk #184

In today’s episode of our Strata Data series, we’re joined by Viviana Acquaviva, Associate Professor at City Tech, the New York City College of Technology. In our conversation, we discuss an ongoing p...

26 Syys 201840min

Document Vectors in the Wild with James Dreiss - TWiML Talk #183

Document Vectors in the Wild with James Dreiss - TWiML Talk #183

In this episode of our Strata Data series we’re joined by James Dreiss, Senior Data Scientist at international news syndicate Reuters. James and I sat down to discuss his talk from the conference “Doc...

24 Syys 201840min

Applied Machine Learning for Publishers with Naveed Ahmad - TWiML Talk #182

Applied Machine Learning for Publishers with Naveed Ahmad - TWiML Talk #182

In today’s episode we’re joined by Naveed Ahmad, Senior Director of data engineering and machine learning at Hearst Newspapers. In our conversation, we discuss into the role of ML at Hearst, including...

20 Syys 201839min

Anticipating Superintelligence with Nick Bostrom - TWiML Talk #181

Anticipating Superintelligence with Nick Bostrom - TWiML Talk #181

In this episode, we’re joined by Nick Bostrom, professor at the University of Oxford and head of the Future of Humanity Institute, a multidisciplinary institute focused on answering big-picture questi...

17 Syys 201844min

Suosittua kategoriassa Politiikka ja uutiset

aikalisa
rss-ootsa-kuullut-tasta
tervo-halme
ootsa-kuullut-tasta-2
politiikan-puskaradio
viisupodi
otetaan-yhdet
rss-podme-livebox
rss-asiastudio
et-sa-noin-voi-sanoo-esittaa
rss-vaalirankkurit-podcast
the-ulkopolitist
linda-maria
rss-kaikki-uusiksi
rss-mina-ukkola
rss-pykalien-takaa
rss-merja-mahkan-rahat
rss-kuka-mina-olen
rss-raha-talous-ja-politiikka
rss-kyselytunti