Generative Benchmarking with Kelly Hong - #728

In this episode, Kelly Hong, a researcher at Chroma, joins us to discuss "Generative Benchmarking," a novel approach to evaluating retrieval systems, like RAG applications, using synthetic data. Kelly explains how traditional benchmarks like MTEB fail to represent real-world query patterns and how embedding models that perform well on public benchmarks often underperform in production. The conversation explores the two-step process of Generative Benchmarking: filtering documents to focus on relevant content and generating queries that mimic actual user behavior. Kelly shares insights from applying this approach to Weights & Biases' technical support bot, revealing how domain-specific evaluation provides more accurate assessments of embedding model performance. We also discuss the importance of aligning LLM judges with human preferences, the impact of chunking strategies on retrieval effectiveness, and how production queries differ from benchmark queries in ambiguity and style. Throughout the episode, Kelly emphasizes the need for systematic evaluation approaches that go beyond "vibe checks" to help developers build more effective RAG applications. The complete show notes for this episode can be found at https://twimlai.com/go/728.

Episodes (779)

Geometric Statistics in Machine Learning w/ geomstats with Nina Miolane - TWiML Talk #196

In this episode we’re joined by Nina Miolane, researcher and lecturer at Stanford University. Nina and I spoke about her work in the field of geometric statistics in ML, specifically the application o...

1 November 2018, 43 min

Milestones in Neural Natural Language Processing with Sebastian Ruder - TWiML Talk #195

In this episode, we’re joined by Sebastian Ruder, PhD student studying NLP at National University of Ireland and Research Scientist at text analysis startup Aylien. We discuss recent milestones in neu...

29 October 2018, 1 h 1 min

Natural Language Processing at StockTwits with Garrett Hoffman - TWiML Talk #194

In this episode, we’re joined by Garrett Hoffman, Director of Data Science at StockTwits. StockTwits is a social network for the investing community which has its roots in the use of the $cashtag on T...

25 October 2018, 50 min

Advanced Reinforcement Learning & Data Science for Social Impact with Vukosi Marivate - TWiML Talk #193

In the final episode of our Deep Learning Indaba series, we speak with Vukosi Marivate, Chair of Data Science at the University of Pretoria and a co-organizer of the Indaba. My conversation with Vuko...

23 October 2018, 46 min

AI Ethics, Strategic Decisioning and Game Theory with Osonde Osoba - TWiML Talk #192

In this episode of our Deep Learning Indaba Series, we’re joined by Osonde Osoba, Engineer at RAND Corporation. Osonde and I spoke on the heels of the Indaba, where he presented on AI Ethics and Poli...

18 October 2018, 47 min

Acoustic Word Embeddings for Low Resource Speech Processing with Herman Kamper - TWiML Talk #191

In this episode of our Deep Learning Indaba Series, we’re joined by Herman Kamper, lecturer at Stellenbosch University in South Africa and a co-organizer of the Indaba. We discuss his work on limited- and zero...

16 October 2018, 1 h 1 min

Learning Representations for Visual Search with Naila Murray - TWiML Talk #190

In this episode of our Deep Learning Indaba series, we’re joined by Naila Murray, Senior Research Scientist and Group Lead in the computer vision group at Naver Labs Europe. Naila presented at the In...

12 October 2018, 41 min

Evaluating Model Explainability Methods with Sara Hooker - TWiML Talk #189

In this, the first episode of the Deep Learning Indaba series, we’re joined by Sara Hooker, AI Resident at Google Brain. I spoke with Sara in the run-up to the Indaba about her work on interpretabilit...

10 October 2018, 1 h 3 min
