Generative Benchmarking with Kelly Hong - #728

In this episode, Kelly Hong, a researcher at Chroma, joins us to discuss "Generative Benchmarking," a novel approach to evaluating retrieval systems, like RAG applications, using synthetic data. Kelly explains how traditional benchmarks like MTEB fail to represent real-world query patterns and how embedding models that perform well on public benchmarks often underperform in production. The conversation explores the two-step process of Generative Benchmarking: filtering documents to focus on relevant content and generating queries that mimic actual user behavior. Kelly shares insights from applying this approach to Weights & Biases' technical support bot, revealing how domain-specific evaluation provides more accurate assessments of embedding model performance. We also discuss the importance of aligning LLM judges with human preferences, the impact of chunking strategies on retrieval effectiveness, and how production queries differ from benchmark queries in ambiguity and style. Throughout the episode, Kelly emphasizes the need for systematic evaluation approaches that go beyond "vibe checks" to help developers build more effective RAG applications. The complete show notes for this episode can be found at https://twimlai.com/go/728.

Episodes (779)

Learning Active Learning with Ksenia Konyushkova - TWiML Talk #116

In this episode, I speak with Ksenia Konyushkova, Ph.D. student in the CVLab at Ecole Polytechnique Federale de Lausanne in Switzerland. Ksenia and I connected at NIPS in December to discuss her inter...

5 March 2018, 31 min

Machine Learning Platforms at Uber with Mike Del Balso - TWiML Talk #115

In this episode, I speak with Mike Del Balso, Product Manager for Machine Learning Platforms at Uber. Mike and I sat down last fall at the Georgian Partners Portfolio conference to discuss his present...

1 March 2018, 49 min

Inverse Programming for Deeper AI with Zenna Tavares - TWiML Talk #114

For today’s show, the final episode of our Black in AI Series, I’m joined by Zenna Tavares, a PhD student in both the department of Brain and Cognitive Sciences and the Computer Science and Artifi...

26 February 2018, 28 min

Statistical Relational Artificial Intelligence with Sriraam Natarajan - TWiML Talk #113

In this episode, I speak with Sriraam Natarajan, Associate Professor in the Department of Computer Science at UT Dallas. While at NIPS a few months back, Sriraam and I sat down to discuss his work on ...

23 February 2018, 47 min

Classical Machine Learning for Infant Medical Diagnosis with Charles Onu - TWiML Talk #112

In this episode, part 4 in our Black in AI series, I'm joined by Charles Onu, PhD student at McGill University in Montreal & Founder of Ubenwa, a startup tackling the problem of infant mortality due t...

20 February 2018, 48 min

Learning "Common Sense" and Physical Concepts with Roland Memisevic - TWiML Talk #111

In today’s episode, I’m joined by Roland Memisevic, co-founder, CEO, and chief scientist at Twenty Billion Neurons. Roland joined me at the RE•WORK Deep Learning Summit in Montreal to discuss the work...

15 February 2018, 32 min

Trust in Human-Robot/AI Interactions with Ayanna Howard - TWiML Talk #110

In this episode, the third in our Black in AI series, I speak with Ayanna Howard, Chair of the School of Interactive Computing at Georgia Tech. Ayanna joined me for a lively discussion about her work ...

13 February 2018, 46 min

Data Science for Poaching Prevention and Disease Treatment with Nyalleng Moorosi - TWiML Talk #109

For today’s show, I'm joined by Nyalleng Moorosi, Senior Data Science Researcher at The Council for Scientific & Industrial Research or CSIR, in Pretoria, South Africa. In our discussion, we discuss t...

8 February 2018, 52 min
