Generative Benchmarking with Kelly Hong - #728

In this episode, Kelly Hong, a researcher at Chroma, joins us to discuss "Generative Benchmarking," a novel approach to evaluating retrieval systems, like RAG applications, using synthetic data. Kelly explains how traditional benchmarks like MTEB fail to represent real-world query patterns and how embedding models that perform well on public benchmarks often underperform in production. The conversation explores the two-step process of Generative Benchmarking: filtering documents to focus on relevant content and generating queries that mimic actual user behavior. Kelly shares insights from applying this approach to Weights & Biases' technical support bot, revealing how domain-specific evaluation provides more accurate assessments of embedding model performance. We also discuss the importance of aligning LLM judges with human preferences, the impact of chunking strategies on retrieval effectiveness, and how production queries differ from benchmark queries in ambiguity and style. Throughout the episode, Kelly emphasizes the need for systematic evaluation approaches that go beyond "vibe checks" to help developers build more effective RAG applications. The complete show notes for this episode can be found at https://twimlai.com/go/728.
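The evaluation half of this approach can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not Chroma's implementation. It assumes the filtering and query-generation steps have already produced a list of (query, source document ID) pairs. It also uses a toy bag-of-words vector as a hypothetical stand-in for the embedding model under test. Retrieval is scored with recall@k, which is one common metric; the exact metrics used in the episode's experiments are not stated here. The corpus and queries are invented for illustration, and the last query is deliberately vague, like a production query, so that it misses at k=1.

```python
import math
from collections import Counter


# Hypothetical stand-in for an embedding model: a bag-of-words term-count vector.
# A real benchmark would call the embedding model being evaluated instead.
def embed(text: str) -> Counter:
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recall_at_k(corpus: dict[str, str], benchmark: list[tuple[str, str]], k: int) -> float:
    """Fraction of generated queries whose source document ranks in the top k."""
    if not benchmark:
        raise ValueError("benchmark must contain at least one (query, doc_id) pair")
    doc_vecs = {doc_id: embed(text) for doc_id, text in corpus.items()}
    hits = 0
    for query, source_id in benchmark:
        q_vec = embed(query)
        # Rank every document by similarity to the query (ties keep corpus order).
        ranked = sorted(doc_vecs, key=lambda d: cosine(q_vec, doc_vecs[d]), reverse=True)
        if source_id in ranked[:k]:
            hits += 1
    return hits / len(benchmark)


if __name__ == "__main__":
    # Hypothetical filtered corpus and generated (query, source document) pairs.
    corpus = {
        "d1": "log training metrics to a dashboard",
        "d2": "configure a hyperparameter sweep",
        "d3": "update billing settings for a team",
    }
    benchmark = [
        ("how do i log training metrics", "d1"),
        ("set up a hyperparameter sweep", "d2"),
        ("change team billing", "d3"),
        ("pricing questions", "d3"),  # vague query with no word overlap
    ]
    print(recall_at_k(corpus, benchmark, k=1))  # → 0.75
```

Because every query is paired with the document it was generated from, the benchmark needs no human relevance labels. Swapping in different embedding models and comparing their recall on the same pairs is what makes the evaluation domain-specific.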

Episodes (779)

Intelligent Autonomous Robots with Ilia Baranov - TWiML Talk #27

Our first guest in the Industrial AI series is Ilia Baranov, engineering manager at Clearpath Robotics. Ilia is responsible for setting the engineering direction for all of Clearpath’s research platfo...

9 June 2017 · 53 min

Global AI Trends with Ben Lorica - TWiML Talk #26

This week I’ve invited my friend Ben Lorica onto the show. Ben is Chief Data Scientist for O’Reilly Media, and Program Director of Strata Data & the O'Reilly A.I. conference. Ben has worked on analyti...

2 June 2017 · 54 min

Offensive vs Defensive Data Science with Deep Varma - TWiML Talk #25

This week on the show my guest is Deep Varma, Vice President of Data Engineering at real estate startup Trulia. Deep has run data engineering teams in Silicon Valley for well over a decade, and is now...

26 May 2017 · 53 min

Reinforcement Learning: The Next Frontier of Gaming with Danny Lange - TWiML Talk #24

My guest on the show this week is Danny Lange, VP for Machine Learning & AI at video game technology developer Unity Technologies. Danny is well traveled in the world of ML and AI, and has had a hand ...

20 May 2017 · 54 min

Integrating Psycholinguistics into AI with Dominique Simmons - TWiML Talk #23

I think you’re really going to enjoy today’s show. Our guest this week is Dominique Simmons, Applied Research Scientist at AI tools vendor Dimensional Mechanics. Dominique brings an interesting backgr...

12 May 2017 · 1 h

Deep Neural Nets for Visual Recognition with Matt Zeiler - TWiML Talk #22

Today we bring you our final interview from backstage at the NYU FutureLabs AI Summit. Our guest this week is Matt Zeiler. Matt graduated from the University of Toronto where he worked with deep learn...

5 May 2017 · 22 min

Engineering the Future of AI with Ruchir Puri - TWiML Talk #21

Today we bring you the second of three interviews we did backstage from the NYU FutureLabs AI Summit, this time with Ruchir Puri. Ruchir is the Chief Architect at IBM Watson as well as an IBM Fellow. ...

28 April 2017 · 20 min

Selling AI to the Enterprise with Kathryn Hume - TWiML Talk #20

This week's guest is Kathryn Hume. Kathryn is the President of Fast Forward Labs, which is an independent machine intelligence research company that helps organizations accelerate their data science a...

21 April 2017 · 23 min