Generative Benchmarking with Kelly Hong - #728

In this episode, Kelly Hong, a researcher at Chroma, joins us to discuss "Generative Benchmarking," a novel approach to evaluating retrieval systems, like RAG applications, using synthetic data. Kelly explains how traditional benchmarks like MTEB fail to represent real-world query patterns and how embedding models that perform well on public benchmarks often underperform in production. The conversation explores the two-step process of Generative Benchmarking: filtering documents to focus on relevant content and generating queries that mimic actual user behavior. Kelly shares insights from applying this approach to Weights & Biases' technical support bot, revealing how domain-specific evaluation provides more accurate assessments of embedding model performance. We also discuss the importance of aligning LLM judges with human preferences, the impact of chunking strategies on retrieval effectiveness, and how production queries differ from benchmark queries in ambiguity and style. Throughout the episode, Kelly emphasizes the need for systematic evaluation approaches that go beyond "vibe checks" to help developers build more effective RAG applications. The complete show notes for this episode can be found at https://twimlai.com/go/728.

Episodes (779)

Systems and Software for Machine Learning at Scale with Jeff Dean - TWiML Talk #124

In this episode I’m joined by Jeff Dean, Google Senior Fellow and head of the company’s deep learning research team Google Brain, who I had a chance to sit down with last week at the Googleplex in Mou...

2 Apr 2018, 54 min

Semantic Segmentation of 3D Point Clouds with Lyne Tchapmi - TWiML Talk #123

In this episode I’m joined by Lyne Tchapmi, PhD student in the Stanford Computational Vision and Geometry Lab, to discuss her paper, “SEGCloud: Semantic Segmentation of 3D Point Clouds.” SEGCloud is a...

29 Mar 2018, 36 min

Predicting Cardiovascular Risk Factors from Eye Images with Ryan Poplin - TWiML Talk #122

In this episode, I'm joined by Google Research Scientist Ryan Poplin, who recently co-authored the paper “Prediction of cardiovascular risk factors from retinal fundus photographs via deep learning.” ...

26 Mar 2018, 42 min

Reproducibility and the Philosophy of Data with Clare Gollnick - TWiML Talk #121

In this episode, I'm joined by Clare Gollnick, CTO of Terbium Labs, to discuss her thoughts on the “reproducibility crisis” currently haunting the scientific landscape. For a little background, a “Nat...

22 Mar 2018, 38 min

Surveying the Connected Car Landscape with GK Senthil - TWiML Talk #120

In this episode, I’m joined by GK Senthil, director & chief product owner for innovation at Toyota Connected. GK and I spoke about some of the potential opportunities and challenges for smart cars. We...

19 Mar 2018, 30 min

Adversarial Attacks Against Reinforcement Learning Agents with Ian Goodfellow & Sandy Huang

In this episode, I’m joined by Ian Goodfellow, Staff Research Scientist at Google Brain, and Sandy Huang, PhD student in the EECS department at UC Berkeley, to discuss their work on the paper Adversari...

15 Mar 2018, 47 min

Towards Abstract Robotic Understanding with Raja Chatila - TWiML Talk #118

In this episode, we're joined by Raja Chatila, director of Intelligent Systems and Robotics at Pierre and Marie Curie University in Paris, and executive committee chair of the IEEE global initiative o...

12 Mar 2018, 47 min

Discovering Exoplanets w/ Deep Learning with Chris Shallue - TWiML Talk #117

Earlier this week, I had a chance to speak with Chris Shallue, Senior Software Engineer on the Google Brain Team, about his project and paper on “Exploring Exoplanets with Deep Learning.” This is a gr...

8 Mar 2018, 45 min
