Generative Benchmarking with Kelly Hong - #728
In this episode, Kelly Hong, a researcher at Chroma, joins us to discuss "Generative Benchmarking," a novel approach to evaluating retrieval systems, like RAG applications, using synthetic data. Kelly explains how traditional benchmarks like MTEB fail to represent real-world query patterns and how embedding models that perform well on public benchmarks often underperform in production. The conversation explores the two-step process of Generative Benchmarking: filtering documents to focus on relevant content and generating queries that mimic actual user behavior. Kelly shares insights from applying this approach to Weights & Biases' technical support bot, revealing how domain-specific evaluation provides more accurate assessments of embedding model performance. We also discuss the importance of aligning LLM judges with human preferences, the impact of chunking strategies on retrieval effectiveness, and how production queries differ from benchmark queries in ambiguity and style. Throughout the episode, Kelly emphasizes the need for systematic evaluation approaches that go beyond "vibe checks" to help developers build more effective RAG applications. The complete show notes for this episode can be found at https://twimlai.com/go/728.

Episodes (780)

Building an AI Mathematician with Carina Hong - #754

In this episode, Carina Hong, founder and CEO of Axiom, joins us to discuss her work building an "AI Mathematician." Carina explains why this is a pivotal moment for AI in mathematics, citing a conver...

4 Nov 2025, 55min

High-Efficiency Diffusion Models for On-Device Image Generation and Editing with Hung Bui - #753

In this episode, Hung Bui, Technology Vice President at Qualcomm, joins us to explore the latest high-efficiency techniques for running generative AI, particularly diffusion models, on-device. We dive...

28 Oct 2025, 52min

Vibe Coding's Uncanny Valley with Alexandre Pesant - #752

Today, we're joined by Alexandre Pesant, AI lead at Lovable, who joins us to discuss the evolution and practice of vibe coding. Alex shares his take on how AI is enabling a shift in software developme...

22 Oct 2025, 1h 12min

Dataflow Computing for AI Inference with Kunle Olukotun - #751

In this episode, we're joined by Kunle Olukotun, professor of electrical engineering and computer science at Stanford University and co-founder and chief technologist at SambaNova Systems, to discuss ...

14 Oct 2025, 57min

Recurrence and Attention for Long-Context Transformers with Jacob Buckman - #750

Today, we're joined by Jacob Buckman, co-founder and CEO of Manifest AI, to discuss achieving long context in transformers. We discuss the bottlenecks of scaling context length and recent techniques to...

7 Oct 2025, 57min

The Decentralized Future of Private AI with Illia Polosukhin - #749

In this episode, Illia Polosukhin, a co-author of the seminal "Attention Is All You Need" paper and co-founder of Near AI, joins us to discuss his vision for building private, decentralized, and user-...

30 Sep 2025, 1h 5min

Inside Nano Banana 🍌 and the Future of Vision-Language Models with Oliver Wang - #748

Today, we’re joined by Oliver Wang, principal scientist at Google DeepMind and tech lead for Gemini 2.5 Flash Image—better known by its code name, “Nano Banana.” We dive into the development and capab...

23 Sep 2025, 1h 3min

Is It Time to Rethink LLM Pre-Training? with Aditi Raghunathan - #747

Today, we're joined by Aditi Raghunathan, assistant professor at Carnegie Mellon University, to discuss the limitations of LLMs and how we can build more adaptable and creative models. We dig into her...

16 Sep 2025, 58min
