Klaviyo Data Science Podcast, 7 Feb 2025

Klaviyo Data Science Podcast EP 56 | Evaluating AI Models: A Seminar (feat. Evan Miller)

This month, the Klaviyo Data Science Podcast welcomes Evan Miller to deliver a seminar on his recently published paper, Adding Error Bars to Evals: A Statistical Approach to Language Model Evaluations! This episode is a mix of a live seminar Evan gave to the team at Klaviyo and an interview we conducted with him afterward.

Suppose you’re trying to understand the performance of an AI model: maybe one you built or fine-tuned and are comparing to state-of-the-art models, or maybe one you’re considering for a project you’re about to start. If you look at the literature today, you can get a sense of a model’s average performance on an evaluation or set of tasks. Unfortunately, that’s often the extent of what you can learn, because much less emphasis is placed on the variability or uncertainty inherent in those estimates. And as anyone who has worked with a statistical model can affirm, variability is a huge part of why you might choose to use or discard a model.

This seminar explores how to best compute, summarize, and display estimates of variability for AI models. Listen along to hear about topics like:

Why the Central Limit Theorem you learned about in Stats 101 is still relevant to the most advanced AI models developed today
How to think about complications to classic assumptions, such as measurement error or clustering, in the AI landscape
When to do a sample size calculation for your AI model, and how to do it
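To make the Central Limit Theorem, clustering, and sample-size topics above a little more concrete, here is a minimal Python sketch. It is our own illustration under stated assumptions, not code from Evan’s paper: it assumes each eval question yields a numeric score (for example 0/1 for incorrect/correct), computes a CLT-based standard error and a cluster-robust variant for questions that share context (such as several questions about one reading passage), and applies a standard normal-approximation sample-size formula. The helper names `mean_and_se`, `clustered_se`, and `questions_needed`, the example data, and the default z-values (1.96 for a two-sided 5% test, 0.84 for roughly 80% power) are hypothetical choices.

```python
import math
from collections import defaultdict


def mean_and_se(scores):
    """Mean eval score and its CLT-based standard error.

    Assumes per-question scores (e.g. 0/1 correctness) and at least two questions.
    """
    n = len(scores)
    mean = sum(scores) / n
    # Sample variance with Bessel's correction (divide by n - 1).
    var = sum((s - mean) ** 2 for s in scores) / (n - 1)
    return mean, math.sqrt(var / n)


def clustered_se(scores, clusters):
    """Cluster-robust standard error of the mean score (no small-sample correction).

    `clusters` gives a cluster id per question (hypothetical, e.g. a passage id).
    Residuals are summed within each cluster before squaring, so correlated
    questions are not treated as independent evidence.
    """
    n = len(scores)
    mean = sum(scores) / n
    cluster_sums = defaultdict(float)
    for score, cluster in zip(scores, clusters):
        cluster_sums[cluster] += score - mean
    return math.sqrt(sum(v * v for v in cluster_sums.values())) / n


def questions_needed(sd, min_effect, z_alpha=1.96, z_power=0.84):
    """Approximate number of questions needed to detect a mean difference of
    `min_effect`, where `sd` is the (assumed) standard deviation of the
    per-question score differences between two models on the same questions.
    """
    return math.ceil(((z_alpha + z_power) * sd / min_effect) ** 2)


if __name__ == "__main__":
    # Hypothetical data: 8 questions drawn from 4 shared reading passages.
    scores = [1, 1, 0, 0, 1, 1, 1, 0]
    passages = ["p1", "p1", "p2", "p2", "p3", "p3", "p4", "p4"]
    mean, se = mean_and_se(scores)
    print(f"accuracy {mean:.3f}, naive 95% CI +/- {1.96 * se:.3f}")
    print(f"clustered 95% CI +/- {1.96 * clustered_se(scores, passages):.3f}")
    print("questions needed:", questions_needed(sd=0.5, min_effect=0.03))
```

The clustered version reduces to the unclustered formula (without Bessel’s correction) when every question is its own cluster; when questions within a cluster tend to be answered alike, it produces a wider interval, which is the point of accounting for clustering.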

About Evan Miller

You may already know our guest Evan Miller from his fantastic blog, which includes his celebrated A/B testing posts, such as “How not to run an A/B test.” You may also have used his A/B testing tools, such as the sample size calculator. Evan currently works as a research scientist at Anthropic.

About Anthropic

Per Anthropic’s website:

Anthropic is an AI safety and research company based in San Francisco. Our interdisciplinary team has experience across ML, physics, policy, and product. Together, we generate research and create reliable, beneficial AI systems.

You can find more information about Anthropic, including links to their social media accounts, on the company website.

Special thanks to Chris Murphy at Klaviyo for organizing this seminar and making this episode possible!

For the full show notes, including who's who, see the Medium writeup.

This episode is taken from an open RSS feed and is not published by Podme. It may therefore contain advertisements.

Episodes (62)

Klaviyo Data Science Podcast EP 62 | The Math of Games (Part 1)

Games typically involve some combination of math, chance, strategy, and skill. In this two part mini-series, the Klaviyo Data Science Podcast investigates the role of math in games and how our relatio...

4 Aug 2025, 45 min

Klaviyo Data Science Podcast EP 61 | The Tech Startup Bildungsroman

All companies start small. The lucky ones grow, and growth necessarily comes with change. This month on the Klaviyo Data Science Podcast, we look at Klaviyo’s growth through the past 8 years and the p...

9 Jul 2025, 45 min

Klaviyo Data Science Podcast EP 60 | Books Every Data Scientist (and Software Engineer) Should Read (vol. 3)

This month, we return to a classic Klaviyo Data Science Podcast series: books every data scientist (and software engineer) should read. This episode focuses on the Clean * duology by Robert C. Martin,...

11 Jun 2025, 42 min

Klaviyo Data Science Podcast EP 59 | Next Best Action

What should I do next? A common question, one that seems simple on the surface, but the answer, especially a more optimal answer, can be very difficult to uncover. It may involve information that the ...

16 May 2025, 23 min

Klaviyo Data Science Podcast EP 58 | All Aboard the Leadership

All successful teams have at least one leader, and most have at least one manager. This episode, we dive into how leadership works on highly technical teams, how managing a highly technical team works...

14 Apr 2025, 48 min

Klaviyo Data Science Podcast EP 57 | Agile, or, Don't Go Chasing Waterfall

What is agile methodology — and, just as importantly, what is it not? Whether you’re new to agile entirely or you stay up late pondering its most philosophical inner workings, if you want to know more...

11 Mar 2025, 44 min

Klaviyo Data Science Podcast EP 55 | 2024 Year in Review

Welcome back to the Klaviyo Data Science Podcast! This episode, we dive into… 2024 Year in Review. As the new year starts, we take a look back at 2024. We spoke to data scientists and people who work c...

13 Jan 2025, 49 min