Varsity A/B Testing

When you want to understand whether doing something causes something else to happen, such as whether a change to a website causes a dip or rise in downstream conversions, the gold-standard analysis method is the randomized controlled trial. Once you've properly randomized who receives the treatment, the analysis methods are well understood, and there are great tools in R and Python (and other languages) to estimate the effects. However, when you're operating at scale, the logistics of running all those tests and reaching correct conclusions reliably become the main challenge: making sure the right metrics are being computed, knowing when to stop an experiment, minimizing the chances of finding spurious results, and many other issues that are simple to track for one or two experiments but become real challenges for dozens or hundreds. Nonetheless, the reality is that there might be dozens or hundreds of experiments worth running. So in this episode, we'll work through some of the most important issues for running experiments at scale, with strong support from a series of great blog posts from Airbnb about how they solve this very issue. For blog post links relevant to this episode, visit lineardigressions.com.
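To make the "spurious results" concern concrete, here is a minimal Python sketch, not taken from the episode or from Airbnb's posts. It assumes a simple two-sided two-proportion z-test for a conversion-rate comparison and independent experiments. The function names (`two_proportion_z_test`, `family_wise_error_rate`, `bonferroni_threshold`) are hypothetical, and Bonferroni is only one of several possible corrections.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates (illustrative helper)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; no variation in outcomes")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF, written with math.erf.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value


def family_wise_error_rate(alpha, n_tests):
    """Chance of at least one false positive across independent tests
    when no experiment has a real effect (assumes independence)."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


if __name__ == "__main__":
    # Made-up counts for a single experiment: control vs. treatment.
    z, p = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=120, n_b=1000)
    print(f"z = {z:.2f}, p = {p:.3f}")

    # With many experiments, a fixed 0.05 threshold almost guarantees a false alarm.
    for n in (1, 10, 100):
        print(n, round(family_wise_error_rate(0.05, n), 3), bonferroni_threshold(0.05, n))
```

Under these assumptions, a single test at the 0.05 level has a 5% chance of a false alarm, while a hundred such tests are very likely to produce at least one, which is why the per-test threshold needs tightening (or some other correction) as the number of experiments grows.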

This episode comes from an open RSS feed and is not published by Podme. It may contain advertising.

Episodes (309)

How Do You Evaluate An AI Agent? (The Agents Season, Episode 7)

Knowing when an AI agent has failed sounds straightforward — until it isn't. Agents have a frustrating habit of finishing confidently while quietly doing the wrong thing, or looping endlessly without ...

1 June 31min

AI Agent Failure Modes (The Agents Season, Episode 6)

Despite what the marketing hype might suggest, AI agents are far from infallible — and if you've ever actually used one, you already know this. Today's episode dives deep into the many, varied, and so...

25 May 32min

Agentic Planning (The Agents Season, Episode 5)

When tackling a complex, multi-step task, even the smartest AI agent can fail without a solid game plan. This episode dives into the research around agentic planning — how agents move beyond simply re...

18 May 24min

Memory Management for AI Agents (The Agents Season, Episode 4)

Context windows are powerful — but finite, and surprisingly easy to overwhelm. When an AI agent is tackling a long, complex task, the information it needs has to fit inside that limited real estate, a...

10 May 24min

Lost in the Middle (The Agents Season, Episode 3)

Just like a memorable talk lives or dies by its opening and closing, LLMs have a surprisingly similar quirk: they pay close attention to what's at the beginning and end of their context window — and k...

4 May 19min

ReAct and Tool Usage (The Agents Season, Episode 2)

Before 2022, there was a wall between AI and the real world — models could reason impressively, but couldn't look anything up, run code, or check whether anything they said was actually true. This epi...

27 Apr 23min

What's an AI Agent? And Why's That Hard to Define? (The Agents Season, Episode 1)

AI agents are having a moment — and unpacking them properly takes more than a single conversation. This episode kicks off a dedicated multi-part season exploring AI agents from every angle, building u...

20 Apr 19min

Unfaithful Chain of Thought

What's actually happening when an LLM "thinks out loud"? Research on human decision-making suggests that much of the reasoning we believe drives our choices is actually post hoc rationalization — we d...

13 Apr 24min
