EA Forum Podcast (Curated & popular)
2 Feb

[Linkpost] “Is there a Half-Life for the Success Rates of AI Agents?” by Toby_Ord

This is a link post.

Building on the recent empirical work of Kwa et al. (2025), I show that within their suite of research-engineering tasks the performance of AI agents on longer-duration tasks can be explained by an extremely simple mathematical model — a constant rate of failing during each minute a human would take to do the task. This implies an exponentially declining success rate with the length of the task and that each agent could be characterised by its own half-life. This empirical regularity allows us to estimate the success rate for an agent at different task lengths. And the fact that this model is a good fit for the data is suggestive of the underlying causes of failure on longer tasks — that they involve increasingly large sets of subtasks where failing any one fails the task. Whether this model applies more generally on other suites of tasks is unknown and an important subject for further work.
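A minimal sketch of the constant hazard rate model described above may make the half-life idea concrete. It assumes only what the summary states: a constant chance of failing per minute of human task time, which gives a success rate that halves with every additional half-life of task length. The function names and the 60-minute half-life are hypothetical, chosen for illustration, and are not taken from the article or from Kwa et al. (2025).

```python
import math


def hazard_rate(half_life_minutes: float) -> float:
    """Constant failure rate per minute implied by a given half-life."""
    return math.log(2) / half_life_minutes


def success_rate(task_minutes: float, half_life_minutes: float) -> float:
    """Success probability under a constant hazard rate.

    Equivalent to exp(-hazard_rate * task_minutes).
    """
    return 0.5 ** (task_minutes / half_life_minutes)


def task_length_for_success(target: float, half_life_minutes: float) -> float:
    """Task length (in human minutes) at which success falls to `target`."""
    if not 0.0 < target <= 1.0:
        raise ValueError("target must be in (0, 1]")
    return half_life_minutes * math.log(target) / math.log(0.5)


if __name__ == "__main__":
    half_life = 60.0  # hypothetical agent: 50% success on 60-minute tasks
    for minutes in (15, 60, 120, 240):
        print(f"{minutes:>4} min task: {success_rate(minutes, half_life):.3f}")
    print(f"80% success horizon: {task_length_for_success(0.8, half_life):.1f} min")
```

Under these assumptions, an agent with a 60-minute half-life succeeds on half of 60-minute tasks and a quarter of 120-minute tasks. The same arithmetic matches the subtask reading in the summary: if a task is a chain of independent steps that must all succeed, the overall success rate is the product of the per-step success rates, and that product shrinks exponentially as steps are added.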

METR's results on the length of tasks agents can reliably complete

A recent paper by Kwa et al. (2025) from the research organisation METR has found an exponential trend in the duration of the tasks that frontier AI agents can [...]

---

Outline:

(05:33) Explaining these results via a constant hazard rate

(14:54) Upshots of the constant hazard rate model

(18:47) Further work

(19:25) References

---

First published:
February 2nd, 2026

Source:
https://forum.effectivealtruism.org/posts/qz3xyqCeriFHeTAJs/is-there-a-half-life-for-the-success-rates-of-ai-agents-3

Linkpost URL:
https://www.tobyord.com/writing/half-life

---

Narrated by TYPE III AUDIO.

---


