#217 – Beth Barnes on the most important graph in AI right now — and the 7-month rule that governs its progress

AI models today have a 50% chance of successfully completing a task that would take an expert human one hour. Seven months ago, that number was roughly 30 minutes — and seven months before that, 15 minutes. (See graph.)

These are substantial, multi-step tasks requiring sustained focus: building web applications, conducting machine learning research, or solving complex programming challenges.

Today’s guest, Beth Barnes, is CEO of METR (Model Evaluation & Threat Research) — the leading organisation measuring these capabilities.

Links to learn more, video, highlights, and full transcript: https://80k.info/bb

Beth's team has been timing how long it takes skilled humans to complete projects of varying length, then seeing how AI models perform on the same work. The resulting paper “Measuring AI ability to complete long tasks” made waves by revealing that the planning horizon of AI models was doubling roughly every seven months. It's regarded by many as the most useful AI forecasting work in years.
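To get a feel for what that doubling rule implies, here is a minimal sketch (not METR's code — the 7-month doubling period and the 1-hour starting horizon are just the illustrative figures from this description) that extrapolates the 50%-success task-length horizon forward:

```python
# Illustrative extrapolation of METR's headline finding: the length of task
# AI models can complete with 50% reliability doubles roughly every 7 months.
DOUBLING_MONTHS = 7.0

def horizon_after(months: float, current_hours: float = 1.0) -> float:
    """Task-length horizon (in hours) after `months`, assuming steady doubling."""
    return current_hours * 2 ** (months / DOUBLING_MONTHS)

for months in (0, 7, 14, 28, 42):
    print(f"+{months:2d} months: ~{horizon_after(months):.0f} hours")
```

Run forward, the same trend that took models from 15-minute to 1-hour tasks over 14 months would reach roughly 16-hour tasks in 28 months and 64-hour tasks in 42 — if, and it is a big if, the trend holds.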

Beth has found models can already do “meaningful work” improving themselves, and she wouldn’t be surprised if AI models were able to autonomously self-improve as little as two years from now — in fact, “It seems hard to rule out even shorter [timelines]. Is there 1% chance of this happening in six, nine months? Yeah, that seems pretty plausible.”

Beth adds:

The sense I really want to dispel is, “But the experts must be on top of this. The experts would be telling us if it really was time to freak out.” The experts are not on top of this. Inasmuch as there are experts, they are saying that this is a concerning risk. … And to the extent that I am an expert, I am an expert telling you you should freak out.


What did you think of this episode? https://forms.gle/sFuDkoznxBcHPVmX6


Chapters:

  • Cold open (00:00:00)
  • Who is Beth Barnes? (00:01:19)
  • Can we see AI scheming in the chain of thought? (00:01:52)
  • The chain of thought is essential for safety checking (00:08:58)
  • Alignment faking in large language models (00:12:24)
  • We have to test model honesty even before they're used inside AI companies (00:16:48)
  • We have to test models when unruly and unconstrained (00:25:57)
  • Each 7 months models can do tasks twice as long (00:30:40)
  • METR's research finds AIs are solid at AI research already (00:49:33)
  • AI may turn out to be strong at novel and creative research (00:55:53)
  • When can we expect an algorithmic 'intelligence explosion'? (00:59:11)
  • Recursively self-improving AI might even be here in two years — which is alarming (01:05:02)
  • Could evaluations backfire by increasing AI hype and racing? (01:11:36)
  • Governments first ignore new risks, but can overreact once they arrive (01:26:38)
  • Do we need external auditors doing AI safety tests, not just the companies themselves? (01:35:10)
  • A case against safety-focused people working at frontier AI companies (01:48:44)
  • The new, more dire situation has forced changes to METR's strategy (02:02:29)
  • AI companies are being locally reasonable, but globally reckless (02:10:31)
  • Overrated: Interpretability research (02:15:11)
  • Underrated: Developing more narrow AIs (02:17:01)
  • Underrated: Helping humans judge confusing model outputs (02:23:36)
  • Overrated: Major AI companies' contributions to safety research (02:25:52)
  • Could we have a science of translating AI models' nonhuman language or neuralese? (02:29:24)
  • Could we ban using AI to enhance AI, or is that just naive? (02:31:47)
  • Open-weighting models is often good, and Beth has changed her attitude to it (02:37:52)
  • What we can learn about AGI from the nuclear arms race (02:42:25)
  • Infosec is so bad that no models are truly closed-weight models (02:57:24)
  • AI is more like bioweapons because it undermines the leading power (03:02:02)
  • What METR can do best that others can't (03:12:09)
  • What METR isn't doing that other people have to step up and do (03:27:07)
  • What research METR plans to do next (03:32:09)

This episode was originally recorded on February 17, 2025.

Video editing: Luke Monsour and Simon Monsour
Audio engineering: Ben Cordell, Milo McGuire, Simon Monsour, and Dominic Armstrong
Music: Ben Cordell
Transcriptions and web: Katy Moore
