Understanding the Most Viral Chart in Artificial Intelligence
Odd Lots, 25 Apr

We live in an era of charts that are going up and to the right. This image obviously describes the stock market, particularly any company whose business is adjacent to artificial intelligence. But beyond stocks, another sort of chart we keep seeing is of AI capabilities also going up and to the right. The most famous and viral of these comes from an organization called METR, which stands for Model Evaluation and Threat Research. The organization is focused on understanding the degree to which AI models can carry out autonomous, complex tasks. METR sees this as a particularly important benchmark, given the risk that AI could one day engage in recursive self-improvement, taking humans out of the loop. But how do you really gauge a model's ability to solve complex problems? And what exactly is being measured? On this episode, we speak with METR's President Chris Painter as well as Joel Becker, a member of the technical staff who works on evaluation methods for the organization. We discuss both the mechanics and the philosophy of METR's work, and what it means when we see a chart showing that Claude Opus 4.6 can do a task that would take a human nearly 12 hours.

Read more:
DeepSeek Unveils Flagship AI Model a Year After Breakthrough
Meta Inks Deal to Use Amazon’s Graviton Processors for AI

Only Bloomberg.com subscribers can get the Odd Lots newsletter in their inbox each week, plus unlimited access to the site and app. Subscribe at bloomberg.com/subscriptions/oddlots

Subscribe to the Odd Lots Newsletter
Join the conversation: discord.gg/oddlots

See omnystudio.com/listener for privacy information.

Episodes (1200)

Inside the Booming Market for Dinosaur Fossils

Two years ago, Citadel's Ken Griffin paid almost $45 million for a stegosaurus skeleton, making it the most expensive fossil ever sold at auction. So why are dinosaur bones joining the collections of ...

2 May 48min

How Taiwan Became the World's Most Perilous Geopolitical Chokepoint

The closure of the Strait of Hormuz has highlighted the potential for long-running theoretical chokepoints to turn into reality, with dramatic results for both geopolitics and the global economy. But ...

1 May 56min

BlackRock's Rob Goldstein on the Next Megatrends in Finance

The last few decades have been marked by a number of megatrends in finance including the extraordinary growth of asset managers, the rising importance of technology, and the ascent of private markets....

30 Apr 56min

What's Actually Going On With Private Credit

The private credit market has grown enormously fast in recent years — so much so that by some estimates it's now bigger than the market for junk-rated corporate bonds. So what's driven all that growth...

27 Apr 50min

Presenting Foundering Season 6: The Killing of Bob Lee, Part 1

The Killing of Bob Lee, Part 1: San Francisco Has Blood On Its Hands. Three years ago, Bob Lee, a tech executive famous for creating Cash App, was found stabbed in San Francisco. His killing set off a ...

26 Apr 37min

James Bosworth on the "Orange Wave" Happening Across Latin America

We're living in an extraordinary moment for Latin American politics. From the ousting of Maduro to the ongoing oil blockade of Cuba to Javier Milei revving up a chainsaw at CPAC. Various leaders in di...

24 Apr 50min

Google's Liz Reid on Who Will Own Search in a World of AI

Not too long ago, search engines were the dominant form of querying the internet. But that's changing since the rise of large language models like ChatGPT, Claude, and Google's Gemini. More and more p...

23 Apr 51min
