The Apparent Meaninglessness of AI Benchmarks, plus How to Explain AI Opportunities to Others

Every week brings a new AI benchmark. Higher scores. Bigger claims. Louder voices insisting this changes everything. And yet, when you put AI in front of a real business problem, none of that noise seems to help. In this episode, Rob and Justin dig into why AI benchmarks often feel strangely meaningless in practice and why that disconnect is the point. Benchmarks aren't useless. They're just answering a different question than the one most businesses are asking.

This isn't just random conjecture either. Rob walks through what he's learned building actual AI workflows and why a twenty percent improvement on a leaderboard rarely translates into anything you can feel on the job. They talk about why model choice usually isn't the bottleneck, why swapping models should be easy if you've built things the right way, and why the most successful AI work rarely shows up as a flashy demo. Most of the value is happening quietly, off-screen, inside systems that look a lot more like normal software than artificial intelligence.

Rob and Justin also talk about why explaining AI is often harder than building it. The first demo people see tends to stick, even when it's the wrong one. Consumer AI feels magical. Business AI face-plants unless it's built with intent, structure, and real context. This episode gives leaders better language for that gap, without hype or panic. If you're done chasing benchmarks and just want a way to think about AI that survives contact with reality, this episode's for you.

This episode is sourced from an open RSS feed and is not published by Podme. It may therefore contain advertisements.

Episodes (237)

The End of All You Can Eat AI

For about two years, we've all been reaching for the biggest hammer on the wall because someone else was paying for the nails. If you were on a subscription, you grabbed the biggest, baddest model on ...

30 Jun 24min

Waiting Worked...Until AI

For years, Rob had a pretty good system. When a new technology showed up, he didn't immediately declare it the next big thing. He wanted to understand why it mattered first. Sometimes that meant jumpi...

23 Jun 15min

Why Bigger AI Isn't Always Better

Microsoft just unveiled a monster of a machine built for local AI. More memory. More horsepower. More everything. Which led Rob and Justin to a question that has almost nothing to do with the hardware...

16 Jun 29min

What Happens After the AI Works?

For the past few years, the conversation around AI has focused on the technology. Which model is best. Which tools to use. How fast everything is changing. But once you start building with it, a diffe...

9 Jun 35min

Absences, KPI Updates, Book Title and Pre-Order Bundle Reveal

If you've listened to the podcast over the past several months, you've probably heard Rob mention "the book" a few times. Well, it's finally done. In this solo episode, Rob reveals the title, shares t...

2 Jun 6min

It's Time to Start Looking Into Microsoft IQ

Rob was supposed to be finishing his book. Last chapter. Two days past deadline. Freedom was right there. Instead, he hit pause and recorded this. Because something from a few weeks ago wouldn't leave...

5 May 19min

Cowork Builds Apps Now, and 'Acquired Skills Will Appear Here' w/ Garett Medlin

Garett Medlin just got the official title for the job he was already doing: AI Practice Lead at P3. He's also the person responsible for Rob trying Cowork in the first place, despite Rob's very reason...

28 Apr 56min

AI "versus" the Medical Establishment, Rob's Sith Name, and the Death of Social Media?

Rob didn't go looking for a fight with the medical system. He just showed up with receipts. Claude had already mapped the symptoms, suggested the tests, and summarized the situation better than any po...

21 Apr 30min
