BrowseComp vs The Bots that Bluff

Can AI actually read the internet, or is it just faking it with confidence? In this high-voltage episode, host Emily Laird cracks open BrowseComp, OpenAI’s benchmark built to test whether web-browsing agents can find facts that are hard to uncover but easy to verify. Humans had two hours per question and still bailed most of the time, so what does it mean when a model claims victory? From compute budgets and canary strings to the rise of multimodal chaos, Emily exposes the difference between sounding right and being right, and why in an era of polished, source-backed answers, persistence beats plausible every time.

Join the AI Weekly Meetups.

Connect with us: If you enjoyed this episode or have questions, reach out to Emily Laird on LinkedIn. Stay tuned for more insights into the evolving world of generative AI. And remember, you now know more about the BrowseComp benchmark.

This episode is taken from an open RSS feed and is not published by Podme. It may therefore contain advertisements.

Episodes (291)

ChatGPT Turns 3: The Origin Story

Before ChatGPT started writing your emails and explaining physics like a brunch topic, OpenAI was a cash-hungry research lab funded by Elon Musk and a few idealists...

8 Dec 2025, 8 min

World Models vs LLMs

Large Language Models might sound smart, but can they predict what happens when a cat sees a cucumber? In this episode, host Emily Laird throws LLMs into the philosophical ring with World Models, AI s...

19 Nov 2025, 7 min

World Models for Beginners

World models aren’t a sci-fi subplot, no, they’re how AIs build mini fake worlds in their silicon skulls to test ideas without wrecking your car or your reputation. In this episode, host Emily Laird b...

18 Nov 2025, 8 min

Yann LeCun: Trading META for World Models

Yann LeCun, deep learning pioneer and Meta’s AI heavyweight, is out and he's not leaving quietly. In this episode host Emily Laird unpacks his philosophical split with Meta over the limits of large la...

17 Nov 2025, 6 min

Dear K–12, AI Isn’t Optional Anymore. Love, Emily

Banning ChatGPT in schools is like banning pencils because kids might doodle. In this episode, host Emily Laird takes a flamethrower to the myth that AI’s not in your classroom, because it is, and you...

12 Nov 2025, 12 min

OpenAI's New Atlas Browser

OpenAI just gave your browser a brain and possibly a caffeine addiction. In this episode, host Emily Laird is ripping into Atlas, OpenAI’s Chrome-powered AI browser with baked-in ChatGPT, Agent Mode, ...

11 Nov 2025, 10 min

GPU-nami: OpenAI’s $38B Cloud Fling with AWS

OpenAI just dropped $38 billion like it’s tipping the bartender at the GPU speakeasy, and the lucky recipient? Amazon Web Services (AWS if ya nasty). In this episode, host Emily Laird digs into why Op...

10 Nov 2025, 9 min

What is Claude 4.5?

Claude 4.5 isn’t just the teacher’s pet, it’s running the class, grading the papers, and rewriting the syllabus in Python. In this episode, Emily Laird breaks down why Anthropic’s newest model might b...

5 Nov 2025, 9 min
