BrowseComp vs The Bots that Bluff

Can AI actually read the internet, or is it just faking it with confidence? In this high-voltage episode, host Emily Laird cracks open BrowseComp, OpenAI’s benchmark built to test whether web-browsing agents can find facts that are hard to uncover but easy to verify. Humans had two hours per question and still gave up most of the time, so what does it mean when a model claims victory? From compute budgets and canary strings to the rise of multimodal chaos, Emily exposes the difference between sounding right and being right, and explains why, in an era of polished, source-backed answers, persistence beats plausibility every time.

Join the AI Weekly Meetups. Connect with us: if you enjoyed this episode or have questions, reach out to Emily Laird on LinkedIn. Stay tuned for more insights into the evolving world of generative AI. And remember, you now know more about the BrowseComp benchmark.

Connect with Emily Laird on LinkedIn

This episode was added to the Podme service via a public RSS feed and is not a Podme original production, so it may contain advertising.

Episodes (291)

Alexa, Fire Middle Management: The Amazon Layoffs

Last week, Amazon axed 14,000 white-collar jobs, and this time, the pink slip came with a side of machine learning. In this episode, Emily Laird is unpacking the biggest corporate AI bloodbath of 2025...

4 Nov 2025 · 8 min

GPU vs TPU

What do Call of Duty, Google Cloud, and your favorite cat video have in common? They all owe their lives to a battle raging deep inside your devices: GPU versus TPU. In this episode, host Emily Laird ...

3 Nov 2025 · 11 min

Meta Firings: The Great FAIR Purge and Rise of the AI Death Star

Meta just threw its AI playbook in the shredder, lit the ashes on fire, and built a secret lab on top. In this episode, Emily Laird breaks down the end of FAIR, the open-source darling of Meta AI, and...

28 Oct 2025 · 9 min

Domo Arigato, Laundry-Bot-o: Meet Figure 03

Humanoid robots are no longer sci-fi fever dreams or Silicon Valley party tricks, they’re coming for your chores (at least we hope). In this episode, Emily Laird's getting hands-on (literally) with Fi...

27 Oct 2025 · 8 min

Tilly Norwood: Artificially Famous

Tilly Norwood isn’t real but she is raising real eyebrows. Billed as the next Natalie Portman (if Natalie were a laggy IKEA algorithm), this AI “actress” is part influencer, part software stack, and p...

16 Oct 2025 · 9 min

Swamp Thing: Musk’s AI Beast Rises in Memphis

Elon Musk isn’t just tweeting through it, he’s building a supercomputer the size of a football stadium in the swamps of Memphis. It's called Colossus, and it’s stuffed with hundreds of thousands of Nv...

15 Oct 2025 · 9 min

Janitor AI: Chatbots Gone Wild (Seriously)

A continuation of Emily's exploration of the Andreessen Horowitz Top 100 Generative AI apps. Janitor AI is a full-blown fever dream with 5 million users, 2 million characters, and zero shame. In thi...

14 Oct 2025 · 13 min

OpenAI Dev Day 2025: The Recap

OpenAI Dev Day 2025 wasn’t just a product drop, it was a full-on software coup. In this episode, host Emily Laird breaks down how ChatGPT graduated from chatbot to operating system, complete with apps...

13 Oct 2025 · 8 min