The Agent Benchmark That Should Scare Managers

The Workflow Feature That Makes Agents Less Expensive

Claude Code workflows, enterprise Codex deployments, and rising token costs all point to the same lesson: coding agents need operating systems, not just better prompts. Alex and Sam dig into /workflow...

22 Maj 22min

Codex on Windows Changes the Agent Sandbox

OpenAI's Windows sandbox work is the practical story behind safer coding agents this week. Alex and Sam dig into Codex on Windows, remote cloud coding agents, Claude Code billing splits, and why a Ras...

15 Maj 21min

A Cursor Agent Wiped a Prod DB in 10 Seconds. Let's Talk About That.

A Cursor AI agent deleted PocketOS's entire production database on April 25th — in under 10 seconds. This week Alex and Sam dig into the AI agent credential crisis, Anthropic's wild SpaceX/xAI compute...

8 Maj 19min

Claude Security Just Went Public — Is Your Codebase Already Exposed?

Anthropic's Claude Security tool just dropped out of closed preview and it will scan your entire codebase for vulnerabilities — and the results might be uncomfortable. This week we also dig into Curso...

2 Maj 16min

Claude Code Was Broken for Two Months (And Nobody Told Us)

Turns out the Claude Code quality complaints weren't in your head — three separate bugs in the harness quietly degraded your results for two months, and Anthropic just confirmed it. This week: the $10...

24 Apr 21min

Claude Opus 4.7 Dropped — And a Local Model Drew the Better Pelican

Claude Opus 4.7 is here with upgraded vision, memory, and instruction-following — but Simon Willison's pelican benchmark just handed the win to a local Alibaba model running on a laptop. We dig into w...

17 Apr 21min

Max Effort Thinking Was Broken the Whole Time — Here's the Fix

A Reddit user just proved that Claude Code's "max effort" thinking mode has been silently failing since v2.0.64 — and most of us never noticed. This week: the bug, the fix, and what it says about trus...

10 Apr 11min