AI Security Podcast, 6 Nov 2025

Inside the $29.5 Million DARPA AI Cyber Challenge: How Autonomous Agents Find & Patch Vulns

What does it take to build a fully autonomous AI system that can find, verify, and patch vulnerabilities in open-source software? Michael Brown, Principal Security Engineer at Trail of Bits, joins us to go behind the scenes of the 3-year DARPA AI Cyber Challenge (AICC), where his team's agent, "Buttercup," won second place.

Michael, a self-proclaimed "AI skeptic," shares his surprise at how capable LLMs were at generating high-quality patches. He also shares the most critical lesson from the competition: "AI was actually the commodity." The real differentiator was not the AI model itself but the "best of both worlds" approach: robust engineering, intelligent scaffolding, and using "AI where it's useful and conventional stuff where it's useful."

This is a great listen for any engineering or security team building AI solutions. We cover the multi-agent architecture of Buttercup, the real-world costs, and the open-source future of this technology.

Questions asked:

(00:00) Introduction: The DARPA AI Hacking Challenge
(03:00) Who is Michael Brown? (Trail of Bits AI/ML Research)
(04:00) What is the DARPA AI Cyber Challenge (AICC)?
(04:45) Why did the AICC take 3 years to run?
(07:00) The AICC Finals: Trail of Bits takes 2nd place
(07:45) The AICC Goal: Autonomously find AND patch open source
(10:45) Competition Rules: No "virtual patching"
(11:40) AICC Scoring: Finding vs. Patching
(14:00) The competition was fully autonomous
(14:40) The 3-month sprint to build Buttercup v1
(15:45) The origin of the name "Buttercup" (The Princess Bride)
(17:40) The original (and scrapped) concept for Buttercup
(20:15) The critical difference: Finding vs. Verifying a vulnerability
(26:30) LLMs were allowed, but were they the key?
(28:10) Choosing LLMs: Using OpenAI for patching, Anthropic for fuzzing
(30:30) What was the biggest surprise? (An AI skeptic is blown away)
(32:45) Why the latest models weren't always better
(35:30) The #1 lesson: The importance of high-quality engineering
(39:10) Scaffolding vs. AI: What really won the competition?
(40:30) Key Insight: AI was the commodity, engineering was the differentiator
(41:40) The "Best of Both Worlds" approach (AI + conventional tools)
(43:20) Pro Tip: Don't ask AI to "boil the ocean"
(45:00) Buttercup's multi-agent architecture (Engineer, Security, QA)
(47:30) Can you use Buttercup for your enterprise? (The $100k+ cost)
(48:50) Buttercup is open source and runs on a laptop
(51:30) The future of Buttercup: Connecting to OSS-Fuzz
(52:45) How Buttercup compares to commercial tools (RunSybil, XBOW)
(53:50) How the 1st place team (Team Atlanta) won
(56:20) Where to find Michael Brown & Buttercup

Resources discussed during the interview:

Trail of Bits

This episode is sourced from an open RSS feed and is not published by Podme. It may therefore contain ads.

Episodes (55)

Why Asset Intelligence is Replacing the CMDB & Static Dashboards

Why do CISOs still struggle with asset intelligence in 2026? Despite decades of security tooling, most organizations still have a massive 40% "dark matter" blind spot in their environment and the expl...

11 Jun 42min

The AI AuthZ Problem: Why Human Least Privilege Fails for Autonomous Agents

Why are security leaders terrified of connecting AI agents to production data? Because unlike humans, AI agents don't apply judgment, and they operate at machine speed, meaning they can relentlessly h...

4 Jun 47min

Securing AI at the Speed of Engineering | DoorDash | Forward Deployed Security | GRC Engineering

Is your security team moving at the speed of your engineering team? In this special live recording of the AI Security Podcast from San Francisco, Ashish is joined by Nick Reva (Global Director, Engine...

21 May 1h 3min

Verification vs. Validation: How Autonomous AI is Changing Cybersecurity

Are autonomous AI agents operating unchecked in your enterprise? With the release of open source frameworks like OpenClaw, deploying an AI agent is now as simple as texting, but it comes with massive,...

13 May 1h 10min

The Zero-Click AI Hack: How to Contain the Blast Radius of Autonomous Agents

Is an AI agent's identity a workload or an action? Ashish spoke to Elie Bursztein, Distinguished Research Scientist and co-author of Google SAIF (Secure AI Framework) about how it is neither and that ...

29 Apr 47min

Buy vs. Build AI Security: Why Box.com CISO is Creating their Own Agentic SOC

If your AI solution is just helping humans process the same amount of alerts a little faster, you haven't transformed anything, you've just created a faster hamster wheel. In this episode, Ashish and C...

22 Apr 46min

Anthropic's Project Mythos: Why the "Zero-Day Machine" is Terrifying the Security Industry

In this episode, Ashish and Caleb discuss the internet-breaking preview of Project Mythos, an unreleased AI model from Anthropic that has shown an unprecedented, terrifying ability to reason through c...

18 Apr 1h 3min

Are AI Security Startups Faking It? How to Separate Signal from Noise

With over 70 startups claiming to have built the perfect "AI SOC Analyst" or "AI Threat Hunter," how do you separate the real products from the vaporware? Recorded live at Decibel RSAC Founder Festiva...

15 Apr 47min