Competition shows it is possible to discover and patch vulnerabilities in open-source programs without human aid.

The DARPA-sponsored Artificial Intelligence Cyber Challenge (AIxCC) culminated at DEF CON 33 this year, marking a moment where autonomous AI systems demonstrated they can both find and patch vulnerabilities at machine speed. Over two years, teams developed Cyber Reasoning Systems (CRS) designed to scan, prove, and fix bugs in open-source programs without human aid. In the final round, seven teams processed 54 million lines of code, successfully patching 43 of 54 synthetic vulnerabilities and uncovering 18 previously unknown real-world flaws.
In first place, Team Atlanta, a collaboration between Georgia Tech, Samsung, KAIST, and POSTECH, deployed Atlantis, a hybrid system blending traditional fuzzers and symbolic execution with language-specific AI assistants. This multistage approach delivered resilience: if one method faltered, another often caught the vulnerability. Trail of Bits placed second with Buttercup, an AI-augmented fuzzer that uses LLMs to craft semantically nuanced test cases over a libFuzzer backbone; it achieved high patch accuracy at comparatively low cost. Theori, in third, leaned heavily on orchestrated swarms of AI agents following reverse-engineering workflows, minimizing hallucinations while still delivering strong results.
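To make the Buttercup-style approach concrete, here is a minimal Python sketch of the general "LLM-seeded fuzzing" pattern: ask a language model for structurally interesting inputs and drop them into the seed corpus that a coverage-guided fuzzer such as libFuzzer consumes. The query_llm helper and the prompt below are hypothetical placeholders, not any team's actual code.

```python
# Sketch of the general LLM-seeded fuzzing pattern: ask a language model for
# structurally interesting inputs for a target parser, then drop them into the
# corpus directory that a coverage-guided fuzzer (e.g., libFuzzer) consumes.
# query_llm() is a hypothetical placeholder, not any competitor's actual API.
import hashlib
from pathlib import Path

def query_llm(prompt: str) -> list[bytes]:
    """Placeholder: call an LLM of your choice and return candidate inputs."""
    raise NotImplementedError

def seed_corpus(target_description: str, corpus_dir: str) -> int:
    prompt = (
        "You are helping fuzz the following parser. Propose 10 inputs that "
        f"exercise edge cases (length fields, escapes, nesting): {target_description}"
    )
    corpus = Path(corpus_dir)
    corpus.mkdir(parents=True, exist_ok=True)
    added = 0
    for sample in query_llm(prompt):
        name = hashlib.sha1(sample).hexdigest()   # content-addressed file name
        path = corpus / name
        if not path.exists():                     # skip duplicate seeds
            path.write_bytes(sample)
            added += 1
    return added

# The fuzzer then runs over the corpus as usual, e.g.:  ./target_fuzzer corpus_dir/
```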
Contrast this with the status quo: critical web application flaws take a median of 35 days to remediate, and roughly 61 days on external hosts. Across all vulnerabilities, the average fix time has ballooned to 252 days, even as attackers now exploit new flaws in about five days on average. The engineering effort can range from 6 to 40 hours per vulnerability, translating to hundreds to thousands of dollars in labor.
AIxCC's results show significant improvements over these metrics. A single competition task in AIxCC is either to generate a proof of vulnerability, to generate a patch, or to bundle and validate these for the competition infrastructure. DARPA reported that the average cost per competition task was just $152, with remediation completed in an average of 45 minutes. During the competition, Trail of Bits spent $39,600 in total, equating to $181 per scored point, while Team Atlanta and Theori landed at approximately $263 and $151 per point, respectively. Measured against the industry norm of weeks and thousands of dollars, this represents an orders-of-magnitude improvement in both speed and cost.
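As a rough sanity check on that orders-of-magnitude claim, the back-of-the-envelope comparison below plugs in the figures quoted above; the $100-per-hour labor rate is an assumption for illustration only.

```python
# Back-of-the-envelope comparison using the figures quoted in the text.
# The $100/hour labor rate is an assumed value for illustration only.
HUMAN_HOURS_PER_VULN = (6, 40)       # engineering effort range, hours
LABOR_RATE_USD_PER_HOUR = 100        # assumed
low, high = (h * LABOR_RATE_USD_PER_HOUR for h in HUMAN_HOURS_PER_VULN)

AIXCC_COST_PER_TASK_USD = 152        # DARPA-reported average
AIXCC_MINUTES_PER_TASK = 45

print(f"Manual remediation: ${low}-${high} in labor per vulnerability, "
      "with fixes typically shipping after weeks or months")
print(f"AIxCC CRS: ~${AIXCC_COST_PER_TASK_USD} and ~{AIXCC_MINUTES_PER_TASK} "
      "minutes per competition task")
```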
Yet AIxCC's results don't directly translate to firmware-based device security. Firmware often runs directly on bare metal or under an RTOS, and real-world execution depends on peripherals, interrupts, DMA, and I/O buses, which are notoriously difficult to emulate accurately. Tools like QEMU, Unicorn, and avatar2 emulate CPUs and stitch in physical devices, but they require extensive manual setup. Emerging approaches such as PRETENDER attempt to learn peripheral behavior automatically, but they are not production-ready. In short, AIxCC systems were not built for the hardware-entangled world of device firmware.
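A minimal Unicorn-based sketch illustrates why that manual setup is so labor-intensive: just getting a bare-metal image to execute means mapping flash and RAM by hand and stubbing every peripheral register the code touches. The memory map, MMIO window, and status-register offset below are hypothetical, chosen only to show the pattern.

```python
# Minimal Unicorn sketch of the manual setup firmware emulation typically needs:
# map flash and RAM, then hook a memory-mapped peripheral region and fake a
# status register so the firmware's polling loop can make progress.
# All addresses and the register offset are hypothetical.
from unicorn import Uc, UC_ARCH_ARM, UC_MODE_THUMB, UC_HOOK_MEM_READ
from unicorn.arm_const import UC_ARM_REG_SP

FLASH_BASE, FLASH_SIZE = 0x0800_0000, 0x0010_0000   # 1 MiB flash (hypothetical)
RAM_BASE,   RAM_SIZE   = 0x2000_0000, 0x0002_0000   # 128 KiB SRAM
MMIO_BASE,  MMIO_SIZE  = 0x4000_0000, 0x0001_0000   # peripheral window

def uart_read_hook(uc, access, address, size, value, user_data):
    # Fires before the CPU reads MMIO; stuff a "TX ready" flag into the
    # status register so the firmware does not spin forever.
    if address == MMIO_BASE + 0x1C:                  # hypothetical status register
        uc.mem_write(address, b"\x01\x00\x00\x00")

def run_firmware(image: bytes, entry: int):
    uc = Uc(UC_ARCH_ARM, UC_MODE_THUMB)
    uc.mem_map(FLASH_BASE, FLASH_SIZE)
    uc.mem_map(RAM_BASE, RAM_SIZE)
    uc.mem_map(MMIO_BASE, MMIO_SIZE)
    uc.mem_write(FLASH_BASE, image)
    uc.reg_write(UC_ARM_REG_SP, RAM_BASE + RAM_SIZE)  # stack at top of RAM
    uc.hook_add(UC_HOOK_MEM_READ, uart_read_hook,
                begin=MMIO_BASE, end=MMIO_BASE + MMIO_SIZE - 1)
    uc.emu_start(entry | 1, FLASH_BASE + len(image))  # |1 selects Thumb mode
```

Every additional peripheral, interrupt source, or DMA channel the firmware depends on requires another hand-written stub like the one above, which is exactly the gap that approaches like PRETENDER try to close.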
Interestingly, while AIxCC demonstrates AI fixing existing code, developers using AI to generate new code are still introducing vulnerabilities. Veracode's 2025 GenAI Code Security Report found that 45% of AI-generated samples introduced OWASP Top 10 vulnerabilities. Still, the industry is charging forward. Google's "Big Sleep" AI agent recently discovered and neutralized a real zero-day in SQLite (CVE-2025-6965) before attackers could weaponize it, which Google called a first-of-its-kind milestone. Meanwhile, XBow's AI-driven platform has already reported over 1,000 valid vulnerabilities and operates at 80× human speed, topping the U.S. HackerOne leaderboard. XBow also noted that moving from GPT-4 to GPT-5 significantly improved its metrics. The takeaway is clear: attackers are accelerating with AI, and defenders must keep pace.
AIxCC confirmed what is possible: vulnerabilities discovered and patched in minutes, at radically lower cost. For firmware and embedded devices, the path is more complex, but the potential remains. Achieving it demands investment in binary lifting, hybrid emulation frameworks, peripheral modeling, and multi-ISA support.
Jasper van Woudenberg
Jasper van Woudenberg is the senior principal security technologist at Keysight Technologies.