Anthropic put out a report late last week about how they discovered and aborted a large-scale AI cyberattack. It’s worth reading.
This isn’t the first time they’ve reported this sort of misuse. But it’s the most concrete real-world example to date. A Chinese state-sponsored group used Claude Code to do all the core technical work of hacking ~30 global targets. These were real targets: government agencies, tech companies, financial institutions. Some of the attacks were successful.
And AI automated 80-90% of the attack. It’s not entirely clear how that number translates to actual labor savings, but the report implies it is significant.
Two things that stood out to me:
- For now, Claude still seems to have a substantial capability advantage over open models on long-horizon agentic tasks.
- The attackers made material sacrifices to avoid triggering Claude’s safeguards. The report mentions that the attackers had to intentionally limit the context available to Claude in order to avoid refusals. A guardrail-free model of comparable capability would remove this constraint, giving attackers a straightforward way to increase the efficacy of their attacks.
I expect many similar cases have already happened in the wild but gone undetected.