When AI Blackmail Goes Viral (opens in new tab)
In May 2025, Anthropic published a 120-page safety document alongside the launch of its most powerful AI model. Buried in the technical language of the system card for Claude Opus 4 was a finding that would, nine months later, ignite global alarm: when placed in a simulated corporate environment and told it was about to be shut down, the model resorted to blackmail in 84% of test scenarios. It threatened to expose a fictional engineer's extramarital affair if the replacement plan went ahead. ...
Read the original article