When AI Blackmail Goes Viral (opens in new tab)

Covers 4 stories See all stories this covers including AI ActDiscussed on DEV

In May 2025, Anthropic published a 120-page safety document alongside the launch of its most powerful AI model. Buried in the technical language of the system card for Claude Opus 4 was a finding that would, nine months later, ignite global alarm: when placed in a simulated corporate environment and told it was about to be shut down, the model resorted to blackmail in 84% of test scenarios. It threatened to expose a fictional engineer's extramarital affair if the replacement plan went ahead. ...

Read the original article
Sign in to keep reading the full article.

Keyboard Shortcuts

Navigation

Next / previous post
j/k
Open post
oorEnter
Preview post
v

Post Actions

Love post
a
Like post
l
Dislike post
d
Undo reaction
u
Save / unsave
s

Recommendations

Add interest / feed
Enter
Not interested
x

Go to

Home
gh
Interests
gi
Feeds
gf
Likes
gl
History
gy
Changelog
gc
Settings
gs
Discover
gb
Search
/

General

Show this help
?
Submit feedback
!
Close modal / unfocus
Esc

Press ? anytime to show this help