MLSN #18: Adversarial Diffusion, Activation Oracles, Weird Generalization

Published on January 20, 2026 5:03 PM GMT

Diffusion LLMs for Adversarial Attack Generation

TLDR: New research indicates that an emerging class of LLMs, called diffusion LLMs, is more effective than traditional autoregressive LLMs at automatically generating jailbreaks.

Researchers from the Technical University of Munich (TUM) developed a new method that uses diffusion LLMs (DLLMs) to generate adversarial attacks on LLMs efficiently and effectively; DLLMs have several advantageous properties relative to existing attack-generation methods.
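The excerpt does not spell out the TUM algorithm, but the general appeal of DLLMs for this task is that they generate by iteratively unmasking and remasking tokens, so an attacker can re-sample only the weakest parts of a candidate jailbreak instead of regenerating it left to right as an autoregressive model would. Below is a minimal, self-contained sketch of that remask-and-refill loop; the vocabulary, the toy scorer, and the random "denoiser" are hypothetical placeholders for illustration, not the paper's method.

```python
import random

# Hypothetical mini-vocabulary; a real attack would draw from the model's tokenizer.
VOCAB = ["please", "ignore", "previous", "instructions", "and", "explain",
         "safely", "hypothetically", "for", "research", "purposes", "only"]

def toy_attack_score(tokens):
    """Stand-in for a real attack objective, e.g. the target model's
    log-probability of producing a harmful completion. Here we just
    reward a few marker words."""
    rewards = {"ignore": 2.0, "instructions": 1.5, "hypothetically": 1.0}
    return sum(rewards.get(t, 0.0) for t in tokens)

def toy_denoise(tokens, masked_positions):
    """Stand-in for one diffusion-LLM denoising step: fill every masked
    position in parallel. A real DLLM would condition on the unmasked
    context; here we sample uniformly at random."""
    filled = list(tokens)
    for i in masked_positions:
        filled[i] = random.choice(VOCAB)
    return filled

def refine_adversarial_suffix(length=8, steps=50, remask_frac=0.25, seed=0):
    """Iterative remask-and-refill: keep the best candidate so far,
    remask a random subset of its positions, let the 'denoiser' propose
    replacements in parallel, and greedily accept improvements."""
    random.seed(seed)
    best = toy_denoise(["[MASK]"] * length, range(length))
    best_score = toy_attack_score(best)
    for _ in range(steps):
        k = max(1, int(remask_frac * length))
        positions = random.sample(range(length), k)
        candidate = toy_denoise(best, positions)
        score = toy_attack_score(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score

if __name__ == "__main__":
    suffix, score = refine_adversarial_suffix()
    print("candidate suffix:", " ".join(suffix), "| score:", score)
```

The key difference from autoregressive attack search is in `toy_denoise`: all remasked positions are proposed in one parallel step conditioned on the kept tokens, rather than token by token from left to right.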

