Linear attention replaces the unbounded cache of softmax attention with a fixed-size recurrent state, reducing sequence mixing to linear time and decoding to constant memory. The hard part is not just what to forget, but how to edit this compressed memory without scrambling existing associations. Delta-rule models subtract the current read before writing a new value, and Kimi Delta Attention (KDA) sharpens forgetting with channel-wise decay. But...
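The read-then-write update described above can be sketched in a few lines. This is a minimal illustration, not KDA's actual implementation. It assumes a state `S` of shape d_k × d_v that is read as Sᵀk, a scalar write strength `beta`, and a per-key-channel `decay` vector applied before the delta write. The function names `read` and `delta_step` and the decay-then-write ordering are hypothetical choices for this sketch. The excerpt does not say how KDA computes its decay or write strength from the input, nor how it normalizes keys or chunks the recurrence for training.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def read(state: Matrix, key: Vector) -> Vector:
    """Return S^T k: the value the memory currently associates with `key`."""
    d_k, d_v = len(state), len(state[0])
    return [sum(key[i] * state[i][j] for i in range(d_k)) for j in range(d_v)]


def delta_step(state: Matrix, key: Vector, value: Vector, beta: float,
               decay: Vector) -> Matrix:
    """One recurrent step (hypothetical helper): channel-wise decay, then a delta-rule write.

    decay[i] in (0, 1] scales row i of the state, one factor per key channel.
    beta in [0, 1] is the write strength.
    """
    # 1) Channel-wise forgetting: each key channel keeps its own fraction of memory.
    decayed = [[decay[i] * x for x in row] for i, row in enumerate(state)]
    # 2) Read what the decayed memory currently returns for this key.
    current = read(decayed, key)
    # 3) Write only the error (value - current), so the old value for this key is
    #    replaced rather than piled on top of.
    error = [v - c for v, c in zip(value, current)]
    return [[decayed[i][j] + beta * key[i] * error[j] for j in range(len(value))]
            for i in range(len(key))]


# Usage: two orthogonal keys, no decay, full-strength writes.
no_decay = [1.0, 1.0]
k1, k2 = [1.0, 0.0], [0.0, 1.0]
S = [[0.0, 0.0], [0.0, 0.0]]
S = delta_step(S, k1, [3.0, 4.0], beta=1.0, decay=no_decay)
S = delta_step(S, k2, [5.0, 6.0], beta=1.0, decay=no_decay)
print(read(S, k1))  # [3.0, 4.0]
print(read(S, k2))  # [5.0, 6.0]

# Rewriting k1 replaces its value; a purely additive update would return [10.0, 12.0].
S = delta_step(S, k1, [7.0, 8.0], beta=1.0, decay=no_decay)
print(read(S, k1))  # [7.0, 8.0]

# Channel-wise decay: halve key channel 0 while rewriting k2 with its current value.
S2 = delta_step(S, k2, [5.0, 6.0], beta=1.0, decay=[0.5, 1.0])
print(read(S2, k1))  # [3.5, 4.0]
print(read(S2, k2))  # [5.0, 6.0]
```

The subtraction in step 3 is what separates the delta rule from plain additive linear attention: writing to an existing key corrects its value instead of accumulating interference. The per-channel `decay` vector shows the idea of forgetting at channel granularity rather than with one scalar per step.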
