A brief history of KV cache compression developments (opens in new tab)
How KV cache compression - from MQA and GQA to MLA and linear-attention hybrids - quietly unlocked the long context windows that make modern agentic LLMs possible.
Read the original article