PagedAttention is more than virtual memory
Everyone repeats the analogy that PagedAttention is virtual memory for the KV cache. But I think the more interesting story is what comes along with it: copy-on-write, swapping, thrashing, a missing MMU, a timing side-channel that leaks other users' prompts, and new ways to optimise serving LLMs.