How prompt caching works - Paged Attention and Automatic Prefix Caching plus practical tips
sankalp.bearblog.dev
Discuss: Hacker News
30 Nov, 2025

Table of Contents

1. Intro Lore and Motivation - Yapping about why I wrote this post and giving a brief on the territory we are about to venture into
2. Tips to hit the prompt cache more consistently - Why prompt caching matters and how to improve cache hits
3. LLM inference basics - Prefill, decode, and KV caching fundamentals
4. The memory problem - Why naive KV cache allocation doesn't scale
5. Paged attention - vLLM's OS-inspired solution with blocks and block tables
6. Prefix caching - Block hashing, longest cache hit, and the full picture

Prerequisite: Sections 2 onwards assume familiarity with self-attention in deco…
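
The "Paged attention" entry above refers to vLLM's block-table idea: a sequence's KV cache is split into fixed-size blocks, and a per-sequence table maps logical block indices to physical blocks allocated on demand, much like an OS page table. Below is a minimal Python sketch of that bookkeeping under those assumptions; BLOCK_SIZE, ToyAllocator, and ToyBlockTable are made-up names, and no attention math is shown.

```python
BLOCK_SIZE = 16  # tokens per KV-cache block; a toy value for illustration

class ToyAllocator:
    """Hands out free physical block ids from a fixed pool (stand-in for GPU memory)."""
    def __init__(self, num_blocks: int):
        self.free = list(range(num_blocks))

    def allocate(self) -> int:
        return self.free.pop()

class ToyBlockTable:
    """Per-sequence mapping from logical block index to physical block id.
    Blocks are allocated on demand, so memory grows with tokens actually seen,
    not with a pre-reserved maximum sequence length."""
    def __init__(self, allocator: ToyAllocator):
        self.allocator = allocator
        self.physical_blocks: list[int] = []
        self.num_tokens = 0

    def append_token(self) -> None:
        # Grab a fresh physical block only when the current one is full.
        if self.num_tokens % BLOCK_SIZE == 0:
            self.physical_blocks.append(self.allocator.allocate())
        self.num_tokens += 1

    def lookup(self, token_pos: int) -> tuple[int, int]:
        """Translate a token position into (physical block id, offset within that block)."""
        return self.physical_blocks[token_pos // BLOCK_SIZE], token_pos % BLOCK_SIZE

if __name__ == "__main__":
    seq = ToyBlockTable(ToyAllocator(num_blocks=1024))
    for _ in range(40):                 # 40 tokens fill 2 blocks and start a 3rd
        seq.append_token()
    print(len(seq.physical_blocks))     # 3
    print(seq.lookup(39))               # (some physical block id, offset 7)
```

Because physical blocks are handed out only as they fill, memory usage tracks the tokens a request actually produced rather than a worst-case reservation.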
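
The "Prefix caching" entry mentions block hashing and the longest cache hit. Here is a rough sketch of how such a lookup can work, assuming the same fixed-size blocks and a chained hash per block so that each hash identifies the whole prefix ending at that block; this is an illustration, not the post's or vLLM's actual implementation, and names like ToyPrefixCache are hypothetical.

```python
import hashlib

BLOCK_SIZE = 16  # tokens per block; a toy value for illustration

def block_hashes(token_ids: list[int], block_size: int = BLOCK_SIZE) -> list[str]:
    """Chained hashes over full blocks: each block's hash folds in the previous hash,
    so a hash identifies the entire prefix ending at that block, not just the block."""
    hashes: list[str] = []
    prev = ""
    full = len(token_ids) - len(token_ids) % block_size  # only complete blocks are hashable
    for start in range(0, full, block_size):
        block = token_ids[start:start + block_size]
        prev = hashlib.sha256((prev + "," + ",".join(map(str, block))).encode()).hexdigest()
        hashes.append(prev)
    return hashes

class ToyPrefixCache:
    """Maps a chained block hash to a (hypothetical) physical KV-cache block id."""
    def __init__(self):
        self.blocks: dict[str, int] = {}
        self._next_id = 0

    def insert(self, token_ids: list[int]) -> None:
        """Pretend prefill just ran: register every full block of this prompt."""
        for h in block_hashes(token_ids):
            if h not in self.blocks:
                self.blocks[h] = self._next_id
                self._next_id += 1

    def longest_cached_prefix(self, token_ids: list[int]) -> int:
        """Return how many leading tokens are covered by already-cached blocks."""
        hit = 0
        for h in block_hashes(token_ids):
            if h not in self.blocks:
                break
            hit += 1
        return hit * BLOCK_SIZE

if __name__ == "__main__":
    cache = ToyPrefixCache()
    system = list(range(40))            # stand-in for a shared 40-token system prompt
    first = system + [100, 101, 102]
    second = system + [200, 201]        # shares only the system prompt with `first`

    cache.insert(first)
    # Only whole blocks can hit: 40 shared tokens -> 2 full blocks of 16 -> 32 cached tokens.
    print(cache.longest_cached_prefix(second))  # 32
```

The chained hash is what turns the lookup into a longest-prefix match: iteration stops at the first missing block hash, and everything before it can reuse already-computed KV blocks, which is also why keeping shared prompt prefixes byte-identical improves cache hits.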
