how to hit prompt cache more consistently
sankalp.bearblog.dev · 17 Nov, 2025

UPDATE: If you want to understand how prompt caching works under the hood, read my blog post how prompt caching works - paged attention and prefix caching plus practical tips. This post is an extract from that one, covering just the tips for readability.

Prompt caching is when LLM providers reuse previously computed key-value tensors for identical prompt prefixes, skipping redundant computation. When you hit the cache, you pay less and get faster responses.
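To make that concrete, here is a minimal sketch of the prefix-reuse idea using the OpenAI Python SDK. The model name, the system prompt, and the `review` helper are all illustrative assumptions, not from this post; the `prompt_tokens_details.cached_tokens` field is how OpenAI's API currently reports cache hits, and other providers expose this differently:

```python
# Sketch: keep a long, byte-identical prefix across requests so the
# provider can reuse its cached key-value tensors for that prefix.
from openai import OpenAI

client = OpenAI()

# Static prefix: identical across requests (instructions, tool
# definitions, few-shot examples). Providers typically only cache
# prompts above a minimum length (~1024 tokens for OpenAI).
SYSTEM_PROMPT = "You are a code review assistant. " + "<long static instructions> " * 200

def review(snippet: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},  # static part first
            {"role": "user", "content": snippet},          # variable part last
        ],
    )
    usage = response.usage
    # Cached prefix tokens show up under prompt_tokens_details.
    print("prompt:", usage.prompt_tokens,
          "cached:", usage.prompt_tokens_details.cached_tokens)
    return response.choices[0].message.content

review("def add(a, b): return a + b")  # first call: cached ~ 0
review("def sub(a, b): return a - b")  # shared prefix can now hit the cache
```

On the second call, everything up through the system prompt is an exact prefix match with the first request, so those tokens can be served from the cache; only the new user message costs full-price computation.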

Prompt caching basics and why you should even worry about it

If you use Codex/Claude Code/Cursor and check the API usage, you will notice that a lot of the tokens are "cached". Luckily, code is structured, so multiple queries can attend to the same context/prefix… (see the sketch below for a common way those cache hits get lost)
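Since caching keys on exact prefix matches, a single changed byte near the start of the prompt invalidates everything after it. A hypothetical sketch of that failure mode and the fix; the `build_prompt_*` helpers and `STATIC_INSTRUCTIONS` are illustrative, not from the original post:

```python
from datetime import datetime, timezone

STATIC_INSTRUCTIONS = "You are a coding agent. <long static instructions...>"

def build_prompt_bad(question: str) -> str:
    # Anti-pattern: a timestamp at the top changes on every request,
    # so no two prompts share a prefix and the cache never hits.
    now = datetime.now(timezone.utc).isoformat()
    return f"Current time: {now}\n{STATIC_INSTRUCTIONS}\n{question}"

def build_prompt_good(question: str) -> str:
    # Cache-friendly: keep the static block first, byte-for-byte
    # identical, and push anything that changes to the end.
    now = datetime.now(timezone.utc).isoformat()
    return f"{STATIC_INSTRUCTIONS}\n{question}\nCurrent time: {now}"
```

The same ordering principle applies to conversation history: appending new turns at the end preserves the prefix, while editing or reordering earlier turns throws the cached portion away.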
