17 Nov, 2025

UPDATE: If you want to understand how prompt caching works under the hood, read my blog post "How prompt caching works - paged attention and prefix caching, plus practical tips". This post is an extract from that one, covering only the tips, for readability.

Prompt caching is when LLM providers reuse previously computed key-value tensors for identical prompt prefixes, skipping redundant computation. When you hit the cache, you pay less and get faster responses.
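Concretely, this means ordering your prompt so that the stable parts come first and the variable parts come last. Here is a minimal sketch assuming the OpenAI Python SDK; the model name, the style_guide.md file, and the review() helper are illustrative placeholders, not part of any real workflow described in this post.

```python
# Minimal sketch (assumes the OpenAI Python SDK; model name, style_guide.md,
# and review() are hypothetical placeholders).
from openai import OpenAI

client = OpenAI()

# Static content that is identical across requests goes first, so every call
# shares the longest possible prefix and its KV tensors can be reused.
SYSTEM_PROMPT = (
    "You are a code review assistant. Follow this style guide:\n"
    + open("style_guide.md").read()
)

def review(diff: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},  # identical prefix -> cacheable
            {"role": "user", "content": diff},             # variable content goes last
        ],
    )
    # OpenAI reports cache hits in the usage object; the field can be None
    # on models without prompt caching.
    details = resp.usage.prompt_tokens_details
    if details and details.cached_tokens:
        print(f"{details.cached_tokens} prompt tokens served from cache")
    return resp.choices[0].message.content
```

On the second and subsequent calls, cached_tokens should be nonzero for the system prompt, provided the prefix is byte-identical across requests.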

Prompt caching basics, and why it's worth worrying about

If you use Codex, Claude Code, or Cursor and check the API usage, you will notice that a lot of the tokens are "cached". Luckily, code is structured, and multiple queries can attend to the same context/prefixe…
