The surprising depths of prompt caching
Prompt caching looks like a token discount. Underneath, it is KV tensors, prefix trees, inference economics, and a privacy model hiding in plain sight.