Agentically optimizing LLM prompt cache TTLs for fun and profit
A case study on production objective hill climbing

Firetiger runs a few hundred large language model (LLM) agents in production, and prompt caching is a critical tool to manage the cost of running such a workload. Properly setting cache time-to-live (TTL), how long a cached prefix survives before the next