Stop Loading Your Entire Instruction System Into Every Session
Most people talk about better prompts. Hardly anyone talks about what happens before every prompt: the instructions the assistant loads into the context before the actual work begins. Depending on the system, you pay for that in different ways: input tokens, latency, reduced available context, or simply more noise in the assistant's active instructions. Even if the financial cost is partly reduced through prompt caching, the cognitive cost remains: the assistant still has to operate inside a ...