Stop Loading Your Entire Instruction System Into Every Session

Discussed on DEV

Most people talk about better prompts. Hardly anyone talks about what happens before every prompt: the instructions the assistant loads into the context before the actual work begins. Depending on the system, you pay for that in different ways: input tokens, latency, reduced available context, or simply more noise in the assistant's active instructions. Even if the financial cost is partly reduced through prompt caching, the cognitive cost remains: the assistant still has to operate inside a ...
