Don't Wrap the LLM. Make Its Failure Modes Unreachable. (opens in new tab)
There's a class of bug in modern GenAI products that doesn't have a fix in Martin Fowler and Venkat Subramaniam's nine patterns — prompt injection through a chat interface to a tool. The standard mitigation is to send the user's prompt through another LLM (the "guardrail") that decides whether the prompt is malicious. That guardrail has the same properties as the model it's guarding: it's non-deterministic, hallucination-prone, and can be tricked by the same techniques it's supposed to catch....
Read the original article