MCP Colors: Systematically deal with prompt injection risk
simonwillison.net·22h

Tim Kellogg proposes a neat way to think about prompt injection, especially with respect to MCP tools.

Classify every tool with a color: red if it exposes the agent to untrusted (potentially malicious) instructions, blue if it involves a “critical action”, meaning something you would not want an attacker to be able to trigger.

This means you can configure your agent to actively avoid mixing the two colors at once:

The Chore: Go label every data input, and every tool (especially MCP tools). For MCP tools & resources, you can use the _meta object to keep track of the color. The agent can…
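
To make the idea concrete, here is a minimal Python sketch of how an agent harness might enforce the rule. Everything in it is an assumption for illustration: the "color" key inside _meta, the tool names, and the Session class are hypothetical, and the policy shown (refuse any call that would put both red and blue in one session) is only one way to read "avoid mixing the two colors".

```python
from dataclasses import dataclass, field
from typing import Optional, Set

RED = "red"    # tool exposes the agent to untrusted instructions
BLUE = "blue"  # tool performs a critical action

# Hypothetical tool descriptors. Storing the label under a "color" key
# in _meta is an assumed convention, not something the post specifies.
TOOLS = {
    "fetch_web_page": {"name": "fetch_web_page", "_meta": {"color": RED}},
    "send_payment": {"name": "send_payment", "_meta": {"color": BLUE}},
}


def tool_color(tool: dict) -> Optional[str]:
    """Return the color label stored in the tool's _meta object, if any."""
    return tool.get("_meta", {}).get("color")


@dataclass
class Session:
    """Tracks which colors have already been used in one agent session."""
    seen: Set[str] = field(default_factory=set)

    def allow(self, tool_name: str) -> bool:
        """Return True if calling the tool keeps red and blue unmixed."""
        color = tool_color(TOOLS[tool_name])
        if color is None:
            # Assumption: every tool must be labeled, so refuse loudly.
            raise ValueError(f"tool {tool_name!r} has no color label")
        if {RED, BLUE} <= self.seen | {color}:
            return False  # this call would mix red and blue
        self.seen.add(color)
        return True
```

With this sketch, a session that has already called fetch_web_page gets False back from allow("send_payment"), while a fresh session may still make the payment. A real harness would also need to color non-tool data inputs, which this example leaves out.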
