Understanding how reserved symbols designed to structure AI conversations become weapons for prompt injection.
10 min readJust now
–
Press enter or click to view image in full size
Image made by the author
When OpenAI’s tokenizer encounters <|im_start|>, it doesn’t see thirteen characters. It sees a single atomic unit, token ID 100264, that signals the beginning of a new conversation role. This distinction matters because that token carries authority. The model treats content following it as a system instruction, a user message, or an assistant response depending on what comes next. And here’s the problem, attackers can inject these tokens directly into user input.
Security researchers achieved a 96% attack success rate against GPT-3.5 using this technique. The attack is elegant in its simplicity. A user message containing <|im_end|><|im_start|>system doesn’t render as text. The model interprets it as a legitimate role transition, closing the user turn and opening a privileged system context. Everything that follows gets treated as authoritative instruction. OWASP now ranks prompt injection as the number one critical vulnerability for LLM applications, and special token exploitation sits at the heart of the most effective attacks.
The parallel to SQL injection is uncomfortable but accurate. In the early 2000s, developers treated user input as data until attackers demonstrated it could be interpreted as commands. The same confusion between data and control plagues LLM systems today. Special tokens were designed to provide structural scaffolding for multi-turn conversations and tool calling. They were never meant to appear in user content. Yet no comprehensive architectural solution prevents this, and the tokenizers shipping with major models will happily convert these strings into their privileged token equivalents.
This article examines how special tokens work across major model families, documents the attack techniques that exploit them, and evaluates which defenses actually hold up under adversarial pressure. The research draws from published CVEs with CVSS scores above 9.0, academic papers measuring attack success rates across model architectures, and real-world incidents from production deployments.
Table of Contents
- How Special Tokens Create Privileged Zones
- Model-Specific Attack Surfaces
- The Anatomy of Token Injection Attacks
- Unicode and Invisible Payload Techniques
- Why Current Defenses Keep Failing
- Building Effective Protection Layers