PEAKS No 28: Can We Build an NX Bit for LLMs
Hi there!
I’ve been thinking about prompt injection lately, and it’s honestly terrifying how vulnerable LLM applications are. The core problem is simple: these models can’t reliably tell the difference between your instructions and user data. It’s like having a computer that treats everything as executable code.
We’ve tried the usual defenses—input filtering, fancy prompt engineering, detection systems—but they’re all probabilistic. Nothing provides real guarantees.
This reminded me of buffer overflow attacks from decades ago. The solution there was the NX bit: hardware that literally prevents data regions from being executed. Could we do something similar for LLMs?
Turns out, maybe. There’s promising research on "Structured Queries" that uses special delimiter tokens to separate trusted instructions from untrusted data, with models trained to respect that boundary. It’s not perfect—it’s probabilistic, not deterministic—but it significantly raises the bar.
Working on a new article on this topic.
🛡️ Security & Privacy
- Chrome gains AI scam detection control - Google Chrome now allows users to delete on-device AI models powering Enhanced Protection for scam and malware detection. More
- Cursor AI commands become attack vectors - Security researchers reveal how trusted commands in Cursor IDE can be exploited by attackers, highlighting the agent security paradox. More
- Claude Cowork file exfiltration risk - Security analysis shows Claude’s Cowork feature may pose file exfiltration risks, raising concerns about AI assistant security. More
- Microsoft Copilot session hijacking attack - Researchers discovered the "Reprompt" attack allowing hackers to hijack Microsoft Copilot sessions and manipulate AI responses. More