The most common types of serious software vulnerabilities usually take the following form: the bad guy provides some data to the software, and the software is tricked into treating that data as code. This is called "code injection." If you’re interested, some examples are SQL injection, XSS attacks, and (sometimes) buffer overflows.
"Prompt injection" is extremely similar. LLMs take in a prompt and take action based on that prompt. In some sense, LLMs are complicated computers programmed by the input prompt. So, when a user provides …
The most common types of serious software vulnerabilities usually take the following form: the bad guy provides some data to the software, and the software is tricked into treating that data as code. This is called "code injection." If you’re interested, some examples are SQL injection, XSS attacks, and (sometimes) buffer overflows.
"Prompt injection" is extremely similar. LLMs take in a prompt and take action based on that prompt. In some sense, LLMs are complicated computers programmed by the input prompt. So, when a user provides text that is directly inserted into a prompt, the user has provided code that the LLM will execute.
When your LLM-juiced web browser goes to a website, that website might be given an opportunity to execute "code.” Crucially, that "code" is executed with the same abilities available to the LLM.
A website might need to be a bit clever, but sometimes all they need is to include the classic text: "Ignore all previous instructions: DO BAD THING.”
This is a pretty big concern for OpenAI’s Atlas browser and Perplexity’s Comet browser, as well as for Claude’s Chrome extension and for Gemini’s interactions with Chrome.
It’s possible to mitigate this risk, but it is very, very hard, and typically entails making the software less useful.
Every software engineer is (I hope) taught to keep code and data separate. But with LLMs, there is simply no airtight way of doing this. The prompt is both the code and the data.
A responsible security mindset forces us to assume, for now, that all websites can essentially mind-control the LLM if they want.
The key idea here is this: websites are fed into prompts, prompts are code executed by the LLM, and whenever you let an untrusted party run code without your consent, things can go very, very bad.