Claude AI APIs Can Be Abused for Data Exfiltration

Attackers can use indirect prompt injections to trick Anthropic’s Claude into exfiltrating data the AI model’s users have access to, a security researcher has discovered.

The attack, Johann Rehberger of Embrace The Red explains, abuses Claude’s Files APIs, and is only possible if the AI model has network access (a feature enabled by default on certain plans and meant to allow Claude to access certain resources, such as code repositories and Anthropic APIs).

The attack is relatively straightforward: an indirect prompt injection payload can be used to read user data and store it in a file in Claude Code Interpreter’s sandbox, and then to trick the model into interacti…

Attackers can use indirect prompt injections to trick Anthropic’s Claude into exfiltrating data the AI model’s users have access to, a security researcher has discovered.

The code in the payload requests Claude to upload the Code Interpreter file from the sandbox but, because the attacker’s API key is used, the file is uploaded to the attacker’s account.

“With this technique an adversary can exfiltrate up to 30MB at once according to the file API documentation, and of course we can upload multiple files,” Rehberger explains.

After the initial attempt was successful, Claude refused the payload, especially with the API key in plain text, and Rehberger had to mix benign code in the prompt injection, to convince Claude that it does not have malicious intent.

The attack starts with the user loading a malicious document received from the attacker in Claude for analysis. The exploit code hijacks the model, which follows the malicious instructions to harvest the user’s data, save it to the sandbox, and then call the Anthropic File API to send it to the attacker’s account.

According to the researcher, the attack can be used to exfiltrate the user’s chat conversations, which are saved by Claude using the newly introduced ‘memories’ feature. The attacker can view and access the exfiltrated file in their console.

Advertisement. Scroll to continue reading.

The researcher disclosed the attack to Anthropic via HackerOne on October 25, but the report was closed with the explanation that this was a model safety issue and not a security vulnerability.

However, after publishing information on the attack, Rehberger was notified by Anthropic that the data exfiltration vulnerability is in-scope for reporting.

Anthropic’s documentation underlines the risks associated with Claude having network access and of potential attacks carried out via external files or websites leading to code execution and information leaks. It also provides recommended mitigations against such attacks.

SecurityWeek has emailed Anthropic to inquire whether the company plans to devise a mitigation for such attacks.

Similar Posts