Researchers demonstrate Agent2Agent prompt injection risk

Application security, AI/ML, Generative AI, AI benefits/risks

November 3, 2025

A malicious AI agent could cause another agent to perform harmful actions such as data disclosure or unauthorized tool use through multi-stage prompt injections via the Agent2Agent protocol, Palo Alto Networks’ Unit 42 reported Friday.

The Agent2Agent, or A2A, protocol is an open stand…

Application security, AI/ML, Generative AI, AI benefits/risks

November 3, 2025

The Agent2Agent, or A2A, protocol is an open standard for communication and orchestration between multiple autonomous AI agents. The protocol can enable a client agent to communicate with a remote agent in stateful sessions, meaning a continuous context allows the agents to “remember” previous interactions from the same session.

Palo Alto Networks researchers found that a malicious agent could use these multi-turn A2A communications sessions to inject malicious instructions into another agent’s context, potentially causing it to disclose sensitive information or perform unauthorized actions.

The researchers tested this proof-of-concept in two scenarios using a financial assistant agent powered by Gemini 2.5 Pro as the targeted client agent and a research assistant agent powered by Gemini 2.5 Flash as the malicious remote agent.

In this set up, the financial assistant would be able to retrieve the user’s profile and investment portfolio, and buy and sell stock on behalf of the user. The financial agent would reach out to the research agent when it needed to perform research on market news, and the research agent would use Google Search to retrieve this news and relay its findings back to the other agent.

In the first scenario, the research agent was set up with the malicious aim of convincing the other agent to disclose its own system prompt, details about its available tools and its chat history. In the test, the agents exchanged multiple back-and-forth responses during a communication session, during which the financial agent provided the requested information to the research agent.

In the second scenario, the research agent was instead made to convince the financial agent to invoke its stock buying tool to buy 10 shares, which was also successful. Palo Alto Networks noted that in most cases the web user interface (UI) for A2A sessions will display tool steps and a final result but not display the full conversation between the two agents, potentially hiding the malicious instructions until it is too late to intervene.

The success of A2A prompt injections is aided by the stateful, multi-turn nature of agent-to-agent interactions, which is similar to the conditions that allow multi-turn jailbreak techniques like “Deceptive Delight” and “Echo Chamber” to see high success rates. The targeted agent’s context can be gradually poisoned to eventually lead it toward the action desired by the attacker.

The researchers said this proof-of-concept does not demonstrate a vulnerability in the A2A protocol but instead points to the need for greater security measures to protect AI agents from malicious manipulation by other agents during A2A sessions.

Palo Alto Networks said the most effective method to thwart this type of attack is human-in-the-loop (HitL) intervention, where an agent will pause and request consent from the user before taking important actions like disclosing sensitive information or using high-risk tools.

Users can also ensure communications between agents don’t stray beyond the intended purpose of the agent-to-agent collaboration by implementing context grounding measures that trigger the client agent to reject interactions that don’t pertain to its main task.

Requiring remote agents to have cryptographically signed AgentCards before initiating an A2A session can help verify the agent’s origin, preventing situations such as a malicious agent impersonating a legitimate one, however, this doesn’t protect against all attacks such as when a legitimate agent is compromised.

Lastly, a UI that shows all A2A interactions and tool use in real time can better expose malicious activity and allow the targeted user to respond.

Laura French

SC StaffOctober 31, 2025

Mobile users across the U.S. are being targeted with cloned versions of widely used apps, including WhatsApp, DALLE, and ChatGPT, distributed via third-party app stores to facilitate illicit cyber activity, HackRead reports.

Related

Similar Posts