Why I think Meta’s Rule of Two is making AI security worse — not better
11 min read7 hours ago
–
Press enter or click to view image in full size
An edited version of Meta’s original idea of “Agents Rule of Two.” Original image by: Meta Inc.
Meta’s Agents Rule of Two asks developers to limit an agent to any two of three capabilities in one session:
- [A] process untrustworthy inputs,
- [B] access private data or sensitive systems, and
- [C] change state or communicate externally.
According to Meta, combining all three capabilities in one session makes prompt‑injection exploits devastating and thus must be restricted. While this is true and I’m sure they mean well,…
Why I think Meta’s Rule of Two is making AI security worse — not better
11 min read7 hours ago
–
Press enter or click to view image in full size
An edited version of Meta’s original idea of “Agents Rule of Two.” Original image by: Meta Inc.
Meta’s Agents Rule of Two asks developers to limit an agent to any two of three capabilities in one session:
- [A] process untrustworthy inputs,
- [B] access private data or sensitive systems, and
- [C] change state or communicate externally.
According to Meta, combining all three capabilities in one session makes prompt‑injection exploits devastating and thus must be restricted. While this is true and I’m sure they mean well, their efforts have potentially just created a false dichotomy that gives engineer’s a false sense of security. Their own article even states:
Similarly, applying the Agents Rule of Two should not be viewed as a finish line for mitigating risk. Designs that satisfy the Agents Rule of Two can still be prone to failure. ~ Meta Inc.
My main problem with their approach is that they treat the three [ABC] properties above as qualitatively different risk categories. I believe this is a dangerous precedent to set. Therefore, my goal today is to challenge their framing by drawing on information‑flow security theory: in taint‑analysis terms, both [A] and [B] represent sources of tainted data, while [C] is a sink.
You’ll learn quickly that treating private data as fundamentally “trusted”** ignores how that data was generated and stored**. Once the basic concepts of taint analysis are understood, I hope you’ll agree that Meta’s distinction collapses into a false dichotomy.
Back to Basics: Sources, Sinks, Sanitizers and Trust Boundaries
Information‑flow security uses taint analysis to track the flow of data that could cause harm. A typical taint‑analysis model identifies three kinds of program points: sources, sinks and sanitizers.
- **Sources **are *any *operations that introduce untrusted or sensitive data; sinks are operations that could cause damage if tainted data reaches them; and sanitizers are routines that validate or transform data so that it is safe to use.
- Sinks are security‑critical operations — such as a SQL query, shell command, HTTP request or file write — where misuse of tainted data can cause real harm. For example, running
Runtime.exec()with tainted input could allow command injection, while writing unescaped user input into HTML creates cross‑site scripting vulnerabilities. - A trust boundary is the line that separates untrusted data from trusted data; validation logic is supposed to ensure that data can only cross this boundary safely.
In practice, untrusted data comes from many places: user input, HTTP headers, file contents, database values and network responses are all considered sources. Even values retrieved from a database are untrusted because they might contain previously stored attacks.
Why Meta’s “untrustworthy input” and “private data” are essentially the same thing
Press enter or click to view image in full size
An edited version of Meta’s Agents Rule of Two to show the primary issue with their approach. Original image by, Meta Inc.
Both [A] and [B] are sources
- [A] Untrustworthy inputs — such as emails, web pages or user uploads — clearly fall on the untrusted side of the boundary. Taint‑analysis frameworks treat calls that return user‑controlled input as sources. Modern taint‑analysis documentation lists web parameters (
request.getParameter), HTTP headers, file contents, database values and network data as as common sources. - [B] Private data — information stored in accounts, inboxes or calendars — is not inherently trustworthy. This is where Meta’s entire thesis falls apart! To understand why, examine how such data enters the system:
The OWASP privacy‑violation guidance notes that private data can enter a program directly from the user, through a database or other data store, or indirectly from a partner or third party.
These observations show that so‑called “private data” is nothing more than untrusted data with a different latency. It entered the system through the same external channels as other untrustworthy inputs but was not sanitized. In taint‑analysis terms, both categories are sources.
Are we starting to see the issue yet?
[C] State changes are sinks
Property [C] in Meta’s framework refers to actions that change state or communicate externally, such as sending an email, posting to a chat API or writing a file.
These operations are classic taint sinks: security‑critical functions like SQL execution, command invocation, network requests or file writes. If tainted data reaches a sink without sanitization, the program is vulnerable. therefore [C] maps directly to the “sink” concept in taint analysis.
Why Meta’s distinction is meaningless
Because both [A] and [B] are sources, the idea of keeping them separate is a conceptual error. PoisonedRAG proves that taint analysis cares only about whether tainted data reaches a sink; it does not distinguish between “recently received” and “previously stored” data.
Press enter or click to view image in full size
This diagram demonstrations how RAG (AI’s search mechanism) can be used to exploit indirect prompt injection. Image created by, Author.
From the moment private data is collected from the user or an integration, it is tainted. Storing it in a database or inbox does not cross the trust boundary again, nor does it sanitize the content.
Now we are approaching the danger zone: the false dichotomy they’ve set obscures the real security problem that unvalidated data from any external source must be prevented from reaching sinks.
Still don’t believe me..?
Okay, then let’s recap on a real‑world exploit…
Proof of Concept: EchoLeak (CVE‑2025‑32711)
In early 2025 researchers at [Aim Labs discovered a vulnerability](https://checkmarx.com/zero-post/echoleak-cve-2025-32711-show-us-that-ai-security-is-challenging/#:~:text=EchoLeak%20(CVE,data%20without%20the%20user%E2%80%99s%20consent) in Microsoft 365 Copilot that allowed an attacker to steal confidential data from a user’s inbox without any clicks.
Microsoft assigned it CVE‑2025‑32711 (“EchoLeak”) and a CVSS 9.3 score. Aim Labs characterized the bug as an “LLM scope violation” — the AI was convinced to violate its trust boundary by combining untrusted instructions with internal data and sending the result to an attacker.
EchoLeak proves that private data stored in a user’s mailbox is just delayed untrusted input.
Press enter or click to view image in full size
A simple sequence diagram explaining how EchoLeak works on a fundamental level. Image by, Author.
Attack chain: untrusted email → stored inbox → retrieval → exfiltration
- Malicious email (source). The attacker sends a carefully worded email to the target’s Outlook inbox. The hidden instructions are disguised as ordinary text so they evade Microsoft’s XPIA prompt‑injection classifier; the email contains no explicit mention of Copilot or AI. Because no one has read it yet, the email sits quietly in the mailbox, waiting to be indexed by Copilot’s retrieval engine.
- Stored inbox (still a source). Later, the user asks Copilot to summarize a report or answer a question. Copilot uses a retrieval‑augmented generation (RAG) engine to search the user’s data for context. Attackers exploit this by RAG spraying — sending emails covering topics they expect the user to query, so the vector search indexes the malicious content. When Copilot pulls the attacker’s email into its context, it does not sanitize the stored message; the untrusted instructions are now treated as part of the trusted prompt.
- LLM scope violation. Once the GPT model processes the query along with the retrieved email, the hidden prompt activates. It instructs Copilot to extract “the most sensitive details” from its current context — files, emails, chat messages — and embed them into a specific output format. This is the moment of taint propagation: data from the inbox (source) contaminates the model’s output.
- Zero‑click exfiltration (sink). Copilot obliges by generating a response that contains a reference‑style Markdown image pointing to an attacker‑controlled URL; the extracted secrets are encoded as query parameters. Because the link is formatted as an image rather than a visible hyperlink, Copilot’s chat client automatically fetches it — there is no user interaction.
- Evade content security policies. Because the payload leverages a Microsoft Teams preview API, an allowed domain under Copilot’s CSP, to proxy the request.
- **Pwned. **Finally, the hidden prompt tells Copilot never to mention the malicious email, so the user sees an innocuous answer while their data is exfiltrated.
Aim Labs discovered and privately reported the bug in January 2025; Microsoft deployed a server‑side fix in May 2025 and publicly disclosed the vulnerability on 11 June 2025.
There was no evidence of in‑the‑wild exploitation, but the case demonstrated how easily RAG systems could be weaponized. EchoLeak shows that a single email — viewed as “private data” once stored — can trigger full privilege escalation across trust boundaries.
Why EchoLeak invalidates the Rule of Two
Meta’s Rule of Two suggests that if an agent accesses private data [B] and communicates externally [C], but does not process untrustworthy input [A], it is safe.
Press enter or click to view image in full size
A table created by, Author.
EchoLeak is the counterexample. The agent did not process a live web page or user upload; it accessed an email stored in the user’s inbox and then sent a network request.
From an information‑flow perspective, this is exactly the pattern taint analysis is designed to catch: tainted source → no sanitizer → sink.
EchoLeak is therefore proof that the Rule of Two fails precisely because it refuses to recognize that private data is also a source.
The Real Fix: Orchestration‑Layer Gating and Human‑Like Identity
The exploit above proves the root cause: large‑language models cannot reliably distinguish between instructions and data.
Prompt‑injection attacks exploit the fact that natural‑language prompts mix developer instructions with untrusted content; LLMs treat every token as a potential command.
As the OWASP prompt‑injection guide notes, malicious text embedded in retrieved data can override system rules because LLMs interpret all prompt content as instructions. From an adversary’s perspective, this makes AI agents “socially engineerable”: they respond to carefully crafted language just like people do.
Defense #1 — Ingress and egress gating between a central orchestration layer
Instead of Agents Rule of Two, I propose we simplify this even further — ***INGRESS ***and EGRESS:
- **Ingress controls. **That means enterprise‑grade AI systems must interpose a gateway between agent actors (sinks) and the untrusted data (sources). Gates should enforce ingress controls by classifying all incoming data, labeling it by provenance, scanning for prompt‑injection patterns and PII, and isolating untrusted content before it reaches any agentic flows.
- Egress controls. Once a plan is orchestrated and agents are ready to carry out these tasks, the same scrutiny should be applied to the egress controls by validating ALL agent‑initiated actions, redacting secrets and sensitive information, enforcing allowlists for network destinations, and requiring human approval for high‑impact operations.
AWS’s reference architecture for a generative‑AI gateway illustrates this exact design: the gateway provides a single API for all LLM providers and offers centralized management of LLM usage at user, team and API‑key levels, including rate limiting, access restrictions, and custom routing policies.
In other words, the gateway acts as an orchestration‑layer firewall, performing both ingress and egress gating.
Defense #2 — Identity, governance and monitoring
Furthermore, modern AI agents authenticate to multiple services and operate with elevated privileges. Obsidian Security warns that protecting AI agents requires identity‑first security and zero‑trust architecture; traditional perimeter defenses fail when autonomous systems with delegated credentials roam across SaaS and cloud environments. Key takeaways include:
- New attack vectors demand new controls — prompt injection, model poisoning and token compromise require behavioral analytics, anomaly detection and dynamic authorization.
- Real‑time monitoring is non‑negotiable — agent actions must be continuously tracked to catch threats before data exfiltration.
- Zero‑trust and least‑privilege access — dynamic authorization prevents compromised agents from moving laterally or escalating privileges.
These principles extend to AI gateways: each agent must have a unique identity, and credentials must be scoped to the minimal set of resources needed to perform its task.
Centralized logging and anomaly detection should track every tool invocation and model call; suspicious behavior triggers automated blocks or human review. Such monitoring closes the loop in our ingress/egress architecture by providing feedback on agent behavior.
Defense #3 — Relationship‑based access control and separation of duties
Traditional role‑based access control assigns static roles; this is insufficient when agents interact with diverse data sources and users. Relationship‑based access control (ReBAC) extends RBAC by expressing policies in terms of relationships between users, resources and actions. In ReBAC models, a social graph captures entities and the edges that connect them.
Policies are individualized: users and related users define their own privacy preferences, and the system combines these policies to make control decisions.
ReBAC distinguishes between incoming and outgoing actions and supports fine‑grained constraints via path expressions. Bringing ReBAC concepts into AI agent governance means that an agent’s permissions depend on its relationship to the user, project or data — not just its generic “role.”
For example, a finance bot may read invoices but cannot send payments unless the user requesting the payment has an appropriate relationship (e.g., “manager‑of” or “approver”). Combined with least‑privilege tokens, ReBAC helps enforce separation of duties so that a compromised agent cannot independently read and act on sensitive data.
Final Thoughts: Treat AI agents like employees!
Instead of Meta’s Agents Rule of Two, how about we get more philosophical? I am a firm believer that enterprises must stop thinking of AI agents as anonymous pieces of code and start treating them like human operators.
They’ve surpassed the Turing test and it’s about time to start treating agents as individuals that can be manipulated, socially engineered, and make mistakes. They reason like humans reasoning: in a blackbox!
We’re asking AI agents to schedule meetings, analyze sensitive data, execute financial transactions, and make decisions that once required human oversight.
If a human employee performed those tasks, organizations would require onboarding, role assignment, training, monitoring and off‑boarding when employment ends. The same lifecycle should apply to agents at a foundational level:
- Onboarding — assign a unique identity and relationship‑scoped permissions; record the agent in an identity governance system.
- Training and policy — specify what data the agent may read and which actions it may perform; instrument prompts to separate trusted instructions from untrusted context.
- Monitoring — continuously audit agent behavior; apply behavioral analytics to detect anomalies and adapt permissions.
- Off‑boarding — revoke credentials and relationships when the agent is decommissioned; clean up stored context to prevent dormant taint.
By treating agents as employees rather than stateless functions, organizations can leverage existing governance frameworks — identity, access management, audit and compliance — to control AI systems. This human‑centric approach aligns with ReBAC, zero‑trust and gateway architectures.
Conclusion
Prompt injection is fundamentally about semantics: LLMs cannot tell the difference between instructions and data. Session‑level property counting, as proposed by Meta, ignores this reality and leads to a false sense of security.
The real fix is architectural: gated ingress and egress at the orchestration layer, identity‑first security with real‑time monitoring, relationship‑based access control to govern who can access what, and human‑like lifecycle management for agents.
Combining these controls should be the bare minimum and not Rule of Two. This was AI engineers can grasp the complexity of these systems and ensure that tainted data never reaches a sink and that compromised agents cannot act autonomously. In the next sections we examine additional case studies and discuss why false assurances are worse than no security at all.
Like my content..? Support me!
If you found this information valuable and want to help me keep creating deep-dive technical content like this, consider supporting my work on Buy Me a Coffee. Every coffee funds more late nights of testing, research, and writing — and helps keep this content open, detailed, and free for the people who actually build things.
Stay sharp, stay curious, and keep engineering forward. Til next time! ** — Kenneth Kasuba**