Securing AI Agents with Information Flow Control (Part II)

Inside the Planner: How Decisions, Memory, and Labels Can Shape Agent Behavior

This article is part of a three-part series that explains and contextualizes the Microsoft Research paper: Securing AI Agents with Information-Flow Control (written by Manuel Costa, Boris Köpf, Aashish Kolluri, Andrew Paverd, Mark Russinovich, Ahmed Salem, Shruti Tople, Lukas Wutschitz, and Santiago Zanella-Béguelin).

My goal is to translate their theoretical model and guarantees into something security engineers, architects, and researchers can use, without sacrificing rigor.

1. From Agent Loops to Planners

In Part I, we asked a simple but uncomfortable question:

What happens when you give an AI agent the keys to your systems?

We saw how tool-calling agents can be hijacked by prompt injection and abused to leak data or perform unintended actions. We argued that Information-Flow Control (IFC) is a promising way to make such leaks impossible by design.

But there’s a missing piece. Before we can control agents, we need to understand how they actually decide what to do. Where does the decision (e.g., “send this email”, “query this API”, or “write to this datastore”) truly come from?

That decision lives in the planner. This part is about how a planner loops, how it remembers, how it carries labels, and how we can instrument it to enforce security.

2. The Planning Loop

Recall the high-level agent loop (from Part I, Section 2). That loop is intentionally generic, but it is also too monolithic. For security, we need a clear control surface: a decomposition in which we can point to each decision boundary and say:

“At this exact point, before any tool runs, check the policy.”

The paper achieves this by decomposing the agent into:

A planning loop: the fixed scaffolding that drives interaction, and
A planner: the strategy that decides what action to take next.

Think of the planning loop as the kernel scheduler, and the planner as the process that makes system calls.

2.1. Action Spaces

The planning loop mediates all interactions with the model, tools, and the user. It is parameterized by a state-passing planner function P.

At each iteration, P consumes the latest message in the conversation and returns exactly one of three actions: a query to the model, a tool call (MakeCall), or a final response to the user (Finish).

That’s the entire API between “how the agent thinks” and “how the environment executes”.
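A minimal sketch of this action space in Python may help make it concrete. `MakeCall` and `Finish` are the names used in this article; the name `Query`, the field names, and the `describe` helper are assumptions made for illustration and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Any, Union


@dataclass(frozen=True)
class Query:
    """Ask the model for its next message (hypothetical name)."""
    messages: tuple


@dataclass(frozen=True)
class MakeCall:
    """Ask the loop to run a tool with the given arguments."""
    tool: str
    args: dict = field(default_factory=dict)


@dataclass(frozen=True)
class Finish:
    """End the conversation and return a response to the user."""
    response: Any


Action = Union[Query, MakeCall, Finish]


def describe(action: Action) -> str:
    """Show which party the planning loop talks to for each action."""
    if isinstance(action, Query):
        return "model"
    if isinstance(action, MakeCall):
        return f"tool:{action.tool}"
    if isinstance(action, Finish):
        return "user"
    raise TypeError(f"not an action: {action!r}")
```

Because the planner can only return one of these three values, anything it wants to do in the outside world has to be expressed as data that the loop can inspect first.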

2.2. Basic Planner Algorithm

The planner is the actual gatekeeper between model reasoning and real-world side effects. The planning loop works as follows:
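One way to sketch such a loop in Python is shown here. It is an illustration under stated assumptions, not the paper's algorithm: the `Query` action name, the `(state, message) -> (state, action)` planner signature, the `max_steps` guard, and the toy planner, model, and tool are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Query:            # hypothetical name for "ask the model"
    prompt: str


@dataclass
class MakeCall:
    tool: str
    args: dict


@dataclass
class Finish:
    response: Any


def planning_loop(planner, model, tools, user_message, max_steps=20):
    """Drive the planner until it finishes; the loop owns all side effects."""
    state, message = None, user_message
    for _ in range(max_steps):
        state, action = planner(state, message)
        if isinstance(action, Query):
            message = model(action.prompt)
        elif isinstance(action, MakeCall):
            # Single choke point: every tool call passes through this line.
            message = tools[action.tool](**action.args)
        elif isinstance(action, Finish):
            return action.response
        else:
            raise TypeError(f"unknown action: {action!r}")
    raise RuntimeError("planner did not finish within max_steps")


def toy_planner(state, message):
    """Ask the model once, call the tool it names, then finish."""
    step = 0 if state is None else state
    if step == 0:
        return 1, Query(prompt=message)
    if step == 1:
        return 2, MakeCall(tool=message, args={"city": "Paris"})
    return 3, Finish(response=message)


def fake_model(prompt):
    """Stand-in for an LLM: always names the 'weather' tool."""
    return "weather"


tools = {"weather": lambda city: f"Sunny in {city}"}
result = planning_loop(toy_planner, fake_model, tools, "What's the weather?")
```

The planner never touches the model or the tools directly. It only returns actions, and the loop decides whether and how to carry them out.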

This mechanism is delicate, and the reader should pause here and pay close attention to the following observations:

You can already see where IFC will hook in:

Before executing MakeCall, ask: “Is this call safe under our policy?”
Before returning Finish, ask: “Are we allowed to reveal this information?”

3. The Variable-Passing Planner

A planner that only looks at raw messages is too weak for real-world tasks.

Agents need memory: “What did that API return?”, “Which file did I just read?”, “What is the ID of the ticket I created?”. Without the ability to store and reuse this information, the planner cannot assemble non-trivial workflows.

3.1. Adding Planner Memory

The paper introduces a more powerful planner that keeps an internal memory μ. You can think of μ as a map μ: variable_name → value. When a tool returns a result, the planner stores it in μ under a fresh variable name, so that later steps can refer to the result by name.

This is the variable-passing planner.
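As a rough sketch, μ can be modeled as a small wrapper around a dictionary. The class name `Memory`, its method names, and the `x1`, `x2`, ... naming scheme are assumptions chosen for illustration.

```python
import itertools


class Memory:
    """Planner memory mu: variable name -> value."""

    def __init__(self):
        self._values = {}
        self._counter = itertools.count(1)

    def bind(self, value):
        """Store a tool result under a fresh name and return that name."""
        name = f"x{next(self._counter)}"
        self._values[name] = value
        return name

    def lookup(self, name):
        """Return the value stored under a variable name."""
        return self._values[name]

    def names(self):
        """Variable names only: what a planner could show the model
        instead of the raw values."""
        return list(self._values)


mu = Memory()
ticket = mu.bind({"id": 4711, "title": "Reset password"})
note = mu.bind("internal: customer is a VIP")
```

The planner can now answer “What is the ID of the ticket I created?” with `mu.lookup(ticket)["id"]`, without the model having to repeat the value from its own context.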

3.2. Why Variables Matter

Variables are not just for convenience. They are the foundation for IFC. They give us three key capabilities:

1. Composability

A later tool call can say “use x and y from earlier steps” without the model hallucinating those values.

2. Control over what the LLM sees

The planner can decide whether to expose the raw value or keep it as a variable. This is an IFC hook: we may want an LLM that reasons about the existence of a variable without seeing the secret inside it.

3. Clear boundaries for labeling

Each variable can carry an IFC label indicating its origin and the level of trustworthiness.

3.3. Variable-Passing Flow

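Conceptually, the variable-passing planner behaves like this: each tool result is bound to a variable in μ, later tool calls refer to earlier results by variable name, and the stored values are substituted for those names only when the tool actually runs.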

You can think of the variable-passing planner flow as building a small, typed environment for the agent’s current “plan.”
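The flow can be sketched in a few lines of Python. The `Var` reference type, the `resolve` and `call_and_store` helpers, and the two toy tools are hypothetical names introduced here; memory is a plain dictionary to keep the example short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """A reference to a value held in planner memory."""
    name: str


def resolve(args, memory):
    """Replace Var references with the stored values just before a call."""
    return {k: memory[v.name] if isinstance(v, Var) else v
            for k, v in args.items()}


def call_and_store(tools, memory, tool, args, result_name):
    """Run a tool on resolved arguments and bind the result to a variable."""
    memory[result_name] = tools[tool](**resolve(args, memory))
    return Var(result_name)


# Toy tools standing in for real integrations.
tools = {
    "read_file": lambda path: f"contents of {path}",
    "send_email": lambda to, body: f"sent {len(body)} chars to {to}",
}
memory = {}

# Step 1: read a file; the plan only holds the reference x, not the text.
x = call_and_store(tools, memory, "read_file", {"path": "report.txt"}, "x")

# Step 2: pass x to a later tool call by reference.
y = call_and_store(tools, memory, "send_email",
                   {"to": "alice@example.com", "body": x}, "y")
```

The second call mentions only `Var("x")`. The file contents reach `send_email` through `resolve`, so the model never needs to reproduce them.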

4. Adding Information-Flow Labels

Up to now, the planner has only been about “control flow”. To reason about security, we need to track what data flows through that control flow.

We assign labels from a set L to all pieces of data in the system. We require that labels L form a lattice with a partial order ⊑ and join operation ⊔, used to compute the least upper bound of two labels.

Two dimensions are particularly important for us: confidentiality (who is allowed to read data) and integrity (who is allowed to modify data).

4.1. Confidentiality Lattice

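The canonical confidentiality lattice consists of two elements, {L, H}, where L denotes public (low-confidentiality) data and H denotes secret (high-confidentiality) data. (Here L is a single label, not the label set L introduced above.) In this lattice, L ⊑ H: public data may flow into secret contexts but not the reverse, and L ⊔ H = H.

A richer confidentiality lattice labels each piece of data with the set of users allowed to read it. A smaller set of readers is more restrictive, so the join is set intersection. For example, if data x is readable by users {A, B, C} and data y is readable by users {B, C, D}, then any data derived from both (e.g., their concatenation xy) is labeled with {A, B, C} ⊔ {B, C, D} = {B, C}.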

This formalizes a key confidentiality principle: derived data must not be visible to anyone who was not authorized to see all of its inputs.
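Both flavors of confidentiality label fit in a few lines of Python. This is a sketch: the names `Conf`, `conf_join`, `readers_join`, and `readers_leq` are illustrative and do not come from the paper.

```python
from enum import IntEnum


class Conf(IntEnum):
    """Two-point confidentiality lattice: L (public) below H (secret)."""
    L = 0
    H = 1


def conf_join(a: Conf, b: Conf) -> Conf:
    """Least upper bound: the result is secret if either input is."""
    return max(a, b)


# Readers lattice: a label is the set of users allowed to read the data.
# Fewer readers is more restrictive, so the join is set intersection.
def readers_join(a: frozenset, b: frozenset) -> frozenset:
    return a & b


def readers_leq(a: frozenset, b: frozenset) -> bool:
    """a may flow to b only if b allows no reader that a forbids."""
    return b <= a


x = frozenset("ABC")   # readable by A, B, C
y = frozenset("BCD")   # readable by B, C, D
xy = readers_join(x, y)
```

Here `xy` contains only B and C, and `readers_leq(x, xy)` holds: data readable by {A, B, C} may flow into a container readable only by {B, C}, but not the other way around.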

4.2. Integrity Lattice

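Integrity is modeled dually to confidentiality. The canonical integrity lattice consists of two elements, {T, U}, where T denotes trusted (high-integrity) data and U denotes untrusted (low-integrity) data. In this lattice, T ⊑ U: trusted data may be used where untrusted data is expected but not the reverse, and T ⊔ U = U.

A richer integrity lattice labels each piece of data with the set of users who may have written it. A larger set of writers is less trustworthy, so the join is set union. For example, if data x may be written by users {A, B, C}, and data y by users {B, C, D}, then any data derived from both (e.g., their concatenation xy) must assume influence from {A, B, C} ⊔ {B, C, D} = {A, B, C, D}.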

This reflects the integrity threat model: if an untrusted actor may have influenced any input, the result must be treated as potentially influenced by all of them.
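The dual structure is easy to see in code. In this sketch the names `Integ`, `integ_join`, `writers_join`, and `label_of_derived` are assumptions made for illustration.

```python
from enum import IntEnum
from functools import reduce


class Integ(IntEnum):
    """Two-point integrity lattice: T (trusted) below U (untrusted)."""
    T = 0
    U = 1


def integ_join(a: Integ, b: Integ) -> Integ:
    """Least upper bound: the result is untrusted if either input is."""
    return max(a, b)


# Writers lattice: a label is the set of users who may have influenced
# the data. More writers is less trustworthy, so the join is set union.
def writers_join(a: frozenset, b: frozenset) -> frozenset:
    return a | b


def label_of_derived(input_labels) -> Integ:
    """Label of data computed from several inputs; T if there are none."""
    return reduce(integ_join, input_labels, Integ.T)


x = frozenset("ABC")   # may have been written by A, B, C
y = frozenset("BCD")   # may have been written by B, C, D
xy = writers_join(x, y)
```

A single untrusted input is enough to taint the result: `label_of_derived([Integ.T, Integ.U, Integ.T])` evaluates to `Integ.U`.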

4.3. Product Lattice: Putting Them Together

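The system uses the product of the confidentiality and integrity lattices, {T, U} × {L, H}, ordered componentwise: one label can flow to another only if it can do so in both dimensions.

Figure: The (Integrity × Confidentiality) lattice.

You can picture this as a diamond, with (T, L) at the bottom, (T, H) and (U, L) side by side in the middle, and (U, H) at the top.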

Arrows go from bottom to top following “can flow to” rules. For example, trusted public (T, L) can safely flow anywhere, and untrusted secret (U, H) is the most restrictive.

This product lattice is the space in which IFC policies are defined.
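A small sketch of the four-element product lattice, with the order and join computed componentwise. The `Label` class and the helper names `leq` and `join` are illustrative assumptions.

```python
from dataclasses import dataclass
from itertools import product

_INTEG = {"T": 0, "U": 1}   # trusted below untrusted
_CONF = {"L": 0, "H": 1}    # public below secret


@dataclass(frozen=True)
class Label:
    integrity: str          # "T" or "U"
    confidentiality: str    # "L" or "H"


def leq(a: Label, b: Label) -> bool:
    """a can flow to b only if it can in both dimensions."""
    return (_INTEG[a.integrity] <= _INTEG[b.integrity]
            and _CONF[a.confidentiality] <= _CONF[b.confidentiality])


def join(a: Label, b: Label) -> Label:
    """Componentwise least upper bound."""
    return Label(
        max(a.integrity, b.integrity, key=_INTEG.get),
        max(a.confidentiality, b.confidentiality, key=_CONF.get),
    )


BOTTOM = Label("T", "L")    # trusted public: can flow anywhere
TOP = Label("U", "H")       # untrusted secret: most restrictive
ALL = [Label(i, c) for i, c in product("TU", "LH")]

# The two middle elements of the diamond are incomparable.
left, right = Label("T", "H"), Label("U", "L")
```

Neither `leq(left, right)` nor `leq(right, left)` holds, and `join(left, right)` is `TOP`: combining trusted secret data with untrusted public data yields untrusted secret data.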

5. Propagating Labels Through the Planner

Now we combine the planning machinery with labels. This is where IFC becomes operational.

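The idea is to run the planning loop with taint tracking: every value the planner handles carries a label, and those labels are propagated whenever a tool runs.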

5.1. Labeling Variables and Actions

Formally, there’s a function τ that assigns labels to variables.

You can think of τ as a map τ: variable → L. Tool results are stored in variables x, and each variable has a label τ(x) summarizing the labels of the tool’s arguments, and the labels of any datastore locations it reads. Actions also carry labels. In particular, a tool f has a static label (e.g., trusted/untrusted), and each argument to the tool call has a dynamic label.

The tool result and any datastore variables W(f) it writes to are assigned a label that soundly over-approximates the labels of the action and all datastore variables R(f) it may read from.

5.2. Planning Loop with Taint Tracking

The taint-tracking planning loop extends the previous algorithm with:

A policy function that decides if an action is allowed given labels.
A label computation step that computes the label of the tool result and updates variable labels accordingly.

JoinLabels computes a single, conservative label for the result of a tool call. It joins (takes the least upper bound of) three sources of influence: the label of the tool itself, the labels of all arguments passed to the tool, and the labels of all datastore variables the tool may read. The resulting label soundly over-approximates everything that could have affected the tool’s output, ensuring that no dependency (whether from inputs, state, or the tool’s own trust assumptions) is lost.

UpdateLabels propagates that result label back into the datastore after the tool executes. For every variable the tool may write, the label map is updated so those variables now carry the result’s label, reflecting that their contents are influenced by the same sources as the tool output. This step preserves label monotonicity across the agent’s execution and prevents later planner decisions from treating derived state as more trusted or less sensitive than it truly is.
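The sketch below shows one taint-tracked tool-call step built from these two helpers. It is an illustration under assumptions, not the paper's algorithm: the `Tool` record, the `run_call` helper, the encoding of labels as `(integrity, confidentiality)` integer pairs, and the example policy rule are all hypothetical. One deliberate choice: `update_labels` joins the result label with a variable's previous label, which is a conservative reading of “may write”.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Callable

# A label is (integrity, confidentiality); 0 means T or L, 1 means U or H.
BOTTOM = (0, 0)


def join(a, b):
    return (max(a[0], b[0]), max(a[1], b[1]))


@dataclass
class Tool:
    fn: Callable
    label: tuple = BOTTOM   # static label of the tool itself
    reads: tuple = ()       # R(f): datastore variables it may read
    writes: tuple = ()      # W(f): datastore variables it may write


def join_labels(tool, arg_labels, tau):
    """Join the tool label, the argument labels, and the labels of R(f)."""
    read_labels = [tau.get(v, BOTTOM) for v in tool.reads]
    return reduce(join, [tool.label, *arg_labels, *read_labels], BOTTOM)


def update_labels(tool, result_label, tau):
    """Raise the label of every variable in W(f) to cover the result."""
    for v in tool.writes:
        tau[v] = join(tau.get(v, BOTTOM), result_label)


def policy(tool_name, arg_labels):
    """Example rule (an assumption): no secret data into send_email."""
    if tool_name == "send_email":
        return all(conf == 0 for _, conf in arg_labels)
    return True


def run_call(tools, tool_name, args, memory, tau, result_var):
    """One MakeCall step; args maps parameter name -> variable name."""
    tool = tools[tool_name]
    arg_labels = [tau.get(v, BOTTOM) for v in args.values()]
    if not policy(tool_name, arg_labels):       # check before any side effect
        raise PermissionError(f"policy blocks {tool_name}")
    result = tool.fn(**{k: memory[v] for k, v in args.items()})
    label = join_labels(tool, arg_labels, tau)
    memory[result_var], tau[result_var] = result, label
    update_labels(tool, label, tau)
    return result_var


tools = {
    "read_inbox": Tool(lambda: "salary: 100k", label=(1, 1)),
    "save_note": Tool(lambda text: "ok", writes=("notes",)),
    "send_email": Tool(lambda body: "sent"),
}
memory, tau = {}, {}
run_call(tools, "read_inbox", {}, memory, tau, "x1")
try:
    run_call(tools, "send_email", {"body": "x1"}, memory, tau, "x2")
    blocked = False
except PermissionError:
    blocked = True
```

The inbox contents are labeled untrusted and secret when they enter memory, so the later attempt to email them is rejected before `send_email` ever runs.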

In this part, we zoomed in on the planner, the component where control decisions are made, and showed how taint tracking can be embedded directly into planning logic.

In Part III, we take the final step. We move from mechanisms to guarantees. We will show how these labeled planners give rise to concrete security properties: what is prevented, what is allowed, and why. This is where theory meets practice.

Follow to get notified when Part III drops.

Securing AI Agents with Information Flow Control (Part II) was originally published in InfoSec Write-ups on Medium, where people are continuing the conversation by highlighting and responding to this story.
