A mental model for shipping agents to production—and the infrastructure that makes it possible.
What We Talk About When We Talk About Agents
The word "agent" has become a bit of a Rorschach test in AI circles. To some it means autonomous systems that can browse the web. To others it’s just a chatbot with tool access. For this post, I want to be precise.
An AI agent, as I’m using the term, has one defining feature: it loops. It uses an LLM to reason, takes an action (often via a tool), observes the result, and repeats until it decides the goal is met. That’s what separates an agent from a single-shot LLM call. Generating a marketing email from one prompt isn’t an agent. Drafting the email, checking it against brand guidelines via tools, revising, and iterating until it passes—that’s an agent.
The loop is where the power comes from, but it’s also where the complexity hides. Each iteration might call an external API, modify a database, send a message, or invoke another agent. These side effects compound. A five-step agent loop might touch three external services and make two irreversible changes before it’s done.
This is the tension at the heart of agentic systems: the same looping behavior that makes agents capable also makes them operationally complex. And that tension is exactly what the "two loops" mental model helps resolve.
The Demo-to-Production Gap
Building an agent demo feels trivial now. You write some instructions for your model, wire together some tools, and wrap the LLM in a for loop so it can reason through a few steps. Suddenly you’ve got something that feels magical. But shipping that agent to production remains the hardest part.
The moment you try, you discover a sprawling set of concerns that have nothing to do with your agent’s actual intelligence:
- How do you stop a runaway agent from burning through budget at 3am, and make it degrade gracefully when you do?
- How do you route requests to the right agent in the first place?
- How do you apply safety and moderation policies (guardrails) consistently, early in the request path?
- How do you close the gap between observability and learning without instrumenting every line of code?
- How do you swap models without refactoring your entire codebase?
These concerns need a home. That’s what the two-loop mental model provides: separation of concerns, applied to AI infrastructure.
The Inner Loop: Where Intelligence Lives
When most developers talk about an "agent loop," they mean the inner loop—the reasoning cycle that makes agents useful:
- Observe: The agent receives input and context
- Think: The LLM reasons about what to do next
- Act: The agent calls tools, retrieves information, or generates output
- Evaluate: The agent assesses whether the task is complete
- Repeat: Loop until done
This is ReAct, chain-of-thought, or whatever reasoning framework your agent uses. It’s the tool calls, the function invocations, the back-and-forth with your business logic.
A simple weather agent looks like this (using FastAPI):
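A minimal sketch, assuming the OpenAI Python SDK; the get_weather tool and the endpoint path are illustrative, not prescribed:

```python
import json

from fastapi import FastAPI
from openai import OpenAI
from pydantic import BaseModel

app = FastAPI()
client = OpenAI()  # reads OPENAI_API_KEY from the environment

def get_weather(city: str) -> str:
    """Illustrative tool: a real agent would call a weather API here."""
    return json.dumps({"city": city, "forecast": "sunny", "temp_c": 22})

TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

class Query(BaseModel):
    message: str

@app.post("/v1/chat")
def chat(query: Query):
    messages = [{"role": "user", "content": query.message}]
    # The inner loop: think, act via a tool, observe, repeat until done.
    for _ in range(8):  # simple turn cap for the sketch
        resp = client.chat.completions.create(
            model="gpt-4o", messages=messages, tools=TOOLS
        )
        msg = resp.choices[0].message
        if not msg.tool_calls:  # no more actions requested: task complete
            return {"reply": msg.content}
        messages.append(msg)  # keep the assistant's tool-call turn in context
        for call in msg.tool_calls:  # act, then feed the observation back
            args = json.loads(call.function.arguments)
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": get_weather(**args),
            })
    return {"reply": "Stopped after too many steps."}
```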
This is the code you want to write: domain logic, business value, the reason you’re building an agent in the first place. The inner loop should focus on one thing—solving the user’s problem. Note: there are no fancy AI frameworks here.
Why the Inner Loop Can’t Save You
Marc Brooker wrote recently about the challenges of constraining what agents can do. His framing is useful here: agents achieve their goals through side effects. They call APIs, modify databases, send messages. The agent’s entire purpose is to find ways to take necessary actions until the job is done.
This is what makes agents useful—and why they can’t be trusted to police themselves. Flexibility and creative problem-solving come with a hard trade-off: to get value, you have to give an agent latitude.
Think about what happens when you try to add budget limits inside your agent code. The agent is reasoning toward a goal. It’s mid-task, three API calls deep, with partial results accumulated. Now you ask it to also evaluate whether it should stop for cost reasons. You’ve created a conflict: the same system trying to achieve a goal is also supposed to abandon that goal when resources run low. In practice, goal-seeking behavior wins. The agent finds ways to justify "just one more step."
The same tension applies to safety guardrails. An agent optimized to be helpful will find creative interpretations of constraints. An agent mid-task won’t gracefully checkpoint its state before hitting a rate limit—it’ll retry until it’s throttled, leaving the user with a failed request and no explanation.
This isn’t a bug in any particular agent framework. It’s just how goal-directed systems work. The inner loop’s job is to achieve outcomes through side effects. Asking it to also limit those side effects is asking it to work against itself.
You need a separate system—one that isn’t trying to complete the task—to enforce boundaries. That’s the outer loop.
The Hidden Middleware Tax
What happens when you try to ship this agent:
Week 1: "We need to add a second agent for flight bookings. How do we route between them?"
Week 2: "Legal says we need content moderation. Where does that go?"
Week 3: "The agents are failing silently. We need better observability."
Week 4: "We’re switching from GPT-4o to Claude. Every agent needs to be refactored."
Week 5: "Different teams have built the same routing logic three different ways."
Suddenly you’re spending more time on plumbing than on agent logic. Every team rebuilds the same infrastructure: routing logic, guardrail hooks, observability glue, model provider adapters. This "hidden middleware" becomes a tax on every feature you ship. The inner loop is powerful, but it can’t carry the entire burden of a production system.
The Outer Loop: Where Delivery Lives
The outer loop is the infrastructure that makes agents shippable. It doesn’t reason—it orchestrates, governs, and observes.
The outer loop handles:
- Orchestration: Routing traffic between multiple agents based on intent
- Protocol normalization: Smoothing over differences between model providers
- Governance: Applying guardrails, moderation, and policy enforcement consistently
- Bounded execution: Preventing runaway loops from burning budgets, rate limits, or context windows
- Memory and context: Managing conversational state across agent boundaries
- Observability: Capturing traces, metrics, and signals without bespoke instrumentation
The outer loop should be invisible to the inner loop. Agent code shouldn’t know or care about routing, guardrails, or tracing. It just reasons about the user’s problem.
Plano is a proxy and data plane that implements this outer loop.
Plano in Practice: A Multi-Agent Travel System
Let’s build something real. Say you’re creating a travel assistant with two specialized agents: one for weather information and one for flight search. Without an outer loop, you’d need to build your own orchestrator, write intent classification logic, wire up tracing, implement guardrails... the list goes on.
With Plano, here’s what your entire configuration looks like:
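A hypothetical sketch of that config; the field names are illustrative, not Plano’s exact schema:

```yaml
# Illustrative Plano config (field names are assumptions, not the exact
# schema): two agents declared once, routed by intent at the data plane.
listeners:
  - address: 0.0.0.0
    port: 8080

agents:
  - name: weather_agent
    description: Answers questions about current weather and forecasts
    endpoint: http://localhost:8001/v1/chat

  - name: flight_agent
    description: Searches and compares flights between cities
    endpoint: http://localhost:8002/v1/chat

llm_providers:
  - name: gpt-4o
    provider: openai
    model: gpt-4o
```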
What’s absent: routing logic, intent classification code, tracing instrumentation, provider-specific adapters. Plano uses a purpose-built 4B-parameter model that’s fast, efficient, and easily deployed on consumer-grade GPUs for orchestration—production-grade routing at a fraction of the cost and latency of using GPT-4 for intent classification.
The agents stay simple:
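Here is a sketch of the flight agent; the weather agent follows the same pattern, and search_flights is an illustrative stand-in for a real flight-search API:

```python
import json

from fastapi import FastAPI
from openai import OpenAI
from pydantic import BaseModel

app = FastAPI()
client = OpenAI()

def search_flights(origin: str, destination: str) -> str:
    """Illustrative tool: a real agent would call a flight-search API."""
    return json.dumps([
        {"airline": "ExampleAir", "departs": "09:15", "price_usd": 420},
    ])

TOOLS = [{
    "type": "function",
    "function": {
        "name": "search_flights",
        "description": "Search flights between two cities",
        "parameters": {
            "type": "object",
            "properties": {
                "origin": {"type": "string"},
                "destination": {"type": "string"},
            },
            "required": ["origin", "destination"],
        },
    },
}]

class Query(BaseModel):
    message: str

@app.post("/v1/chat")
def chat(query: Query):
    # Same reason/act/observe loop as the weather agent: domain logic only.
    # No routing, guardrails, or tracing; the outer loop handles all of that.
    messages = [{"role": "user", "content": query.message}]
    for _ in range(8):
        resp = client.chat.completions.create(
            model="gpt-4o", messages=messages, tools=TOOLS
        )
        msg = resp.choices[0].message
        if not msg.tool_calls:
            return {"reply": msg.content}
        messages.append(msg)
        for call in msg.tool_calls:
            args = json.loads(call.function.arguments)
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": search_flights(**args),
            })
    return {"reply": "Stopped after too many steps."}
```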
Now run it:
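Assuming Plano exposes an OpenAI-compatible chat endpoint on the listener port from the config above:

```bash
# One request that needs both agents; Plano routes each part by intent.
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{
      "role": "user",
      "content": "What is the weather in Paris? And find me a flight from NYC."
    }]
  }'
```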
Plano routes to the weather agent for Paris weather, then to the flight agent for NYC→Paris flights, and returns a unified response. Every request is automatically traced with OpenTelemetry—no instrumentation code needed.
Adding Guardrails: Governance Without Code Changes
Centralized governance is where the outer loop pays off. Plano’s Filter Chains let you add jailbreak protection, content policies, and context workflows once, at the data plane.
Filters are MCP servers that can inspect and mutate requests before they reach your agents. You define them once and apply them to any agent:
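A hypothetical sketch (the field names are illustrative, not Plano’s exact schema):

```yaml
# Filters are MCP servers; the endpoints and field names here are
# illustrative assumptions.
filters:
  - name: jailbreak_guard
    type: mcp
    endpoint: http://localhost:9001/mcp
  - name: pii_redaction
    type: mcp
    endpoint: http://localhost:9002/mcp

filter_chains:
  - name: default_governance
    filters: [jailbreak_guard, pii_redaction]
    apply_to: [weather_agent, flight_agent]
```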
Your agents don’t change at all. They don’t know about jailbreak detection or PII redaction—they just do their job. The outer loop handles governance consistently across every agent, reducing code duplication and ensuring uniform behavior.
Taming Runaway Workloads
Runaway agents are a real problem: an inner loop that burns through your context window, an agent that hammers a rate-limited API until you’re throttled, a reasoning chain that racks up a $500 bill overnight.
The inner loop can’t police itself—it’s focused on achieving its goal. That’s correct; you don’t want agent logic cluttered with budget checks. But boundaries need enforcement.
Plano treats prompts as first-class citizens of the network stack. The data plane understands what’s happening inside agent traffic and can enforce bounded execution before things spiral:
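An illustrative sketch of what such limits could look like; the names are assumptions, not Plano’s exact schema:

```yaml
# Bounded execution, enforced at the data plane rather than in agent code.
limits:
  - apply_to: [weather_agent, flight_agent]
    max_iterations: 10              # cap the inner loop's turns
    max_tokens_per_request: 50000   # protect the context window
    budget_usd_per_day: 25.00       # hard daily spend ceiling
    on_limit: summarize_and_stop    # degrade gracefully instead of crashing
```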
The goal is graceful degradation. When an agent hits a limit, it shouldn’t crash mid-task with side effects half-applied. The outer loop closes the loop properly—summarizing progress, checkpointing state, or falling back to a cheaper model to finish the job. The agent code doesn’t know any of this is happening.
This kind of infrastructure-level safety is nearly impossible to implement consistently agent by agent. Handled as a cross-cutting concern in the outer loop, it’s configured once.
Model Agility: Swap Without Refactoring
Model abstraction is another outer loop benefit. Agents point to Plano’s LLM gateway and request models by name or alias. Plano handles routing:
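An illustrative alias config (field and model names are assumptions):

```yaml
# Agents ask for an alias; Plano maps it to a concrete provider and model.
llm_providers:
  - name: fast
    provider: openai
    model: gpt-4o-mini
  - name: smart
    provider: anthropic
    model: claude-3-5-sonnet-20241022
```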
Agents request fast or smart instead of specific model identifiers:
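On the agent side, point an OpenAI-compatible client at Plano’s gateway; the base_url and alias names here are assumptions:

```python
from openai import OpenAI

# Talk to the outer loop's gateway, not to a provider directly.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

resp = client.chat.completions.create(
    model="fast",  # an alias resolved by Plano, not a provider model id
    messages=[{"role": "user", "content": "Summarize today's Paris forecast."}],
)
print(resp.choices[0].message.content)
```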
Switch providers by changing config, not agent code. This is the outer loop doing what it does best: absorbing provider-specific details and other cross-cutting concerns so the inner loop stays focused on business logic.
The Design Principle: Separation of Concerns
The two-loop model is separation of concerns applied to AI infrastructure:
| Concern | Loop | Why |
|---|---|---|
| Reasoning, tool use | Inner | Agent-specific, requires domain knowledge |
| Orchestration, routing | Outer | Cross-cutting, same pattern across agents |
| Business logic | Inner | The whole point of building the agent |
| Guardrails, moderation | Outer | Must be consistent, shouldn’t be per-agent |
| Budget and rate limits | Outer | Agents can’t police themselves |
| Model-specific code | Inner* | Minimize via outer loop abstraction |
| Observability | Outer | Cross-cutting, shouldn’t require agent changes |
| State management | Outer | Consistent across agent boundaries |
Even model-specific code can be minimized. If your agents talk to the outer loop’s LLM gateway rather than directly to providers, you get protocol normalization for free.
The Outer Loop Is Infrastructure
The outer loop is a separate piece of software. It has to be built, maintained, and scaled independently from your agents.
Developers don’t want to think about this. They want to focus on the reasoning, the tool use, the clever prompts that make their agent useful. Nobody got into AI engineering to build request routers or implement rate limiting for the fifteenth time.
But without the outer loop as first-class infrastructure, problems compound:
Every team builds their own. Travel agents route requests one way. Customer service does it differently. The coding assistant team builds a third approach. Three systems to maintain, three sets of bugs, three places where security policies drift.
Policy changes become deployments. Adding PII redaction across all agents means a code change to every agent, a review cycle for each team, a coordinated rollout. With a proper outer loop, it’s a single config change that applies globally.
Platform knowledge scatters. The person who figured out Claude’s rate limits shouldn’t explain it to every agent team. The team that solved graceful degradation shouldn’t copy-paste that code into Slack. Infrastructure knowledge belongs in infrastructure.
Scaling becomes an agent problem. Each team owns their own observability, deployment pipeline, capacity planning. Platform engineering gets distributed across application teams who’d rather be building features.
The outer loop should not be embedded in application code. It’s the same principle that gave us service meshes, API gateways, and container orchestration: the concerns are real; they just shouldn’t be every developer’s problem.
Plano exists because the Katanemo team, once contributors to Envoy Proxy, watched this pattern repeat across agentic systems at Fortune 500 companies, high-tech startups, and AI-native projects. The outer loop is infrastructure. Treat it like infrastructure.
What This Enables
Clean separation between inner and outer loops unlocks several things:
Faster iteration. Add new agents without modifying existing ones. Declare them in config and Plano’s orchestrator routes to them.
Consistent governance. Apply policies once at the data plane. No more "did we remember to add the jailbreak check to the new agent?"
Debuggable systems. End-to-end traces without per-agent instrumentation. When something breaks, you see what happened across agent boundaries.
Provider flexibility. Swap models, add fallbacks, A/B test providers without touching agent code.
Team autonomy. Different teams build agents in different frameworks—LangChain, raw Python, whatever—and they all work together through the same outer loop.
The Path Forward
Three principles for building agentic systems:
- Agents implement the inner loop only. Reasoning, tool use, business logic—nothing else. Don’t ask goal-directed systems to limit their own goals.
- Everything cross-cutting belongs in the outer loop. Routing, governance, observability, state management, protocol normalization. Infrastructure concerns, not agent concerns.
- The outer loop is infrastructure, not application code. It needs to be built, maintained, and scaled by a platform team. Agent developers shouldn’t be thinking about it.
Plano is one implementation of this pattern—an AI-native proxy and data plane that treats the outer loop as infrastructure. But even without Plano, the two-loop mental model will help you design agentic systems that actually ship.
The demo is easy. The outer loop gets you to production.
Want to try Plano? Check out the quickstart guide or the travel agents demo for complete working code.