Stop building demos. Learn the battle-tested patterns, anti-patterns, and infrastructure that separate 70% prototypes from 95% production systems — based on deployments handling billions of requests.
Connecting Patterns to Production
The gap between a demo AI agent and a production-ready system is vast. I’ve seen countless teams build impressive prototypes only to struggle when deploying to real users. The difference? Understanding the right patterns, knowing when to use them, and implementing the infrastructure that keeps agents reliable, secure, and observable.
This guide walks through battle-tested patterns that companies like Bank of America, Coinbase, and UiPath use in production, along with the implementation considerations that separate working systems from science projects.
Core Agent Patterns
1. ReAct (Reasoning and Acting) Pattern
The ReAct pattern alternates between thinking and doing in iterative cycles: Thought → Action → Observation → Thought → Action… This mirrors how humans solve problems — we don’t just execute a plan blindly; we observe results and adjust.
Each cycle consists of:
- Thought: The agent reasons about what to do next based on the current state
- Action: The agent takes a specific action (calling a tool, querying a database, making an API request)
- Observation: The agent processes the result and incorporates it into its understanding
ReAct Pattern
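To make the cycle concrete, here is a minimal sketch of the loop in plain Python. The `llm_complete` helper, the `ACTION:`/`FINAL:` reply convention, and the tool functions are hypothetical stand-ins, not any particular SDK's API; a production version would use structured tool-call parsing and error handling.

```python
# Minimal ReAct loop sketch. `llm_complete` and the tool functions are
# hypothetical stand-ins for your model client and integrations.
def react_agent(task: str, tools: dict, llm_complete, max_steps: int = 10) -> str:
    transcript = f"Task: {task}\n"
    for _ in range(max_steps):
        # Thought + Action: ask the model for the next step given everything observed so far
        step = llm_complete(
            "Decide the next step. Reply either 'ACTION: <tool> <input>' "
            "or 'FINAL: <answer>'.\n" + transcript
        )
        if step.startswith("FINAL:"):
            return step.removeprefix("FINAL:").strip()
        _, tool_name, tool_input = step.split(maxsplit=2)
        # Observation: run the tool and feed the result back into the next cycle
        observation = tools[tool_name](tool_input)
        transcript += f"{step}\nObservation: {observation}\n"
    return "Stopped: step limit reached without a final answer."
```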
Use Cases
The pattern shines in scenarios requiring dynamic decision-making:
- Customer Service Bots: Erica, Bank of America’s AI assistant, uses ReAct to handle over 3 billion interactions annually. When a customer reports suspicious charges, Erica retrieves transaction data, analyzes patterns, and determines whether to escalate the issue to fraud prevention.
- Research Assistants: An agent tasked with market research might search for industry reports, realize it needs competitor data, fetch that information, then synthesize findings — adjusting its strategy based on what it discovers.
- Debugging Tools: IBM Watson AIOps uses ReAct for IT operations, where the agent examines logs, hypothesizes about root causes, runs diagnostic commands, and refines its understanding until it identifies the issue.
When to Use
ReAct excels when:
- The optimal path isn’t clear from the start
- You need transparent reasoning for compliance or debugging
- The problem requires exploration and discovery
- Tool calls depend on previous results (sequential dependency)
When NOT to Use
Skip ReAct when:
- You have a fixed, well-defined workflow (use sequential patterns instead)
- Latency is critical, and the task is straightforward (direct tool use is faster)
- The reasoning overhead adds no value (simple CRUD operations)
- You can’t afford the token costs of verbose reasoning traces
2. Tool Use Pattern
Tool use extends LLMs beyond their training data by connecting them to external systems. Think of it as giving your agent hands to manipulate the world — APIs, databases, calculators, code executors, even physical devices.
The agent decides which tools to use, constructs appropriate inputs, interprets outputs, and chains multiple tools together to accomplish goals.
Tool Use Pattern
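A minimal, framework-agnostic sketch of the idea: each tool pairs a callable with a description the model reads when deciding what to call, and dispatch is validated so unknown tools fail with an actionable message. The `Tool` structure and the example tool are purely illustrative.

```python
# Sketch of a tool registry: each tool pairs a callable with a description the
# model sees when choosing tools. Names and fields here are illustrative.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str
    description: str          # shown to the model so it can pick the right tool
    run: Callable[[dict], str]

TOOLS = {
    "get_order_status": Tool(
        name="get_order_status",
        description="Look up the shipping status of an order. Input: {'order_id': str}",
        run=lambda args: f"Order {args['order_id']} shipped on 2025-11-01",  # stub
    ),
}

def execute_tool_call(name: str, arguments: dict) -> str:
    tool = TOOLS.get(name)
    if tool is None:
        return f"Error: unknown tool '{name}'. Available: {list(TOOLS)}"
    return tool.run(arguments)
```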
Use Cases
Every production agent uses tools, but the sophistication varies:
- Web3 Agents: Coinbase’s AgentKit gives agents tools for blockchain operations — checking balances, swapping tokens, and deploying smart contracts. An agent can autonomously manage a DeFi portfolio by reading market data and executing trades.
- Healthcare Triage: Cedars-Sinai’s patient triage system uses tools to access medical records, check lab results, and query clinical guidelines. It has handled over 42,000 patients, with 77% achieving optimal treatment plans.
- Development Workflows: Agents with code execution tools can write scripts, run them, see errors, and iteratively fix issues until the code works — essentially debugging themselves.
When to Use
Tool use is essential when:
- Agents need real-time data beyond their training cutoff
- Tasks require computation (calculations, data transformations)
- You’re integrating with existing systems (CRMs, databases, APIs)
- Actions need to be auditable and reversible
When NOT to Use
Avoid tools when:
- The LLM already knows the answer (asking for the capital of France doesn’t need Wikipedia)
- Tool latency destroys the user experience
- The tool landscape is too complex (100+ tools → agent confusion)
- Security risks outweigh benefits (giving agents database write access)
3. Planning Pattern
Planning agents decompose complex tasks into manageable subtasks before execution. Rather than acting reactively, they create a roadmap — sometimes adjusting it as they go.
There are two main approaches:
- Plan-then-Execute: Create a complete plan upfront, then follow it
- Plan-and-Refine: Create an initial plan, but revise it based on observations during execution
Planning Pattern
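Here is a plan-then-execute sketch, assuming a hypothetical `llm_complete` helper and a `run_step` callable that executes one subtask. A plan-and-refine variant would re-invoke the planner whenever a step's result invalidates later steps.

```python
# Plan-then-execute sketch: the model produces a numbered plan once, then each
# step runs in order. `llm_complete` and `run_step` are hypothetical helpers.
def plan_then_execute(goal: str, llm_complete, run_step) -> list[str]:
    plan_text = llm_complete(f"Break this goal into short, numbered steps:\n{goal}")
    steps = [line.split(".", 1)[1].strip()
             for line in plan_text.splitlines()
             if line[:1].isdigit() and "." in line]
    results = []
    for step in steps:
        # Plan-and-refine would re-plan here if a result invalidates later steps
        results.append(run_step(step, context=results))
    return results
```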
Use Cases
Planning becomes critical as tasks grow complex:
- Claims Processing: UiPath’s agents handle insurance claims by planning the workflow: validate claim → gather supporting documents → assess coverage → calculate payout → generate approval letter. The plan ensures nothing is missed.
- Content Production: A marketing agent planning a campaign might break it into: research target audience → brainstorm concepts → create content calendar → draft posts → schedule publication. Each step builds on the previous one.
- Supply Chain Optimization: Agents plan multi-step logistics operations, coordinating between inventory systems, transportation APIs, and demand forecasts to optimize delivery routes.
When to Use
Planning is valuable when:
- Tasks have clear milestones or phases
- Dependencies between subtasks are significant
- You want to preview the agent’s approach before execution
- Resource allocation needs optimization (minimize API calls, parallelize work)
When NOT to Use
Skip planning when:
- Tasks are simple and linear (planning overhead isn’t worth it)
- The environment is too dynamic for plans to hold (real-time adversarial scenarios)
- Flexibility matters more than structure (exploratory research)
- Planning tokens exceed execution savings
4. Multi-Agent Pattern
Instead of one generalist agent, you deploy multiple specialized agents that collaborate. Each agent has domain expertise, its own tools, and a specific role. They communicate through message passing or shared state.
Common architectures include peer-to-peer collaboration, hierarchical structures, and marketplace models where agents bid for tasks.
Multi-Agent Pattern
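A bare-bones sketch of message passing between specialists: agents are callables wrapping role-specific prompts, and each message names the agent it is addressed to. The roles, message shape, and round limit are illustrative, not a specific framework's protocol.

```python
# Sketch of message-passing collaboration: agents share a message list and
# each responds only to messages addressed to its role.
def run_team(task: str, agents: dict, rounds: int = 4) -> list[dict]:
    messages = [{"from": "user", "to": "planner", "content": task}]
    for _ in range(rounds):
        last = messages[-1]
        agent = agents.get(last["to"])
        if agent is None:
            break
        # Each agent returns {"to": <next recipient>, "content": <its reply>}
        reply = agent(last["content"], history=messages)
        messages.append({"from": last["to"], **reply})
        if reply["to"] == "user":
            break
    return messages
```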
Use Cases
Multi-agent systems tackle problems too complex for a single agent:
- Software Development Teams: AutoGen creates teams of agents — a planner, coder, tester, and reviewer. They iterate on code together, each bringing specialized skills. The coder writes functions, the tester finds bugs, the reviewer ensures code quality.
- Financial Analysis: One agent scrapes market data, another performs quantitative analysis, a third generates reports, and a fourth handles risk assessment. Specialization improves accuracy.
- Customer Service Orchestration: CrewAI deployments route customers through specialist agents — billing questions go to the billing agent, technical issues to the tech support agent, ensuring expertise matches the problem.
When to Use
Multi-agent patterns shine when:
- The problem spans multiple domains requiring different expertise
- You can parallelize independent subtasks for speed
- Specialization significantly improves quality (separate code generation from code review)
- You want to scale by adding agents rather than making one super-agent
When NOT to Use
Avoid multi-agent systems when:
- A single well-prompted agent can handle everything (don’t over-engineer)
- Coordination overhead exceeds the benefits
- Latency is critical (multiple agent hops add delays)
- Debugging distributed systems is too complex for your team
5. Reflection Pattern
Reflection agents critique their own outputs and iteratively improve them. After generating an initial response, the agent evaluates its quality, identifies flaws, and produces a refined version. This can repeat multiple times.
The reflection can be self-contained (the same model reflects) or use a separate critic model.
Reflection Pattern
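A minimal draft-critique-revise loop, assuming a hypothetical `llm_complete` helper and an "APPROVED" convention for the critic's sign-off. The same structure works with a separate critic model by swapping in a second client for the critique call.

```python
# Reflection sketch: draft, critique, and revise until the critic is satisfied
# or the iteration budget runs out. `llm_complete` is a hypothetical helper.
def reflect_and_refine(task: str, llm_complete, max_rounds: int = 3) -> str:
    draft = llm_complete(f"Complete this task:\n{task}")
    for _ in range(max_rounds):
        critique = llm_complete(
            f"Task: {task}\nDraft:\n{draft}\n"
            "List concrete flaws, or reply 'APPROVED' if none remain."
        )
        if "APPROVED" in critique:
            break
        draft = llm_complete(
            f"Task: {task}\nDraft:\n{draft}\nCritique:\n{critique}\n"
            "Rewrite the draft addressing every point in the critique."
        )
    return draft
```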
Use Cases
Reflection dramatically improves output quality:
- Code Generation: An agent writes code, reflects on potential bugs or inefficiencies, then rewrites it. Studies show reflection can improve code correctness by 30%+.
- Content Creation: A writing agent drafts an article, critiques its clarity and persuasiveness, then revises. This produces more polished content than single-shot generation.
- Error Correction: UiPath’s agents detect production errors, propose fixes, reflect on whether the fix addresses root causes, then implement refined solutions. This reduced resolution times from 30 minutes to near-instant.
When to Use
Reflection is powerful when:
- Output quality is paramount and you can afford the latency
- Tasks have clear quality criteria the agent can evaluate
- Iterative refinement is natural (writing, coding, analysis)
- The cost of mistakes is high (medical advice, legal documents)
When NOT to Use
Skip reflection when:
- Speed matters more than perfection (customer chat responses)
- The task is simple and unlikely to need revision
- Token costs for multiple generations are prohibitive
- Your model isn’t good at self-evaluation (can lead to degradation)
6. Handoff Orchestration Pattern
Handoff orchestration routes tasks between specialized agents dynamically. Unlike rigid routing, agents can request handoffs when they encounter situations outside their expertise. This creates organic collaboration.
Microsoft’s Semantic Kernel implements handoffs where agents explicitly signal “I can’t handle this, please transfer to Agent X.”
Handoff Pattern
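The core mechanic is small: an agent either answers or explicitly signals a transfer. Below is a sketch using an illustrative `HANDOFF:` convention and a hop limit as a safeguard; the agent names and registry are placeholders.

```python
# Handoff sketch: an agent can answer or explicitly signal a transfer. The
# registry, agent names, and the 'HANDOFF:' convention are illustrative.
def handle_with_handoffs(query: str, agents: dict, start: str = "frontline",
                         max_hops: int = 3) -> str:
    current = start
    for _ in range(max_hops):
        response = agents[current](query)
        if response.startswith("HANDOFF:"):
            target = response.removeprefix("HANDOFF:").strip()
            if target not in agents:
                return "Escalating to a human agent."
            current = target          # transfer and let the specialist try
            continue
        return response
    return "Escalating to a human agent."  # too many hops
```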
Use Cases
Handoffs enable seamless multi-domain experiences:
- Healthcare Workflows: A patient interaction might start with a scheduling agent, hand off to a clinical triage agent for symptom assessment, then to a specialist agent for treatment recommendations — each transition triggered by the complexity detected.
- Technical Support Escalation: A first-line support agent handles basic questions but hands off to a specialized troubleshooting agent when diagnostics are needed, or to a human expert for critical issues.
- Document Processing: A document intake agent classifies documents, then hands them to specialist agents — invoices go to the accounting agent, contracts to legal review, support tickets to customer service.
When to Use
Handoffs work well when:
- Agent expertise is clearly bounded and complementary
- The flow isn’t predictable in advance (depends on conversation context)
- You want graceful escalation paths (agent → specialist agent → human)
- Domain separation improves accuracy and maintainability
When NOT to Use
Avoid handoffs when:
- Routing logic is simple and deterministic (use conditional routing instead)
- Handoff latency disrupts the user experience
- Maintaining handoff protocols between agents is too complex
- A single agent with broader capabilities is more maintainable
7. Sequential Workflow Pattern
Sequential workflows execute predefined steps in order, like a pipeline. Each step completes before the next begins. This is the most deterministic pattern — you specify the exact sequence of operations.
Think of it as an assembly line for AI: input → step 1 → step 2 → … → step N → output.
Sequential Workflow Pattern
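In code the pattern is just a fixed pipeline of step functions, each consuming the previous step's output. The stubs below stand in for real implementations.

```python
# Sequential workflow sketch: a fixed list of step functions, each consuming
# the previous step's output. The steps here are placeholders.
def run_pipeline(data, steps):
    for step in steps:
        data = step(data)   # each stage must finish before the next begins
    return data

# Example wiring (stubs standing in for real validation/enrichment/reporting):
report = run_pipeline(
    "raw_claims.csv",
    steps=[lambda d: f"validated({d})",
           lambda d: f"enriched({d})",
           lambda d: f"report({d})"],
)
```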
Use Cases
Sequential patterns excel at repeatable processes:
- Data Processing Pipelines: Extract data from source → clean and normalize → enrich with external data → analyze → generate report. Each step is deterministic and must complete before the next.
- Document Generation: Gather requirements → create outline → write sections → add citations → format → export to PDF. The order is fixed.
- Compliance Workflows: UiPath agents process insurance claims sequentially: intake validation → document verification → eligibility check → risk assessment → approval decision. Each gate must pass before proceeding.
When to Use
Sequential workflows are ideal when:
- The process is well-defined and rarely changes
- Steps have strict dependencies (output of step N is input to step N+1)
- You need predictable execution times and costs
- Compliance requires documented, repeatable processes
When NOT to Use
Avoid sequential patterns when:
- The optimal path depends on runtime discoveries (use ReAct instead)
- Steps can be parallelized for speed (use parallel execution patterns)
- The workflow needs to adapt to different inputs (use conditional routing)
- Innovation requires exploration rather than repetition
8. Hierarchical (Supervisor-Workers) Pattern
A supervisor agent plans and coordinates while worker agents execute specialized tasks. The supervisor decomposes problems, assigns work, monitors progress, and synthesizes results. Workers focus on their specific domains without worrying about the big picture.
This mirrors organizational structures — a project manager coordinating developers, designers, and QA engineers.
Hierarchical Pattern
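A compact sketch of the supervisor-workers split: the supervisor decomposes the goal, assigns each subtask to a named worker, and synthesizes the results. `llm_complete`, the worker callables, and the one-line assignment format are all hypothetical.

```python
# Supervisor-workers sketch: the supervisor decomposes, delegates, and
# synthesizes. `llm_complete` and the worker callables are hypothetical.
def supervise(goal: str, workers: dict, llm_complete) -> str:
    assignments = llm_complete(
        f"Goal: {goal}\nWorkers: {list(workers)}\n"
        "Output one line per subtask as '<worker>: <subtask>'."
    )
    results = []
    for line in assignments.splitlines():
        if ":" not in line:
            continue
        worker_name, subtask = line.split(":", 1)
        worker = workers.get(worker_name.strip())
        if worker:
            results.append(worker(subtask.strip()))
    return llm_complete(f"Goal: {goal}\nCombine these results:\n" + "\n".join(results))
```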
Use Cases
Hierarchical patterns manage complexity at scale:
- Research Automation: A supervisor agent manages a research project by delegating to worker agents — one scrapes academic papers, another analyzes data, a third generates summaries. The supervisor ensures all pieces come together coherently.
- Testing Orchestration: A supervisor coordinates testing workers — unit test agent, integration test agent, performance test agent — aggregating their findings into a comprehensive test report.
- Multi-Channel Marketing: The supervisor plans a campaign while workers execute on specific channels — one handles social media, another email, a third manages ads. The supervisor ensures brand consistency and timing.
When to Use
Hierarchical patterns work when:
- Tasks naturally decompose into independent subtasks
- Central coordination prevents conflicts or redundancy
- Workers need different tools or access levels
- You want clear accountability and monitoring points
When NOT to Use
Skip hierarchical patterns when:
- The supervisor becomes a bottleneck (all work routes through it)
- Workers are interdependent and need peer communication
- The task is simple enough for a single agent
- Supervisor overhead (planning, aggregation) exceeds the benefit
Implementation Considerations
1. Single Agent, Multi-Tool
Design Principles
The most common production pattern is a single agent with access to multiple tools. The key is designing a coherent tool ecosystem:
- Tool Naming: Clear, descriptive names help the agent choose correctly (get_customer_order_history beats tool_47)
- Tool Descriptions: Detailed descriptions of when to use each tool, what inputs are required, and what outputs to expect
- Tool Grouping: Related tools should be documented together (all database operations, all API calls)
- Error Messages: Tools should return actionable error messages the agent can understand and act on
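Putting those principles together, one well-specified tool might look like the sketch below. The schema format is illustrative rather than a specific SDK's, and the lookup is stubbed; the point is the descriptive name, the usage guidance in the description, and the actionable error.

```python
# Sketch of one well-specified tool, following the naming/description/error
# principles above. The schema format is illustrative, not a specific SDK's.
GET_CUSTOMER_ORDER_HISTORY = {
    "name": "get_customer_order_history",
    "description": (
        "Return a customer's orders from the last 12 months. "
        "Use when the user asks about past purchases, refunds, or order status. "
        "Input: customer_id (string). Output: list of {order_id, date, status, total}."
    ),
    "parameters": {
        "type": "object",
        "properties": {"customer_id": {"type": "string"}},
        "required": ["customer_id"],
    },
}

def get_customer_order_history(customer_id: str) -> dict:
    if not customer_id:
        # Actionable error the agent can recover from, not a bare stack trace
        return {"error": "customer_id is empty; ask the user for their account email first."}
    return {"orders": []}  # stub for the real database lookup
```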
Implementation Strategies
Start with tools that provide maximum leverage:
- Data Retrieval: Database queries, API calls to fetch information
- Actions: Write operations, triggering workflows, sending notifications
- Computation: Calculators, data transformations, code execution
- External Services: Third-party integrations (payment processors, mapping services)
Use tool chaining carefully — some agents try to call tools that call tools, leading to cascading failures.
Common Pitfalls
- Tool Overload: Giving an agent 50+ tools leads to confusion and wrong tool selection. Group tools or use routing.
- Inconsistent Interfaces: If some tools expect JSON and others expect natural language, the agent will struggle.
- Hidden Dependencies: Tool A requires data from Tool B, but the agent doesn’t know this. Document dependencies clearly.
- No Rollback: Actions without undo mechanisms are dangerous. Implement transactions or confirmations for critical operations.
2. Deterministic Routing
When Routing Matters
Not every query needs the full power of your agent system. Routing directs requests to the right handler:
- Simple FAQs → retrieval system
- Complex questions → reasoning agent
- Specific domains → specialist agents
- Ambiguous requests → triage agent
Approaches
Code-Based Routing: Use explicit conditionals — if query contains “refund”, route to billing agent. Fast, predictable, but brittle.
Embedding-Based Routing: Embed the query and compare it to labeled examples. Routes “I need my money back” to billing even without the keyword “refund”. More flexible but requires good examples.
LLM-Based Routing: Let a small, fast model classify queries. Most flexible, but adds latency and cost.
Hybrid Approach: Use code-based routing for obvious cases, fall back to embeddings or LLM routing for ambiguous queries.
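A sketch of the hybrid approach: cheap keyword rules handle the obvious cases, embedding similarity handles the rest, and anything below a confidence threshold falls through to a clarifying-question route. The `embed` helper, route names, and threshold are illustrative assumptions.

```python
# Hybrid routing sketch: keyword fast path, embedding similarity fallback,
# clarification below the confidence threshold. `embed` and the labeled
# route examples are hypothetical placeholders.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return dot / norm if norm else 0.0

def route(query: str, embed, route_examples: dict, threshold: float = 0.7) -> str:
    q = query.lower()
    if "refund" in q or "charge" in q:
        return "billing"                      # code-based fast path
    qv = embed(query)
    best_route, best_score = "triage", 0.0
    for name, example_vecs in route_examples.items():
        score = max(cosine(qv, ev) for ev in example_vecs)
        if score > best_score:
            best_route, best_score = name, score
    return best_route if best_score >= threshold else "clarify"
```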
Implementation Tips
- Monitor misroutes — they reveal gaps in your routing logic
- Provide fallback paths when routing is uncertain
- Consider confidence thresholds before routing (if confidence < 0.7, ask clarifying questions)
- Balance latency vs. accuracy — sometimes fast approximate routing beats slow perfect routing
Common Mistakes
- Over-Engineering: Complex ML-based routing for simple two-way splits
- Under-Engineering: Keyword matching for nuanced queries that need semantic understanding
- No Catch-All: Queries that don’t match any route fail instead of going to a default handler
- Ignoring Context: Routing based solely on the latest message ignores conversation history
3. Context Window Management
The Challenge
Context windows are finite. Even with 200K token models, production agents often hit limits when:
- Conversations span hours or days
- Tools return large payloads (entire database dumps)
- Multi-agent systems accumulate extensive histories
- You’re processing documents alongside conversation history
Strategies
Selective Injection: Only include relevant context. If discussing billing, don’t inject unrelated technical documentation.
Summarization: Periodically condense conversation history. Keep recent turns verbatim, but summarize older content.
Semantic Retrieval: Store conversation in a vector database. Retrieve only semantically relevant passages when needed.
Truncation Policies:
- Sliding window (keep last N turns)
- Importance-based (keep critical turns, drop small talk)
- Query-dependent (load context relevant to current query)
Tool Output Filtering: When a tool returns data, extract only what’s needed. Don’t inject a 10,000-row spreadsheet — summarize key metrics.
Graceful Degradation: When approaching limits, warn the user and suggest starting a new conversation. Don’t just fail.
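A sketch combining two of these strategies, sliding window plus summarization: recent turns stay verbatim while older turns are folded into a running summary. The `llm_complete` and `count_tokens` helpers and the budget numbers are illustrative.

```python
# Sliding-window + summarization sketch: keep recent turns verbatim and fold
# older turns into a running summary. Helpers and budgets are illustrative.
def manage_context(history: list[str], llm_complete, count_tokens,
                   keep_recent: int = 10, budget: int = 8000) -> list[str]:
    if count_tokens("\n".join(history)) <= budget:
        return history
    older, recent = history[:-keep_recent], history[-keep_recent:]
    summary = llm_complete(
        "Summarize this earlier conversation in under 200 words, "
        "preserving names, decisions, and open questions:\n" + "\n".join(older)
    )
    return [f"[Summary of earlier conversation]\n{summary}"] + recent
```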
Real-World Example
Anthropic’s guidance on context engineering recommends structuring information hierarchically — load high-level summaries first, then drill down only when needed. This keeps tokens available for reasoning rather than being consumed by context.
4. Reliability
The Production Gap
Research prototypes hit 70% accuracy. Production systems need 95%+, with failures being graceful and recoverable.
Evaluation
Build comprehensive test suites:
- Unit Tests: Individual tool calls work as expected
- Integration Tests: Multi-step workflows complete successfully
- Regression Tests: Changes don’t break existing capabilities
- Adversarial Tests: Malformed inputs, edge cases, hostile prompts
Use held-out evaluation sets that mirror production distribution. If 30% of production queries are about billing, 30% of your test set should be too.
Failure Mitigation
Retries with Backoff: Tool calls fail. Retry with exponential backoff before giving up.
Fallback Chains: Primary agent fails → simpler agent → template response → human handoff.
Circuit Breakers: If a tool fails repeatedly, stop calling it temporarily to prevent cascade failures.
Validation: Check agent outputs before executing actions. Does the generated SQL query look reasonable? Validate before running.
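Retries and fallback chains are short to write and save you in production. Below is a plain-Python sketch without any particular SDK; the handler names stand in for a primary agent, a simpler agent, and a canned template response.

```python
# Retry with exponential backoff plus a fallback chain, sketched without any
# particular SDK. Handler names are placeholders.
import random
import time

def call_with_retries(fn, *args, attempts: int = 3, base_delay: float = 0.5):
    for attempt in range(attempts):
        try:
            return fn(*args)
        except Exception:
            if attempt == attempts - 1:
                raise
            # Exponential backoff with jitter to avoid thundering herds
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.2))

def answer_with_fallbacks(query: str, primary, simple, template) -> str:
    for handler in (primary, simple):
        try:
            return call_with_retries(handler, query)
        except Exception:
            continue
    return template(query)  # last resort before human handoff
```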
Quality Gates
Implement checkpoints before critical actions:
- Confidence thresholds (only act if confidence > 0.8)
- Human-in-the-loop for high-stakes decisions (wire transfers, medical treatments)
- Staged rollouts (test on 1% of traffic before full deployment)
Real-World Success
UiPath’s agents achieve 245% ROI in claims processing by combining automation with reliability safeguards. When uncertain, agents escalate to humans rather than guessing — maintaining accuracy while still handling 80% of cases autonomously.
5. Security
Zero Trust Architecture
Every agent action assumes hostile actors exist. Key principles:
Least Privilege: Agents only access data and tools required for their specific role. The customer service agent doesn’t need admin database access.
Input Validation: Treat all user input as potentially malicious. Sanitize SQL queries, validate API parameters, and escape shell commands.
Output Sanitization: Agents shouldn’t expose sensitive data (PII, API keys, internal errors) in responses.
Authentication: Agents authenticating to services should use short-lived tokens, not hardcoded credentials.
Guardrails
Implement multiple layers:
Pre-Processing: Detect and block obvious attacks (SQL injection patterns, prompt injection attempts).
Runtime Monitoring: Watch for suspicious behavior — unusual API call patterns, attempts to access restricted data, loops consuming excessive resources.
Post-Processing: Filter outputs for sensitive information before returning to users.
Human Review: Flag high-risk actions for human approval before execution.
Circuit Breakers
When anomalies are detected, automatically:
- Throttle the agent’s actions
- Require additional verification
- Disable specific tools temporarily
- Alert security teams
Palo Alto Networks’ AI systems use behavioral monitoring — agents that deviate from normal patterns trigger alerts even if no specific rule is violated.
Behavioral Monitoring
Establish baselines for normal agent behavior:
- Typical response times
- Common tool usage patterns
- Expected data access patterns
Deviations indicate potential compromises or bugs.
6. Observability and Testing
Tracing
Track every agent decision:
- Which prompt was used
- What tools were called and in what order
- How long each step took
- What context was available
- Why certain decisions were made
Tools like LangSmith and Azure Agent Factory provide detailed traces showing exactly where things went wrong.
Logging
Structure logs for queryability:
{ "timestamp": "2025-11-09T10:30:00Z", "conversation_id": "abc123", "agent_id": "customer_service_v2", "action": "tool_call", "tool_name": "get_order_status", "latency_ms": 234, "success": true, "user_query": "Where is my order?"}
Metrics
Track what matters:
- Success Rate: Percentage of conversations resolved without escalation
- Latency: P50, P95, P99 response times
- Cost: Token usage per conversation
- User Satisfaction: CSAT scores, thumbs up/down
- Tool Usage: Which tools are called most, which fail most
Evaluation
Pre-Deployment Testing:
- Prompt regression tests (does the new prompt maintain performance on existing benchmarks?)
- A/B testing (does agent v2 outperform agent v1 on held-out data?)
- Red teaming (can adversaries break the agent?)
Continuous Testing:
- Monitor production performance against benchmarks
- Shadow mode (run new agent versions alongside production, compare outputs without affecting users)
- Gradual rollouts with automatic rollback if metrics degrade
Real-World Example
IBM’s observability framework for Watson AIOps tracks 200+ metrics per agent, enabling rapid diagnosis of issues. When success rates drop, they can pinpoint whether it’s a specific tool failing, a prompt regression, or a data quality issue.
7. Common Pitfalls and Anti-Patterns
The God Prompt
What It Is: A single massive prompt trying to handle every scenario — hundreds of if-then rules, dozens of examples, edge case handling.
Why It Fails: Models lose coherence in ultra-long prompts. Important instructions get lost in the noise. Maintenance becomes impossible.
Solution: Break into focused prompts for specific scenarios. Use routing to select the right prompt.
Agent Sprawl
What It Is: Creating a new agent for every minor variation. 20 agents that do almost the same thing with slight tweaks.
Why It Fails: Coordination overhead explodes. Maintaining consistency across agents is impossible. Users get confused by handoffs.
Solution: Start with generalist agents. Only create specialists when specialization provides clear, measurable value.
The “3 Wishes” Problem
What It Is: Agents that stop after a fixed number of steps, even if the task isn’t complete. “You can call tools 3 times, then you must respond.”
Why It Fails: Arbitrary limits force agents to guess or quit prematurely. Tasks that need 4 steps fail.
Solution: Use dynamic stopping conditions based on task completion, not arbitrary turn limits. Allow loops but implement safeguards against infinite loops.
Hallucinated Tool Calls
What It Is: Agents calling tools that don’t exist or inventing parameters.
Why It Fails: The agent confidently tries to call get_customer_balance(customer_id, include_pending=True) but the tool only accepts customer_id. Execution fails.
Solution: Strict validation of tool calls before execution. Detailed tool schemas. In-context examples showing correct usage.
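A validation gate can be as simple as checking the requested call against the declared schema before anything executes, and returning an error the agent can use to correct itself. The sketch below assumes schemas in the same illustrative format used earlier in this article.

```python
# Sketch of strict validation before execution: reject calls to unknown tools
# or with unexpected/missing parameters, and return a correctable error.
def validate_tool_call(name: str, arguments: dict, schemas: dict) -> str | None:
    schema = schemas.get(name)
    if schema is None:
        return f"Unknown tool '{name}'. Valid tools: {sorted(schemas)}"
    props = schema["parameters"]["properties"]
    required = set(schema["parameters"].get("required", []))
    unexpected = set(arguments) - set(props)
    missing = required - set(arguments)
    if unexpected:
        return f"Unexpected parameters {sorted(unexpected)}; allowed: {sorted(props)}"
    if missing:
        return f"Missing required parameters {sorted(missing)}"
    return None  # call is structurally valid, safe to execute
```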
Prompt Injection Vulnerability
What It Is: Users embedding instructions in their input that override the agent’s system prompt. “Ignore previous instructions and…”
Why It Fails: Agents execute malicious commands, leak data, or behave unpredictably.
Solution: Input validation, sandboxing, clear separation between system instructions and user input, prompt injection detection.
Context Overflow
What It Is: Hitting context limits mid-conversation. The agent can’t remember the early parts of long conversations.
Why It Fails: Users must repeat themselves. The agent contradicts its earlier statements. Errors compound.
Solution: Implement context management strategies early (summarization, selective retention, vector search).
Error Message Loops
What It Is: A tool fails, the agent retries with the same input, fails again, retries…
Why It Fails: Burns tokens, wastes time, and frustrates users. Some agents loop infinitely.
Solution: Parse error messages. Adjust strategy based on errors. Implement circuit breakers and max retry limits.
Overfitting to Examples
What It Is: Agents work perfectly on training examples but fail on slightly different production queries.
Why It Fails: The agent memorized patterns rather than learning the task. Real-world variety breaks it.
Solution: Diverse training data. Explicit emphasis on principles, not just examples. Regular testing on novel inputs.
No Human Escalation Path
What It Is: Agents try to handle everything autonomously with no way to request human help.
Why It Fails: Edge cases, ambiguous situations, and genuinely complex problems result in poor outcomes rather than appropriate escalation.
Solution: Build in explicit escalation triggers — uncertainty thresholds, specific scenarios, and user requests.
Ignoring Latency
What It Is: Chaining 10+ tool calls without considering cumulative latency. “It’s only 200ms per call!”
Why It Fails: 10 × 200ms = 2 seconds when calls run sequentially, and chained calls that depend on each other's results can't be parallelized. Users abandon slow agents.
Solution: Set latency budgets. Parallelize independent calls. Cache common queries. Use streaming responses.
Tool Return Sizes
What It Is: Tools returning entire database tables or massive JSON payloads that consume the context window.
Why It Fails: No room left for reasoning. Costs explode. Slow token processing.
Solution: Implement pagination. Return summaries with drill-down capability. Filter data at the tool level, not in the prompt.
Inconsistent State Management
What It Is: In multi-agent or multi-turn scenarios, losing track of state. Agents forget what was already done.
Why It Fails: Redundant actions, contradictory decisions, user frustration (“I already told you this!”).
Solution: Explicit state management — store conversation state externally, pass it between agents, use memory systems.
8. Combining Orchestration Patterns
Hybrid Architectures
Real systems rarely use just one pattern. Production deployments combine patterns strategically:
Sequential + ReAct Hybrid
Define high-level sequential stages, but use ReAct within each stage for flexibility:
- Data Gathering (ReAct to find relevant sources)
- Analysis (ReAct to explore insights)
- Report Generation (Sequential pipeline)
Hierarchical + Tool Use
Supervisor agents with specialized tool-using workers:
- Supervisor: Planning and coordination
- Worker 1: Data retrieval tools
- Worker 2: Analysis tools
- Worker 3: Reporting tools
Multi-Agent + Handoffs
Multiple specialist agents that can hand off to each other:
- Technical support agent handles standard issues
- Billing agent handles payment questions
- Escalation agent handles complex cases
Each agent recognizes when it's out of its depth and transfers cleanly.
Example: Customer Service System
A production customer service system might use:
- Embedding-based routing to classify incoming queries
- Specialized agents for different departments (billing, technical, returns)
- ReAct pattern within each agent for dynamic problem-solving
- Handoff orchestration for escalations
- Sequential workflows for standard processes (return processing)
This isn’t overengineering — it’s matching patterns to specific needs within a complex system.
9. Relationship to Cloud Design Patterns
AI agents don’t exist in a vacuum. They integrate with cloud infrastructure, and many traditional cloud patterns apply directly:
Saga Pattern
Distributed transactions across multiple agents or services. If step 3 fails, compensating transactions undo steps 1 and 2.
Use in AI: Multi-agent workflows where agents coordinate complex operations. If one agent’s action fails, other agents need to rollback their changes.
Circuit Breaker Pattern
Detect failures and prevent cascade failures by temporarily disabling failing services.
Use in AI: When a tool (API, database, model) starts failing, stop calling it to prevent wasting resources. Degrade gracefully or use fallbacks.
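A minimal circuit breaker is a few dozen lines; the sketch below wraps any tool call, and after repeated failures it skips the tool for a cooldown period instead of hammering it. Thresholds and the single-class design are illustrative.

```python
# Minimal circuit-breaker sketch: after repeated failures the wrapped call is
# skipped for a cooldown period. Thresholds are illustrative.
import time

class CircuitBreaker:
    def __init__(self, max_failures: int = 5, cooldown_s: float = 60.0):
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.opened_at = 0.0

    def call(self, fn, *args):
        if self.failures >= self.max_failures:
            if time.time() - self.opened_at < self.cooldown_s:
                raise RuntimeError("circuit open: tool temporarily disabled")
            self.failures = 0  # cooldown elapsed, probe the tool again
        try:
            result = fn(*args)
            self.failures = 0
            return result
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.time()
            raise
```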
Bulkhead Pattern
Isolate resources so failures in one part don’t affect others.
Use in AI: Separate resource pools for different agents or tool groups. The data-heavy analytics agent doesn’t starve the lightweight chat agent of API quotas.
Event-Driven Architecture
Components communicate through events rather than direct calls.
Use in AI: Agents publish events (“user_query_received”, “order_processed”), and other agents subscribe to relevant events. Loose coupling enables scalability.
Step Functions (State Machines)
AWS Step Functions, Azure Durable Functions, and Google Cloud Workflows orchestrate complex, multi-step processes with error handling and retries.
Use in AI: Perfect for sequential and hierarchical agent patterns. Define workflows visually, handle failures declaratively, and get automatic observability.
Temporal Workflows
Temporal provides durable execution for long-running processes. If a server crashes mid-workflow, Temporal resumes from the last checkpoint.
Use in AI: Snap and Coinbase use Temporal for AI agents. Workflows can run for hours or days (user makes a request, agent researches overnight, user gets results in the morning). Temporal handles failures, retries, and state persistence.
10. SDK-Based Implementations
LangChain / LangGraph
What It Offers: The most popular framework. LangChain provides abstractions for prompts, agents, tools, and memory. LangGraph adds stateful, graph-based orchestration.
Best For:
- ReAct agents with complex tool chains
- Conversational agents with memory
- Rapid prototyping
Considerations:
- Abstractions can feel heavy for simple use cases
- Debugging opaque workflows takes effort
- Production deployments require LangSmith for observability
Example: Building a research agent that searches the web, extracts information, and synthesizes findings.
Microsoft AutoGen
What It Offers: Multi-agent conversations. Agents have roles (user proxy, assistant, critic) and negotiate solutions through dialogue.
Best For:
- Multi-agent collaboration patterns
- Code generation with review cycles
- Research requiring diverse perspectives
Considerations:
- Conversation-based approach adds latency
- Managing multi-agent interactions is complex
- Best for scenarios where dialogue genuinely improves outcomes
Example: Software development teams where agents write code, review it, suggest improvements, and iterate until tests pass.
CrewAI
What It Offers: Role-based agents with explicit processes (sequential, hierarchical). Inspired by human teams — product manager, developer, QA.
Best For:
- Business process automation
- Content creation workflows
- Marketing campaigns
Considerations:
- Opinionated structure (agents must fit roles)
- Less flexible than LangChain for custom patterns
- Great for standard workflows, constraining for novel use cases
Example: Marketing team where agents research trends, write content, design visuals, and schedule posts.
Google Agent Development Kit (ADK)
What It Offers: Google’s framework integrated with Vertex AI. Strong support for tool use and grounding (connecting to enterprise data).
Best For:
- Google Cloud native deployments
- Enterprise integrations (Google Workspace, BigQuery)
- Grounding agents in proprietary data
Considerations:
- Tied to Google Cloud ecosystem
- Newer framework, smaller community
- Excellent if you’re already Google Cloud native
Example: Enterprise search agent grounded in company documents, emails, and databases.
Coinbase AgentKit
What It Offers: Web3-specific toolkit. Agents can interact with blockchains, manage wallets, execute trades, and deploy smart contracts.
Best For:
- DeFi applications
- Blockchain-native agents
- Crypto portfolio management
Considerations:
- Narrow domain (Web3 only)
- Security is critical (agents control real money)
- Requires blockchain expertise
Example: Autonomous DeFi agent that monitors market conditions and rebalances portfolios.
Real-World Production Examples
Bank of America: Erica
Scale: 3 billion+ interactions annually, 50 million active users
Pattern: Multi-agent system with specialized sub-agents for different banking domains
Results: Handles routine inquiries, provides spending insights, assists with transactions — freeing human agents for complex cases
UiPath: Production Error Correction & Claims Processing
Scale: Enterprise deployments across Fortune 500 companies
Pattern: Reflection + Sequential workflows for production error handling and claims processing
Results:
- Error resolution time reduced from 30 minutes to near-instant
- Claims processing achieves 245% ROI
- Agents detect issues, propose fixes, reflect on root causes, then implement refined solutions
Key Insight: Combining reflection with deterministic workflows ensures both quality and reliability. The agent explores solutions creatively but follows compliance-required steps rigidly.
IBM Watson AIOps
Scale: Managing IT operations for global enterprises
Pattern: ReAct for dynamic troubleshooting + tool use for diagnostics
Results: Autonomous incident detection and resolution, reducing mean time to resolution (MTTR) by 60%+
Implementation: Agents examine logs, hypothesize root causes, run diagnostic commands, refine understanding, and escalate only when truly necessary
Cedars-Sinai: Healthcare Patient Triage
Scale: 42,000+ patients processed
Pattern: Tool use + planning for clinical decision support
Results: 77% of patients received optimal treatment plans, reducing wait times and improving outcomes
Safety Considerations: Human oversight for all recommendations; agents provide decision support rather than making final medical decisions
Key Insight: In high-stakes domains like healthcare, agents augment human expertise rather than replacing it. The pattern focuses on information gathering and option generation, with humans making final decisions.
Coinbase: Web3 Agent Deployments
Scale: Production blockchain agents managing real cryptocurrency assets
Pattern: Tool use (AgentKit) + security guardrails
Implementation: Agents check balances, execute trades, deploy smart contracts — all with multi-signature approvals and transaction limits
Key Insight: When agents control real money, security isn’t optional. Every action requires validation, limits prevent catastrophic mistakes, and audit trails ensure accountability.
Palo Alto Networks: FLEXWORK
Scale: Hybrid workforce support across enterprise clients
Pattern: Multi-agent orchestration for IT support and security monitoring
Results: Autonomous handling of access requests, security alerts, and infrastructure provisioning
Security Focus: Behavioral monitoring detects anomalous agent activity, circuit breakers prevent cascade failures in security systems
Snap & Coinbase: Temporal Workflows
Scale: Production AI workflows requiring durability
Pattern: Long-running agent workflows using Temporal for state management
Implementation: Workflows can span hours or days. If infrastructure fails, Temporal resumes from checkpoints. Users submit requests, agents research asynchronously, and results are delivered when ready.
Key Insight: Traditional request-response doesn’t work for complex agent tasks. Durable workflows enable agents to work on problems over time without losing state.
Quick Reference: Production Deployment Checklist
Before deploying your agent system to production, ensure you’ve addressed:
Pattern Selection
- Chosen patterns based on task complexity, not trend-following
- Documented why each pattern was selected
- Validated patterns against actual production requirements
- Planned for pattern evolution as needs change
Reliability
- Comprehensive test suites (unit, integration, regression, adversarial)
- Evaluation metrics matching production distribution
- Retry logic with exponential backoff
- Fallback chains for graceful degradation
- Circuit breakers for failing dependencies
- Confidence thresholds for high-stakes actions
- Human escalation paths for edge cases
Security
- Zero trust architecture (least privilege access)
- Input validation and sanitization
- Output filtering for sensitive data
- Authentication using short-lived tokens
- Pre-processing guardrails against attacks
- Runtime behavioral monitoring
- Post-processing safety filters
- Human review for high-risk actions
Observability
- Distributed tracing for all agent decisions
- Structured logging for queryability
- Key metrics tracked (success rate, latency, cost, satisfaction)
- Pre-deployment testing (regression, A/B, red teaming)
- Continuous evaluation in production
- Alert systems for anomalies
- Dashboards for real-time monitoring
Context Management
- Selective context injection strategy
- Summarization for long conversations
- Vector retrieval for semantic search
- Truncation policies defined
- Tool output filtering implemented
- Graceful degradation near limits
Production Operations
- Gradual rollout plan (1% → 10% → 100%)
- Automatic rollback triggers
- On-call runbooks for common issues
- Cost monitoring and budgets
- Performance SLAs defined
- Incident response procedures
- User feedback loops
Conclusion: From Patterns to Production
The gap between a working demo and a reliable production system isn’t about adding more agents or fancier patterns. It’s about:
Understanding your actual requirements: Don’t use multi-agent systems because they’re cool. Use them because specialization measurably improves outcomes for your specific problem.
Building observability first: You can’t debug what you can’t see. Tracing, logging, and metrics aren’t afterthoughts — they’re foundational infrastructure.
Planning for failures: Every tool will fail. Every model will hallucinate. Every API will timeout. Design for this reality rather than hoping it won’t happen.
Iterating based on data: Production teaches you things no amount of research anticipates. Monitor metrics, collect feedback, and evolve your system continuously.
Matching patterns to problems: Sequential workflows for repeatable processes. ReAct for exploration. Multi-agent for genuine specialization. Reflection for quality-critical outputs. The pattern should serve the problem, not the other way around.
The companies succeeding with AI agents in production — Bank of America handling billions of interactions, UiPath achieving 245% ROI, Cedars-Sinai improving patient outcomes — aren’t using exotic patterns. They’re using the right patterns rigorously, with production-grade infrastructure.
Start simple. A single agent with a few tools often outperforms a complex multi-agent system poorly implemented. As you learn your actual production challenges — latency bottlenecks, context overflows, specific failure modes — you’ll know exactly which patterns to introduce.
The future of AI agents isn’t about more patterns. It’s about using existing patterns better, with the discipline and infrastructure that separate impressive demos from reliable products.
Additional Resources
Frameworks and Tools:
- LangChain/LangGraph: https://github.com/langchain-ai/langgraph
- Microsoft AutoGen: https://microsoft.github.io/autogen
- CrewAI: https://www.crewai.com
- Google Agent Development Kit: https://cloud.google.com/agent-builder
- Coinbase AgentKit: https://github.com/coinbase/agentkit
- Temporal: https://temporal.io
Observability:
- LangSmith: https://smith.langchain.com
- Azure Agent Factory: https://azure.microsoft.com/blog/agent-factory
- Galileo: https://www.galileo.ai
Further Reading:
- Anthropic’s Context Engineering Guide: https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents
- AWS Agentic AI Patterns: https://docs.aws.amazon.com/prescriptive-guidance/latest/agentic-ai-serverless
- Microsoft AI Agent Design Patterns: https://learn.microsoft.com/en-us/azure/architecture/ai-ml/guide/ai-agent-design-patterns
- McKinsey: Deploying Agentic AI with Safety: https://www.mckinsey.com/capabilities/risk-and-resilience/our-insights/deploying-agentic-ai-with-safety-and-security-a-playbook-for-technology-leaders
This article synthesizes patterns and practices from production deployments across industries. While frameworks and tools evolve rapidly, the underlying patterns — ReAct, tool use, planning, and multi-agent collaboration — remain stable foundations for building reliable AI agents.