Stop building demos. Learn the battle-tested patterns, anti-patterns, and infrastructure that separate 70% prototypes from 95% production systems — based on deployments handling billions of requests.
Connecting Patterns to Production
The gap between a demo AI agent and a production-ready system is vast. I’ve seen countless teams build impressive prototypes only to struggle when deploying to real users. The difference? Understanding the right patterns, knowing when to use them, and implementing the infrastructure that keeps agents reliable, secure, and observable.
This guide walks through battle-tested patterns that companies like Bank of America, Coinbase, and UiPath use in production, along with the implementation considerations that separate working systems from science projects.
Core Agent Patterns
1. ReAct (Reasoning and Acting) Pattern
The ReAct pattern alternates between thinking and doing in iterative cycles: Thought → Action → Observation → Thought → Action… This mirrors how humans solve problems — we don’t just execute a plan blindly; we observe results and adjust.
Each cycle consists of:
- Thought: The agent reasons about what to do next based on the current state
- Action: The agent takes a specific action (calling a tool, querying a database, making an API request)
- Observation: The agent processes the result and incorporates it into its understanding
ReAct Pattern
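To make the cycle concrete, here is a minimal sketch of the loop in plain Python. The `llm_complete` helper, the `ACTION:`/`FINAL:` reply convention, and the tool functions are hypothetical stand-ins, not any particular SDK's API; a production version would use structured tool-call parsing and error handling.

```python
# Minimal ReAct loop sketch. `llm_complete` and the tool functions are
# hypothetical stand-ins for your model client and integrations.
def react_agent(task: str, tools: dict, llm_complete, max_steps: int = 10) -> str:
    transcript = f"Task: {task}\n"
    for _ in range(max_steps):
        # Thought + Action: ask the model for the next step given everything observed so far
        step = llm_complete(
            "Decide the next step. Reply either 'ACTION: <tool> <input>' "
            "or 'FINAL: <answer>'.\n" + transcript
        )
        if step.startswith("FINAL:"):
            return step.removeprefix("FINAL:").strip()
        _, tool_name, tool_input = step.split(maxsplit=2)
        # Observation: run the tool and feed the result back into the next cycle
        observation = tools[tool_name](tool_input)
        transcript += f"{step}\nObservation: {observation}\n"
    return "Stopped: step limit reached without a final answer."
```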
Use Cases
The pattern shines in scenarios requiring dynamic decision-making:
- Customer Service Bots: Erica, Bank of America’s AI assistant, uses ReAct to handle over 3 billion interactions annually. When a customer reports suspicious charges, Erica retrieves transaction data, analyzes patterns, and determines whether to escalate the issue to fraud prevention.
- Research Assistants: An agent tasked with market research might search for industry reports, realize it needs competitor data, fetch that information, then synthesize findings — adjusting its strategy based on what it discovers.
- Debugging Tools: IBM Watson AIOps uses ReAct for IT operations, where the agent examines logs, hypothesizes about root causes, runs diagnostic commands, and refines its understanding until it identifies the issue.
When to Use
ReAct excels when:
- The optimal path isn’t clear from the start
- You need transparent reasoning for compliance or debugging
- The problem requires exploration and discovery
- Tool calls depend on previous results (sequential dependency)
When NOT to Use
Skip ReAct when:
- You have a fixed, well-defined workflow (use sequential patterns instead)
- Latency is critical, and the task is straightforward (direct tool use is faster)
- The reasoning overhead adds no value (simple CRUD operations)
- You can’t afford the token costs of verbose reasoning traces
2. Tool Use Pattern
Tool use extends LLMs beyond their training data by connecting them to external systems. Think of it as giving your agent hands to manipulate the world — APIs, databases, calculators, code executors, even physical devices.
The agent decides which tools to use, constructs appropriate inputs, interprets outputs, and chains multiple tools together to accomplish goals.
Tool Use Pattern
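A minimal, framework-agnostic sketch of the idea: each tool pairs a callable with a description the model reads when deciding what to call, and dispatch is validated so unknown tools fail with an actionable message. The `Tool` structure and the example tool are purely illustrative.

```python
# Sketch of a tool registry: each tool pairs a callable with a description the
# model sees when choosing tools. Names and fields here are illustrative.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str
    description: str          # shown to the model so it can pick the right tool
    run: Callable[[dict], str]

TOOLS = {
    "get_order_status": Tool(
        name="get_order_status",
        description="Look up the shipping status of an order. Input: {'order_id': str}",
        run=lambda args: f"Order {args['order_id']} shipped on 2025-11-01",  # stub
    ),
}

def execute_tool_call(name: str, arguments: dict) -> str:
    tool = TOOLS.get(name)
    if tool is None:
        return f"Error: unknown tool '{name}'. Available: {list(TOOLS)}"
    return tool.run(arguments)
```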
Use Cases
Every production agent uses tools, but the sophistication varies:
- Web3 Agents: Coinbase’s AgentKit gives agents tools for blockchain operations — checking balances, swapping tokens, and deploying smart contracts. An agent can autonomously manage a DeFi portfolio by reading market data and executing trades.
- Healthcare Triage: Cedars-Sinai’s patient triage system uses tools to access medical records, check lab results, and query clinical guidelines. It has handled over 42,000 patients, with 77% achieving optimal treatment plans.
- Development Workflows: Agents with code execution tools can write scripts, run them, see errors, and iteratively fix issues until the code works — essentially debugging themselves.
When to Use
Tool use is essential when:
- Agents need real-time data beyond their training cutoff
- Tasks require computation (calculations, data transformations)
- You’re integrating with existing systems (CRMs, databases, APIs)
- Actions need to be auditable and reversible
When NOT to Use
Avoid tools when:
- The LLM already knows the answer (asking for the capital of France doesn’t need Wikipedia)
- Tool latency destroys the user experience
- The tool landscape is too complex (100+ tools → agent confusion)
- Security risks outweigh benefits (giving agents database write access)
3. Planning Pattern
Planning agents decompose complex tasks into manageable subtasks before execution. Rather than acting reactively, they create a roadmap — sometimes adjusting it as they go.
There are two main approaches:
- Plan-then-Execute: Create a complete plan upfront, then follow it
- Plan-and-Refine: Create an initial plan, but revise it based on observations during execution
Planning Pattern
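Here is a plan-then-execute sketch, assuming a hypothetical `llm_complete` helper and a `run_step` callable that executes one subtask. A plan-and-refine variant would re-invoke the planner whenever a step's result invalidates later steps.

```python
# Plan-then-execute sketch: the model produces a numbered plan once, then each
# step runs in order. `llm_complete` and `run_step` are hypothetical helpers.
def plan_then_execute(goal: str, llm_complete, run_step) -> list[str]:
    plan_text = llm_complete(f"Break this goal into short, numbered steps:\n{goal}")
    steps = [line.split(".", 1)[1].strip()
             for line in plan_text.splitlines()
             if line[:1].isdigit() and "." in line]
    results = []
    for step in steps:
        # Plan-and-refine would re-plan here if a result invalidates later steps
        results.append(run_step(step, context=results))
    return results
```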
Use Cases
Planning becomes critical as tasks grow complex:
- Claims Processing: UiPath’s agents handle insurance claims by planning the workflow: validate claim → gather supporting documents → assess coverage → calculate payout → generate approval letter. The plan ensures nothing is missed.
- Content Production: A marketing agent planning a campaign might break it into: research target audience → brainstorm concepts → create content calendar → draft posts → schedule publication. Each step builds on the previous one.
- Supply Chain Optimization: Agents plan multi-step logistics operations, coordinating between inventory systems, transportation APIs, and demand forecasts to optimize delivery routes.
When to Use
Planning is valuable when:
- Tasks have clear milestones or phases
- Dependencies between subtasks are significant
- You want to preview the agent’s approach before execution
- Resource allocation needs optimization (minimize API calls, parallelize work)
When NOT to Use
Skip planning when:
- Tasks are simple and linear (planning overhead isn’t worth it)
- The environment is too dynamic for plans to hold (real-time adversarial scenarios)
- Flexibility matters more than structure (exploratory research)
- Planning tokens exceed execution savings
4. Multi-Agent Pattern
Instead of one generalist agent, you deploy multiple specialized agents that collaborate. Each agent has domain expertise, its own tools, and a specific role. They communicate through message passing or shared state.
Common architectures include peer-to-peer collaboration, hierarchical structures, and marketplace models where agents bid for tasks.
Multi-Agent Pattern
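A bare-bones sketch of message passing between specialists: agents are callables wrapping role-specific prompts, and each message names the agent it is addressed to. The roles, message shape, and round limit are illustrative, not a specific framework's protocol.

```python
# Sketch of message-passing collaboration: agents share a message list and
# each responds only to messages addressed to its role.
def run_team(task: str, agents: dict, rounds: int = 4) -> list[dict]:
    messages = [{"from": "user", "to": "planner", "content": task}]
    for _ in range(rounds):
        last = messages[-1]
        agent = agents.get(last["to"])
        if agent is None:
            break
        # Each agent returns {"to": <next recipient>, "content": <its reply>}
        reply = agent(last["content"], history=messages)
        messages.append({"from": last["to"], **reply})
        if reply["to"] == "user":
            break
    return messages
```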
Use Cases
Multi-agent systems tackle problems too complex for a single agent:
- Software Development Teams: AutoGen creates teams of agents — a planner, coder, tester, and reviewer. They iterate on code together, each bringing specialized skills. The coder writes functions, the tester finds bugs, the reviewer ensures code quality.
- Financial Analysis: One agent scrapes market data, another performs quantitative analysis, a third generates reports, and a fourth handles risk assessment. Specialization improves accuracy.
- Customer Service Orchestration: CrewAI deployments route customers through specialist agents — billing questions go to the billing agent, technical issues to the tech support agent, ensuring expertise matches the problem.
When to Use
Multi-agent patterns shine when:
- The problem spans multiple domains requiring different expertise
- You can parallelize independent subtasks for speed
- Specialization significantly improves quality (separate code generation from code review)
- You want to scale by adding agents rather than making one super-agent
When NOT to Use
Avoid multi-agent systems when:
- A single well-prompted agent can handle everything (don’t over-engineer)
- Coordination overhead exceeds the benefits
- Latency is critical (multiple agent hops add delays)
- Debugging distributed systems is too complex for your team
5. Reflection Pattern
Reflection agents critique their own outputs and iteratively improve them. After generating an initial response, the agent evaluates its quality, identifies flaws, and produces a refined version. This can repeat multiple times.
The reflection can be self-contained (the same model reflects) or use a separate critic model.
Reflection Pattern
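A minimal draft-critique-revise loop, assuming a hypothetical `llm_complete` helper and an "APPROVED" convention for the critic's sign-off. The same structure works with a separate critic model by swapping in a second client for the critique call.

```python
# Reflection sketch: draft, critique, and revise until the critic is satisfied
# or the iteration budget runs out. `llm_complete` is a hypothetical helper.
def reflect_and_refine(task: str, llm_complete, max_rounds: int = 3) -> str:
    draft = llm_complete(f"Complete this task:\n{task}")
    for _ in range(max_rounds):
        critique = llm_complete(
            f"Task: {task}\nDraft:\n{draft}\n"
            "List concrete flaws, or reply 'APPROVED' if none remain."
        )
        if "APPROVED" in critique:
            break
        draft = llm_complete(
            f"Task: {task}\nDraft:\n{draft}\nCritique:\n{critique}\n"
            "Rewrite the draft addressing every point in the critique."
        )
    return draft
```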
Use Cases
Reflection dramatically improves output quality:
- Code Generation: An agent writes code, reflects on potential bugs or inefficiencies, then rewrites it. Studies show reflection can improve code correctness by 30%+.
- Content Creation: A writing agent drafts an article, critiques its clarity and persuasiveness, then revises. This produces more polished content than single-shot generation.
- Error Correction: UiPath’s agents detect production errors, propose fixes, reflect on whether the fix addresses root causes, then implement refined solutions. This reduced resolution times from 30 minutes to near-instant.
When to Use
Reflection is powerful when:
- Output quality is paramount and you can afford the latency
- Tasks have clear quality criteria the agent can evaluate
- Iterative refinement is natural (writing, coding, analysis)
- The cost of mistakes is high (medical advice, legal documents)
When NOT to Use
Skip reflection when:
- Speed matters more than perfection (customer chat responses)
- The task is simple and unlikely to need revision
- Token costs for multiple generations are prohibitive
- Your model isn’t good at self-evaluation (can lead to degradation)
6. Handoff Orchestration Pattern
Handoff orchestration routes tasks between specialized agents dynamically. Unlike rigid routing, agents can request handoffs when they encounter situations outside their expertise. This creates organic collaboration.
Microsoft’s Semantic Kernel implements handoffs where agents explicitly signal “I can’t handle this, please transfer to Agent X.”
Handoff Pattern
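The core mechanic is small: an agent either answers or explicitly signals a transfer. Below is a sketch using an illustrative `HANDOFF:` convention and a hop limit as a safeguard; the agent names and registry are placeholders.

```python
# Handoff sketch: an agent can answer or explicitly signal a transfer. The
# registry, agent names, and the 'HANDOFF:' convention are illustrative.
def handle_with_handoffs(query: str, agents: dict, start: str = "frontline",
                         max_hops: int = 3) -> str:
    current = start
    for _ in range(max_hops):
        response = agents[current](query)
        if response.startswith("HANDOFF:"):
            target = response.removeprefix("HANDOFF:").strip()
            if target not in agents:
                return "Escalating to a human agent."
            current = target          # transfer and let the specialist try
            continue
        return response
    return "Escalating to a human agent."  # too many hops
```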
Use Cases
Handoffs enable seamless multi-domain experiences:
- Healthcare Workflows: A patient interaction might start with a scheduling agent, hand off to a clinical triage agent for symptom assessment, then to a specialist agent for treatment recommendations — each transition triggered by the complexity detected.
- Technical Support Escalation: A first-line support agent handles basic questions but hands off to a specialized troubleshooting agent when diagnostics are needed, or to a human expert for critical issues.
- Document Processing: A document intake agent classifies documents, then hands them to specialist agents — invoices go to the accounting agent, contracts to legal review, support tickets to customer service.
When to Use
Handoffs work well when:
- Agent expertise is clearly bounded and complementary
- The flow isn’t predictable in advance (depends on conversation context)
- You want graceful escalation paths (agent → specialist agent → human)
- Domain separation improves accuracy and maintainability
When NOT to Use
Avoid handoffs when:
- Routing logic is simple and deterministic (use conditional routing instead)
- Handoff latency disrupts the user experience
- Maintaining handoff protocols between agents is too complex
- A single agent with broader capabilities is more maintainable
7. Sequential Workflow Pattern
Sequential workflows execute predefined steps in order, like a pipeline. Each step completes before the next begins. This is the most deterministic pattern — you specify the exact sequence of operations.
Think of it as an assembly line for AI: input → step 1 → step 2 → … → step N → output.
Sequential Workflow Pattern
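In code the pattern is just a fixed pipeline of step functions, each consuming the previous step's output. The stubs below stand in for real implementations.

```python
# Sequential workflow sketch: a fixed list of step functions, each consuming
# the previous step's output. The steps here are placeholders.
def run_pipeline(data, steps):
    for step in steps:
        data = step(data)   # each stage must finish before the next begins
    return data

# Example wiring (stubs standing in for real validation/enrichment/reporting):
report = run_pipeline(
    "raw_claims.csv",
    steps=[lambda d: f"validated({d})",
           lambda d: f"enriched({d})",
           lambda d: f"report({d})"],
)
```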
Use Cases
Sequential patterns excel at repeatable processes:
- Data Processing Pipelines: Extract data from source → clean and normalize → enrich with external data → analyze → generate report. Each step is deterministic and must complete before the next.
- Document Generation: Gather requirements → create outline → write sections → add citations → format → export to PDF. The order is fixed.
- Compliance Workflows: UiPath agents process insurance claims sequentially: intake validation → document verification → eligibility check → risk assessment → approval decision. Each gate must pass before proceeding.
When to Use
Sequential workflows are ideal when:
- The process is well-defined and rarely changes
- Steps have strict dependencies (output of step N is input to step N+1)
- You need predictable execution times and costs
- Compliance requires documented, repeatable processes
When NOT to Use
Avoid sequential patterns when:
- The optimal path depends on runtime discoveries (use ReAct instead)
- Steps can be parallelized for speed (use parallel execution patterns)
- The workflow needs to adapt to different inputs (use conditional routing)
- Innovation requires exploration rather than repetition
8. Hierarchical (Supervisor-Workers) Pattern
A supervisor agent plans and coordinates while worker agents execute specialized tasks. The supervisor decomposes problems, assigns work, monitors progress, and synthesizes results. Workers focus on their specific domains without worrying about the big picture.
This mirrors organizational structures — a project manager coordinating developers, designers, and QA engineers.
Hierarchical Pattern
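A compact sketch of the supervisor-workers split: the supervisor decomposes the goal, assigns each subtask to a named worker, and synthesizes the results. `llm_complete`, the worker callables, and the one-line assignment format are all hypothetical.

```python
# Supervisor-workers sketch: the supervisor decomposes, delegates, and
# synthesizes. `llm_complete` and the worker callables are hypothetical.
def supervise(goal: str, workers: dict, llm_complete) -> str:
    assignments = llm_complete(
        f"Goal: {goal}\nWorkers: {list(workers)}\n"
        "Output one line per subtask as '<worker>: <subtask>'."
    )
    results = []
    for line in assignments.splitlines():
        if ":" not in line:
            continue
        worker_name, subtask = line.split(":", 1)
        worker = workers.get(worker_name.strip())
        if worker:
            results.append(worker(subtask.strip()))
    return llm_complete(f"Goal: {goal}\nCombine these results:\n" + "\n".join(results))
```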
Use Cases
Hierarchical patterns manage complexity at scale:
- Research Automation: A supervisor agent manages a research project by delegating to worker agents — one scrapes academic papers, another analyzes data, a third generates summaries. The supervisor ensures all pieces come together coherently.
- Testing Orchestration: A supervisor coordinates testing workers — unit test agent, integration test agent, performance test agent — aggregating their findings into a comprehensive test report.
- Multi-Channel Marketing: The supervisor plans a campaign while workers execute on specific channels — one handles social media, another email, a third manages ads. The supervisor ensures brand consistency and timing.
When to Use
Hierarchical patterns work when:
- Tasks naturally decompose into independent subtasks
- Central coordination prevents conflicts or redundancy
- Workers need different tools or access levels
- You want clear accountability and monitoring points
When NOT to Use
Skip hierarchical patterns when:
- The supervisor becomes a bottleneck (all work routes through it)
- Workers are interdependent and need peer communication
- The task is simple enough for a single agent
- Supervisor overhead (planning, aggregation) exceeds the benefit
Implementation Considerations
1. Single Agent, Multi-Tool
Design Principles
The most common production pattern is a single agent with access to multiple tools. The key is designing a coherent tool ecosystem:
- Tool Naming: Clear, descriptive names help the agent choose correctly (get_customer_order_history beats tool_47)
- Tool Descriptions: Detailed descriptions of when to use each tool, what inputs are required, and what outputs to expect
- Tool Grouping: Related tools should be documented together (all database operations, all API calls)
- Error Messages: Tools should return actionable error messages the agent can understand and act on
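Putting those principles together, one well-specified tool might look like the sketch below. The schema format is illustrative rather than a specific SDK's, and the lookup is stubbed; the point is the descriptive name, the usage guidance in the description, and the actionable error.

```python
# Sketch of one well-specified tool, following the naming/description/error
# principles above. The schema format is illustrative, not a specific SDK's.
GET_CUSTOMER_ORDER_HISTORY = {
    "name": "get_customer_order_history",
    "description": (
        "Return a customer's orders from the last 12 months. "
        "Use when the user asks about past purchases, refunds, or order status. "
        "Input: customer_id (string). Output: list of {order_id, date, status, total}."
    ),
    "parameters": {
        "type": "object",
        "properties": {"customer_id": {"type": "string"}},
        "required": ["customer_id"],
    },
}

def get_customer_order_history(customer_id: str) -> dict:
    if not customer_id:
        # Actionable error the agent can recover from, not a bare stack trace
        return {"error": "customer_id is empty; ask the user for their account email first."}
    return {"orders": []}  # stub for the real database lookup
```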
Implementation Strategies
Start with tools that provide maximum leverage:
- Data Retrieval: Database queries, API calls to fetch information
- Actions: Write operations, triggering workflows, sending notifications
- Computation: Calculators, data transformations, code execution
- External Services: Third-party integrations (payment processors, mapping services)
Use tool chaining carefully — some agents try to call tools that call tools, leading to cascading failures.
Common Pitfalls
- Tool Overload: Giving an agent 50+ tools leads to confusion and wrong tool selection. Group tools or use routing.
- Inconsistent Interfaces: If some tools expect JSON and others expect natural language, the agent will struggle.
- Hidden Dependencies: Tool A requires data from Tool B, but the agent doesn’t know this. Document dependencies clearly.
- No Rollback: Actions without undo mechanisms are dangerous. Implement transactions or confirmations for critical operations.
2. Deterministic Routing
When Routing Matters
Not every query needs the full power of your agent system. Routing directs requests to the right handler:
- Simple FAQs → retrieval system
- Complex questions → reasoning agent
- Specific domains → specialist agents
- Ambiguous requests → triage agent
Approaches
Code-Based Routing: Use explicit conditionals — if query contains “refund”, route to billing agent. Fast, predictable, but brittle.
Embedding-Based Routing: Embed the query and compare it to labeled examples. Routes “I need my money back” to billing even without the keyword “refund”. More flexible but requires good examples.
LLM-Based Routing: Let a small, fast model classify queries. Most flexible, but adds latency and cost.
Hybrid Approach: Use code-based routing for obvious cases, fall back to embeddings or LLM routing for ambiguous queries.
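A sketch of the hybrid approach: cheap keyword rules handle the obvious cases, embedding similarity handles the rest, and anything below a confidence threshold falls through to a clarifying-question route. The `embed` helper, route names, and threshold are illustrative assumptions.

```python
# Hybrid routing sketch: keyword fast path, embedding similarity fallback,
# clarification below the confidence threshold. `embed` and the labeled
# route examples are hypothetical placeholders.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return dot / norm if norm else 0.0

def route(query: str, embed, route_examples: dict, threshold: float = 0.7) -> str:
    q = query.lower()
    if "refund" in q or "charge" in q:
        return "billing"                      # code-based fast path
    qv = embed(query)
    best_route, best_score = "triage", 0.0
    for name, example_vecs in route_examples.items():
        score = max(cosine(qv, ev) for ev in example_vecs)
        if score > best_score:
            best_route, best_score = name, score
    return best_route if best_score >= threshold else "clarify"
```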
Implementation Tips
- Monitor misroutes — they reveal gaps in your routing logic
- Provide fallback paths when routing is uncertain
- Consider confidence thresholds before routing (if confidence < 0.7, ask clarifying questions)
- Balance latency vs. accuracy — sometimes fast approximate routing beats slow perfect routing
Common Mistakes
- Over-Engineering: Complex ML-based routing for simple two-way splits
- Under-Engineering: Keyword matching for nuanced queries that need semantic understanding
- No Catch-All: Queries that don’t match any route fail instead of going to a default handler
- Ignoring Context: Routing based solely on the latest message ignores conversation history
3. Context Window Management
The Challenge
Context windows are finite. Even with 200K token models, production agents often hit limits when:
- Conversations span hours or days
- Tools return large payloads (entire database dumps)
- Multi-agent systems accumulate extensive histories
- You’re processing documents alongside conversation history
Strategies
Selective Injection: Only include relevant context. If discussing billing, don’t inject unrelated technical documentation.
Summarization: Periodically condense conversation history. Keep recent turns verbatim, but summarize older content.
Semantic Retrieval: Store conversation in a vector database. Retrieve only semantically relevant passages when needed.
Truncation Policies:
- Sliding window (keep last N turns)
- Importance-based (keep critical turns, drop small talk)
- Query-dependent (load context relevant to current query)
Tool Output Filtering: When a tool returns data, extract only what’s needed. Don’t inject a 10,000-row spreadsheet — summarize key metrics.
Graceful Degradation: When approaching limits, warn the user and suggest starting a new conversation. Don’t just fail.
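A sketch combining two of these strategies, sliding window plus summarization: recent turns stay verbatim while older turns are folded into a running summary. The `llm_complete` and `count_tokens` helpers and the budget numbers are illustrative.

```python
# Sliding-window + summarization sketch: keep recent turns verbatim and fold
# older turns into a running summary. Helpers and budgets are illustrative.
def manage_context(history: list[str], llm_complete, count_tokens,
                   keep_recent: int = 10, budget: int = 8000) -> list[str]:
    if count_tokens("\n".join(history)) <= budget:
        return history
    older, recent = history[:-keep_recent], history[-keep_recent:]
    summary = llm_complete(
        "Summarize this earlier conversation in under 200 words, "
        "preserving names, decisions, and open questions:\n" + "\n".join(older)
    )
    return [f"[Summary of earlier conversation]\n{summary}"] + recent
```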
Real-World Example
Anthropic’s guidance on context engineering recommends structuring information hierarchically — load high-level summaries first, then drill down only when needed. This keeps tokens available for reasoning rather than being consumed by context.
4. Reliability
The Production Gap
Research prototypes hit 70% accuracy. Production systems need 95%+, with failures being graceful and recoverable.
Evaluation
Build comprehensive test suites:
- Unit Tests: Individual tool calls work as expected
- Integration Tests: Multi-step workflows complete successfully
- Regression Tests: Changes don’t break existing capabilities
- Adversarial Tests: Malformed inputs, edge cases, hostile prompts
Use held-out evaluation sets that mirror production distribution. If 30% of production queries are about billing, 30% of your test set should be too.
Failure Mitigation
Retries with Backoff: Tool calls fail. Retry with exponential backoff before giving up.
Fallback Chains: Primary agent fails → simpler agent → template response → human handoff.
Circuit Breakers: If a tool fails repeatedly, stop calling it temporarily to prevent cascade failures.
Validation: Check agent outputs before executing actions. Does the generated SQL query look reasonable? Validate before running.
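Retries and fallback chains are short to write and save you in production. Below is a plain-Python sketch without any particular SDK; the handler names stand in for a primary agent, a simpler agent, and a canned template response.

```python
# Retry with exponential backoff plus a fallback chain, sketched without any
# particular SDK. Handler names are placeholders.
import random
import time

def call_with_retries(fn, *args, attempts: int = 3, base_delay: float = 0.5):
    for attempt in range(attempts):
        try:
            return fn(*args)
        except Exception:
            if attempt == attempts - 1:
                raise
            # Exponential backoff with jitter to avoid thundering herds
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.2))

def answer_with_fallbacks(query: str, primary, simple, template) -> str:
    for handler in (primary, simple):
        try:
            return call_with_retries(handler, query)
        except Exception:
            continue
    return template(query)  # last resort before human handoff
```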
Quality Gates
Implement checkpoints before critical actions:
- Confidence thresholds (only act if confidence > 0.8)
- Human-in-the-loop for high-stakes decisions (wire transfers, medical treatments)
- Staged rollouts (test on 1% of traffic before full deployment)
Real-World Success
UiPath’s agents achieve 245% ROI in claims processing by combining automation with reliability safeguards. When uncertain, agents escalate to humans rather than guessing — maintaining accuracy while still handling 80% of cases autonomously.
5. Security
Zero Trust Architecture
Every agent action assumes hostile actors exist. Key principles:
Least Privilege: Agents only access data and tools required for their specific role. The customer service agent doesn’t need admin database access.
Input Validation: Treat all user input as potentially malicious. Sanitize SQL queries, validate API parameters, and escape shell commands.
Output Sanitization: Agents shouldn’t expose sensitive data (PII, API keys, internal errors) in responses.
Authentication: Agents authenticating to services should use short-lived tokens, not hardcoded credentials.
Guardrails
Implement multiple layers:
Pre-Processing: Detect and block obvious attacks (SQL injection patterns, prompt injection attempts).
Runtime Monitoring: Watch for suspicious behavior — unusual API call patterns, attempts to access restricted data, loops consuming excessive resources.
Post-Processing: Filter outputs for sensitive information before returning to users.
Human Review: Flag high-risk actions for human approval before execution.
Circuit Breakers
When anomalies are detected, automatically:
- Throttle the agent’s actions
- Require additional verification
- Disable specific tools temporarily
- Alert security teams
Palo Alto Networks’ AI systems use behavioral monitoring — agents that deviate from normal patterns trigger alerts even if no specific rule is violated.
Behavioral Monitoring
Establish baselines for normal agent behavior:
- Typical response times
- Common tool usage patterns
- Expected data access patterns
Deviations indicate potential compromises or bugs.
6. Observability and Testing
Tracing
Track every agent decision:
- Which prompt was used
- What tools were called and in what order
- How long each step took
- What context was available
- Why certain decisions were made
Tools like LangSmith and Azure Agent Factory provide detailed traces showing exactly where things went wrong.
Logging
Structure logs for queryability:
{ "timestamp": "2025-11-09T10:30:00Z", "conversation_id": "abc123", "agent_id": "customer_service_v2", "action": "tool_call", "tool_name": "get_order_status", "latency_ms": 234, "success": true, "user_query": "Where is my order?"}
Metrics
Track what matters:
- Success Rate: Percentage of conversations resolved without escalation
- Latency: P50, P95, P99 response times
- Cost: Token usage per conversation
- User Satisfaction: CSAT scores, thumbs up/down
- Tool Usage: Which tools are called most, which fail most
Evaluation
Pre-Deployment Testing:
- Prompt regression tests (does the new prompt maintain performance on existing benchmarks?)
- A/B testing (does agent v2 outperform agent v1 on held-out data?)
- Red teaming (can adversaries break the agent?)
Continuous Testing:
- Monitor production performance against benchmarks
- Shadow mode (run new agent versions alongside production, compare outputs without affecting users)
- Gradual rollouts with automatic rollback if metrics degrade
Real-World Example
IBM’s observability framework for Watson AIOps tracks 200+ metrics per agent, enabling rapid diagnosis of issues. When success rates drop, they can pinpoint whether it’s a specific tool failing, a prompt regression, or a data quality issue.
7. Common Pitfalls and Anti-Patterns
The God Prompt
What It Is: A single massive prompt trying to handle every scenario — hundreds of if-then rules, dozens of examples, edge case handling.
Why It Fails: Models lose coherence in ultra-long prompts. Important instructions get lost in the noise. Maintenance becomes impossible.
Solution: Break into focused prompts for specific scenarios. Use routing to select the right prompt.
Agent Sprawl
What It Is: Creating a new agent for every minor variation. 20 agents that do almost the same thing with slight tweaks.
Why It Fails: Coordination overhead explodes. Maintaining consistency across agents is impossible. Users get confused by handoffs.
Solution: Start with generalist agents. Only create specialists when specialization provides clear, measurable value.
The “3 Wishes” Problem
What It Is: Agents that stop after a fixed number of steps, even if the task isn’t complete. “You can call tools 3 times, then you must respond.”
Why It Fails: Arbitrary limits force agents to guess or quit prematurely. Tasks that need 4 steps fail.
Solution: Use dynamic stopping conditions based on task completion, not arbitrary turn limits. Allow loops but implement safeguards against infinite loops.
Hallucinated Tool Calls
What It Is: Agents calling tools that don’t exist or inventing parameters.
Why It Fails: The agent confidently tries to call get_customer_balance(customer_id, include_pending=True) but the tool only accepts customer_id. Execution fails.
Solution: Strict validation of tool calls before execution. Detailed tool schemas. In-context examples showing correct usage.
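A validation gate can be as simple as checking the requested call against the declared schema before anything executes, and returning an error the agent can use to correct itself. The sketch below assumes schemas in the same illustrative format used earlier in this article.

```python
# Sketch of strict validation before execution: reject calls to unknown tools
# or with unexpected/missing parameters, and return a correctable error.
def validate_tool_call(name: str, arguments: dict, schemas: dict) -> str | None:
    schema = schemas.get(name)
    if schema is None:
        return f"Unknown tool '{name}'. Valid tools: {sorted(schemas)}"
    props = schema["parameters"]["properties"]
    required = set(schema["parameters"].get("required", []))
    unexpected = set(arguments) - set(props)
    missing = required - set(arguments)
    if unexpected:
        return f"Unexpected parameters {sorted(unexpected)}; allowed: {sorted(props)}"
    if missing:
        return f"Missing required parameters {sorted(missing)}"
    return None  # call is structurally valid, safe to execute
```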
Prompt Injection Vulnerability
What It Is: Users embedding instructions in their input that override the agent’s system prompt. “Ignore previous instructions and…”
Why It Fails: Agents execute malicious commands, leak data, or behave unpredictably.
Solution: Input validation, sandboxing, clear separation between system instructions and user input, prompt injection detection.
Context Overflow
What It Is: Hitting context limits mid-conversation. The agent can’t remember the early parts of long conversations.
Why It Fails: Users must repeat themselves. The agent contradicts its earlier statements. Errors compound.
Solution: Implement context management strategies early (summarization, selective retention, vector search).
Error Message Loops
What It Is: A tool fails, the agent retries with the same input, fails again, retries…
Why It Fails: Burns tokens, wastes time, and frustrates users. Some agents loop infinitely.
Solution: Parse error messages. Adjust strategy based on errors. Implement circuit breakers and max retry limits.
Overfitting to Examples
What It Is: Agents work perfectly on training examples but fail on slightly different production queries.
Why It Fails: The agent memorized patterns rather than learning the task. Real-world variety breaks it.
Solution: Diverse training data. Explicit emphasis on principles, not just examples. Regular testing on novel inputs.
No Human Escalation Path
What It Is: Agents try to handle everything autonomously with no way to request human help.
Why It Fails: Edge cases, ambiguous situations, and genuinely complex problems result in poor outcomes rather than appropriate escalation.
Solution: Build in explicit escalation triggers — uncertainty thresholds, specific scenarios, and user requests.
Ignoring Latency
What It Is: Chaining 10+ tool calls without considering cumulative latency. “It’s only 200ms per call!”
Why It Fails: 10 × 200ms = 2 seconds when calls run sequentially, and chained calls that depend on each other's results can't be parallelized. Users abandon slow agents.
Solution: Set latency budgets. Parallelize independent calls. Cache common queries. Use streaming responses.
Tool Return Sizes
What It Is: Tools returning entire database tables or massive JSON payloads that consume the context window.
Why It Fails: No room left for reasoning. Costs explode. Slow token processing.
Solution: Implement pagination. Return summaries with drill-down capability. Filter data at the tool level, not in the prompt.
Inconsistent State Management
What It Is: In multi-agent or multi-turn scenarios, losing track of state. Agents forget what was already done.
Why It Fails: Redundant actions, contradictory decisions, user frustration (“I already told you this!”).
Solution: Explicit state management — store conversation state externally, pass it between agents, use memory systems.
8. Combining Orchestration Patterns
Hybrid Architectures
Real systems rarely use just one pattern. Production deployments combine patterns strategically:
Sequential + ReAct Hybrid
Define high-level sequential stages, but use ReAct within each stage for flexibility:
- Data Gathering (ReAct to find relevant sources)
- Analysis (ReAct to explore insights)
- Report Generation (Sequential pipeline)
Hierarchical + Tool Use
Supervisor agents with specialized tool-using workers:
- Supervisor: Planning and coordination
- Worker 1: Data retrieval tools
- Worker 2: Analysis tools
- Worker 3: Reporting tools
Multi-Agent + Handoffs
Multiple specialist agents that can hand off to each other:
- Technical support agent handles standard issues
- Billing agent handles payment questions
- Escalation agent handles complex cases
Each agent recognizes when it's out of its depth and transfers cleanly.
Example: Customer Service System
A production customer service system might use:
- Embedding-based routing to classify incoming queries
- Specialized agents for different departments (billing, technical, returns)
- ReAct pattern within each agent for dynamic problem-solving
- Handoff orchestration for escalations
- Sequential workflows for standard processes (return processing)
This isn’t overengineering — it’s matching patterns to specific needs within a complex system.
9. Relationship to Cloud Design Patterns
AI agents don’t exist in a vacuum. They integrate with cloud infrastructure, and many traditional cloud patterns apply directly:
Saga Pattern
Distributed transactions across multiple agents or services. If step 3 fails, compensating transactions undo steps 1 and 2.
Use in AI: Multi-agent workflows where agents coordinate complex operations. If one agent’s action fails, other agents need to rollback their changes.
Circuit Breaker Pattern
Detect failures and prevent cascade failures by temporarily disabling failing services.
Use in AI: When a tool (API, database, model) starts failing, stop calling it to prevent wasting resources. Degrade gracefully or use fallbacks.
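A minimal circuit breaker is a few dozen lines; the sketch below wraps any tool call, and after repeated failures it skips the tool for a cooldown period instead of hammering it. Thresholds and the single-class design are illustrative.

```python
# Minimal circuit-breaker sketch: after repeated failures the wrapped call is
# skipped for a cooldown period. Thresholds are illustrative.
import time

class CircuitBreaker:
    def __init__(self, max_failures: int = 5, cooldown_s: float = 60.0):
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.opened_at = 0.0

    def call(self, fn, *args):
        if self.failures >= self.max_failures:
            if time.time() - self.opened_at < self.cooldown_s:
                raise RuntimeError("circuit open: tool temporarily disabled")
            self.failures = 0  # cooldown elapsed, probe the tool again
        try:
            result = fn(*args)
            self.failures = 0
            return result
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.time()
            raise
```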
Bulkhead Pattern
Isolate resources so failures in one part don’t affect others.
Use in AI: Separate resource pools for different agents or tool groups. The data-heavy analytics agent doesn’t starve the lightweight chat agent of API quotas.
Event-Driven Architecture
Components communicate through events rather than direct calls.
Use in AI: Agents publish events (“user_query_received”, “order_processed”), and other agents subscribe to relevant events. Loose coupling enables scalability.
Step Functions (State Machines)
AWS Step Functions, Azure Durable Functions, and Google Cloud Workflows orchestrate complex, multi-step processes with error handling and retries.
Use in AI: Perfect for sequential and hierarchical agent patterns. Define workflows visually, handle failures declaratively, and get automatic observability.
Temporal Workflows
Temporal provides durable execution for long-running processes. If a server crashes mid-workflow, Temporal resumes from the last checkpoint.
Use in AI: Snap and Coinbase use Temporal for AI agents. Workflows can run for hours or days (user makes a request, agent researches overnight, user gets results in the morning). Temporal handles failures, retries, and state persistence.
10. SDK-Based Implementations
LangChain / LangGraph
What It Offers: The most popular framework. LangChain provides abstractions for prompts, agents, tools, and memory. LangGraph adds stateful, graph-based orchestration.
Best For:
- ReAct agents with complex tool chains
- Conversational agents with memory
- Rapid prototyping
Considerations:
- Abstractions can feel heavy for simple use cases
- Debugging opaque workflows takes effort
- Production deployments require LangSmith for observability
Example: Building a research agent that searches the web, extracts information, and synthesizes findings.
Microsoft AutoGen
What It Offers: Multi-agent conversations. Agents have roles (user proxy, assistant, critic) and negotiate solutions through dialogue.
Best For:
- Multi-agent collaboration patterns
- Code generation with review cycles
- Research requiring diverse perspectives
Considerations:
- Conversation-based approach adds latency
- Managing multi-agent interactions is complex
- Best for scenarios where dialogue genuinely improves outcomes
Example: Software development teams where agents write code, review it, suggest improvements, and iterate until tests pass.
CrewAI
What It Offers: Role-based agents with explicit processes (sequential, hierarchical). Inspired by human teams — product manager, developer, QA.
Best For:
- Business process automation
- Content creation workflows
- Marketing campaigns
Considerations:
- Opinionated structure (agents must fit roles)
- Less flexible than LangChain for custom patterns
- Great for standard workflows, constraining for novel use cases
Example: Marketing team where agents research trends, write content, design visuals, and schedule posts.
Google Agent Development Kit (ADK)
What It Offers: Google’s framework integrated with Vertex AI. Strong support for tool use and grounding (connecting to enterprise data).
Best For:
- Google Cloud native deployments
- Enterprise integrations (Google Workspace, BigQuery)
- Grounding agents in proprietary data
Considerations:
- Tied to Google Cloud ecosystem
- Newer framework, smaller community
- Excellent if you’re already Google Cloud native
Example: Enterprise search agent grounded in company documents, emails, and databases.
Coinbase AgentKit
What It Offers: Web3-specific toolkit. Agents can interact with blockchains, manage wallets, execute trades, and deploy smart contracts.
Best For:
- DeFi applications
- Blockchain-native agents
- Crypto portfolio management
Considerations:
- Narrow domain (Web3 only)
- Security is critical (agents control real money)
- Requires blockchain expertise
Example: Autonomous DeFi agent that monitors market conditions and rebalances portfolios.
Real-World Production Examples
Bank of America: Erica
Scale: 3 billion+ interactions annually, 50 million active users
Pattern: Multi-agent system with specialized sub-agents for different banking domains
Results: Handles routine inquiries, provides spending insights, assists with transactions — freeing human agents for complex cases
UiPath: Production Error Correction & Claims Processing
Scale: Enterprise deployments across Fortune 500 companies
Pattern: Reflection + Sequential workflows for production error handling and claims processing
Results:
- Error resolution time reduced from 30 minutes to near-instant
- Claims processing achieves 245% ROI
- Agents detect issues, propose fixes, reflect on root causes, then implement refined solutions
Key Insight: Combining reflection with deterministic workflows ensures both quality and reliability. The agent explores solutions creatively but follows compliance-required steps rigidly.
IBM Watson AIOps
Scale: Managing IT operations for global enterprises
Pattern: ReAct for dynamic troubleshooting + tool use for diagnostics
Results: Autonomous incident detection and resolution, reducing mean time to resolution (MTTR) by 60%+
Implementation: Agents examine logs, hypothesize root causes, run diagnostic commands, refine understanding, and escalate only when truly necessary
Cedars-Sinai: Healthcare Patient Triage
Scale: 42,000+ patients processed
Pattern: Tool use + planning for clinical decision support
Results: 77% of patients received optimal treatment plans, reducing wait times and improving outcomes
Safety Considerations: Human oversight for all recommendations; agents provide decision support rather than making final medical decisions
Key Insight: In high-stakes domains like healthcare, agents augment human expertise rather than replacing it. The pattern focuses on information gathering and option generation, with humans making final decisions.
Coinbase: Web3 Agent Deployments
Scale: Production blockchain agents managing real cryptocurrency assets
Pattern: Tool use (AgentKit) + security guardrails
Implementation: Agents check balances, execute trades, deploy smart contracts — all with multi-signature approvals and transaction limits
Key Insight: When agents control real money, security isn’t optional. Every action requires validation, limits prevent catastrophic mistakes, and audit trails ensure accountability.
Palo Alto Networks: FLEXWORK
Scale: Hybrid workforce support across enterprise clients
Pattern: Multi-agent orchestration for IT support and security monitoring
Results: Autonomous handling of access requests, security alerts, and infrastructure provisioning
Security Focus: Behavioral monitoring detects anomalous agent activity, circuit breakers prevent cascade failures in security systems
Snap & Coinbase: Temporal Workflows
Scale: Production AI workflows requiring durability
Pattern: Long-running agent workflows using Temporal for state management
Implementation: Workflows can span hours or days. If infrastructure fails, Temporal resumes from checkpoints. Users submit requests, agents research asynchronously, and results are delivered when ready.
Key Insight: Traditional request-response doesn’t work for complex agent tasks. Durable workflows enable agents to work on problems over time without losing state.
Quick Reference: Production Deployment Checklist
Before deploying your agent system to production, ensure you’ve addressed:
Pattern Selection
- Chosen patterns based on task complexity, not trend-following
- Documented why each pattern was selected
- Validated patterns against actual production requirements
- Planned for pattern evolution as needs change
Reliability
- Comprehensive test suites (unit, integration, regression, adversarial)
- Evaluation metrics matching production distribution
- Retry logic with exponential backoff
- Fallback chains for graceful degradation
- Circuit breakers for failing dependencies
- Confidence thresholds for high-stakes actions
- Human escalation paths for edge cases
Security
- Zero trust architecture (least privilege access)
- Input validation and sanitization
- Output filtering for sensitive data
- Authentication using short-lived tokens
- Pre-processing guardrails against attacks
- Runtime behavioral monitoring
- Post-processing safety filters
- Human review for high-risk actions
Observability
- Distributed tracing for all agent decisions
- Structured logging for queryability
- Key metrics tracked (success rate, latency, cost, satisfaction)
- Pre-deployment testing (regression, A/B, red teaming)
- Continuous evaluation in production
- Alert systems for anomalies
- Dashboards for real-time monitoring
Context Management
- Selective context injection strategy
- Summarization for long conversations
- Vector retrieval for semantic search
- Truncation policies defined
- Tool output filtering implemented
- Graceful degradation near limits
Production Operations
- Gradual rollout plan (1% → 10% → 100%)
- Automatic rollback triggers
- On-call runbooks for common issues
- Cost monitoring and budgets
- Performance SLAs defined
- Incident response procedures
- User feedback loops
Conclusion: From Patterns to Production
The gap between a working demo and a reliable production system isn’t about adding more agents or fancier patterns. It’s about:
Understanding your actual requirements: Don’t use multi-agent systems because they’re cool. Use them because specialization measurably improves outcomes for your specific problem.
Building observability first: You can’t debug what you can’t see. Tracing, logging, and metrics aren’t afterthoughts — they’re foundational infrastructure.
Planning for failures: Every tool will fail. Every model will hallucinate. Every API will timeout. Design for this reality rather than hoping it won’t happen.
Iterating based on data: Production teaches you things no amount of research anticipates. Monitor metrics, collect feedback, and evolve your system continuously.
Matching patterns to problems: Sequential workflows for repeatable processes. ReAct for exploration. Multi-agent for genuine specialization. Reflection for quality-critical outputs. The pattern should serve the problem, not the other way around.
The companies succeeding with AI agents in production — Bank of America handling billions of interactions, UiPath achieving 245% ROI, Cedars-Sinai improving patient outcomes — aren’t using exotic patterns. They’re using the right patterns rigorously, with production-grade infrastructure.
Start simple. A single agent with a few tools often outperforms a complex multi-agent system poorly implemented. As you learn your actual production challenges — latency bottlenecks, context overflows, specific failure modes — you’ll know exactly which patterns to introduce.
The future of AI agents isn’t about more patterns. It’s about using existing patterns better, with the discipline and infrastructure that separate impressive demos from reliable products.
Additional Resources
Frameworks and Tools:
- LangChain/LangGraph: https://github.com/langchain-ai/langgraph
- Microsoft AutoGen: https://microsoft.github.io/autogen
- CrewAI: https://www.crewai.com
- Google Agent Development Kit: https://cloud.google.com/agent-builder
- Coinbase AgentKit: https://github.com/coinbase/agentkit
- Temporal: https://temporal.io
Observability:
- LangSmith: https://smith.langchain.com
- Azure Agent Factory: https://azure.microsoft.com/blog/agent-factory
- Galileo: https://www.galileo.ai
Further Reading:
- Anthropic’s Context Engineering Guide: https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents
- AWS Agentic AI Patterns: https://docs.aws.amazon.com/prescriptive-guidance/latest/agentic-ai-serverless
- Microsoft AI Agent Design Patterns: https://learn.microsoft.com/en-us/azure/architecture/ai-ml/guide/ai-agent-design-patterns
- McKinsey: Deploying Agentic AI with Safety: https://www.mckinsey.com/capabilities/risk-and-resilience/our-insights/deploying-agentic-ai-with-safety-and-security-a-playbook-for-technology-leaders
This article synthesizes patterns and practices from production deployments across industries. While frameworks and tools evolve rapidly, the underlying patterns — ReAct, tool use, planning, and multi-agent collaboration — remain stable foundations for building reliable AI agents.