The AI industry has been chasing the wrong dream. We built god models—massive language models trained on everything, capable of anything, deployed to do one specific thing. We’re using a Swiss Army knife when we need a toolkit.
But here’s what we’re really learning: AI’s superpower isn’t raw capability. It’s orchestration.
The future isn’t better LLMs. It’s systems where specialized agents, APIs, data sources, and interfaces compose together like LEGO blocks. Where you can combine a document processing agent with a database lookup with a voice interface and a reasoning engine—not in monolithic code, but in declarative workflows. And here’s the kicker: you can test each piece independently while they work together seamlessly.
The Monolith Code Problem
Today’s typical AI application is actually a hidden monolith. It looks like this:
if task == "classify":
call GPT-4 with prompt A
elif task == "extract":
call GPT-4 with prompt B
elif task == "reason":
call Claude with prompt C
elif task == "query_db":
hit database directly
elif task == "voice_input":
transcribe + call GPT-4
All mixed together. One massive script (or codebase).
When you need to change the classification logic, you risk breaking extraction. When you want to swap the voice provider, you touch extraction code. When you want to A/B test two different reasoning approaches, you’re juggling conditional logic.
This isn’t scalable. This isn’t maintainable. And most importantly, this isn’t composable.
We initially shipped code exactly like this and found ourselves editing it for every small change. That caused plenty of problems, especially around observability. Eventually we built flo-ai with the features we actually needed and open-sourced it.
Enter: Declarative Agent Workflows
Imagine instead that you could define workflows like this:
Workflow: Customer Support Pipeline
├── Voice Input Agent (transcribe customer query)
├── Sentiment Analysis Agent (specialized for emotion detection)
├── Knowledge Base Lookup (query vector DB for relevant docs)
├── Routing Agent (decide: FAQ → Response | Complex → Escalation)
├── If FAQ:
│   ├── Response Generation Agent
│   └── Text-to-Speech Output
└── If Escalation:
    ├── Ticket Creation (API call)
    └── Notification Agent (Slack + Email)
Each piece is independent. Each piece has clear inputs and outputs. Each piece can be tested in isolation. Yet together, they form a complete, intelligent system. This is the promise of agent orchestration: workflows as composition, not monolith code.
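To make "declarative" concrete, here is a minimal sketch of a pipeline expressed as data plus a tiny runner. The step names, the agent registry, and the run_workflow helper are hypothetical illustrations of the pattern, not flo-ai's or any other library's actual API.

# Sketch: a workflow defined as data instead of an if/elif chain (names are illustrative).
from typing import Any, Callable, Dict, List

AgentFn = Callable[[Dict[str, Any]], Dict[str, Any]]

SUPPORT_PIPELINE: List[Dict[str, Any]] = [
    {"step": "transcribe", "agent": "voice_input"},
    {"step": "sentiment",  "agent": "sentiment_analysis"},
    {"step": "kb_lookup",  "agent": "knowledge_base"},
    {"step": "route",      "agent": "router"},
    {"step": "respond",    "agent": "response_generator", "when": lambda ctx: ctx["route"] == "faq"},
    {"step": "tts",        "agent": "text_to_speech",     "when": lambda ctx: ctx["route"] == "faq"},
    {"step": "ticket",     "agent": "ticket_creator",     "when": lambda ctx: ctx["route"] == "escalation"},
    {"step": "notify",     "agent": "notifier",           "when": lambda ctx: ctx["route"] == "escalation"},
]

def run_workflow(steps: List[Dict[str, Any]],
                 registry: Dict[str, AgentFn],
                 ctx: Dict[str, Any]) -> Dict[str, Any]:
    """Run each step whose condition holds, merging agent outputs into a shared context."""
    for step in steps:
        condition = step.get("when", lambda _: True)
        if condition(ctx):
            ctx.update(registry[step["agent"]](ctx))
    return ctx

Because the branching lives in data, swapping the router or A/B testing a response generator means editing the pipeline definition or the registry, not rewriting control flow.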
The Three Dimensions of Composability
1. Agent Heterogeneity
Not all agents are LLMs. Your workflow might include:
- LLM Agents (reasoning, generation, planning)
- Specialized Models (vision for document analysis, audio models for voice)
- Heuristic Agents (rule-based logic, data validation)
- Microservices (existing business logic wrapped as agents)
Each plays its role. The document classifier might be a lightweight fine-tuned model, not a frontier LLM. The data validator might be simple rules. The reasoning engine might be your Claude call. Mix and match based on what actually works for each task, not on raw capability.
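One way to make heterogeneous agents interchangeable is to hide them behind a single narrow interface. The sketch below shows that pattern; the injected model and llm_call objects are hypothetical stand-ins, not a specific library's API.

# Sketch: heterogeneous agents behind one narrow interface (all names are illustrative).
from typing import Any, Dict, Protocol

class Agent(Protocol):
    def run(self, payload: Dict[str, Any]) -> Dict[str, Any]: ...

class RuleValidator:
    """Heuristic agent: plain rules, no model call at all."""
    def run(self, payload: Dict[str, Any]) -> Dict[str, Any]:
        ok = bool(payload.get("text", "").strip())
        return {**payload, "valid": ok}

class SmallModelClassifier:
    """Specialized-model agent, e.g. a lightweight fine-tuned classifier."""
    def __init__(self, model):          # model: any object exposing .predict(text)
        self.model = model
    def run(self, payload: Dict[str, Any]) -> Dict[str, Any]:
        return {**payload, "label": self.model.predict(payload["text"])}

class LLMReasoner:
    """Frontier LLM agent: wraps whatever client you already use."""
    def __init__(self, llm_call):       # llm_call: callable taking a prompt string, returning text
        self.llm_call = llm_call
    def run(self, payload: Dict[str, Any]) -> Dict[str, Any]:
        return {**payload, "answer": self.llm_call(payload["text"])}

The orchestrator only ever sees Agent.run(), so replacing a rules-based validator with a model (or the reverse) is a one-line change in the registry.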
2. Multi-Modal Integration
Real workflows don’t live in text-only land. They bridge modalities:
- Voice Input → Audio transcription agent → Processing → Text Output or Voice Response
- Document Upload → Vision/OCR agent → Database lookup → Structured Output
- Stream Processing → Real-time agents making decisions → API notifications → UI updates
The workflow doesn’t care about modality. It just cares about data flowing through the right sequence of agents. An agent accepts inputs (text, audio, images, structured data), processes them, and outputs results that the next agent can consume.
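One way to keep the pipeline modality-agnostic is to pass a small envelope that carries whichever payload the previous agent produced. A rough sketch, with hypothetical field and function names:

# Sketch: a modality-agnostic envelope passed between agents (fields are illustrative).
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

@dataclass
class Envelope:
    text: Optional[str] = None            # transcripts, documents, generated responses
    audio: Optional[bytes] = None         # raw voice input or TTS output
    image: Optional[bytes] = None         # scanned documents, screenshots
    data: Dict[str, Any] = field(default_factory=dict)   # structured results (DB rows, labels)

def transcription_agent(env: Envelope) -> Envelope:
    """Consumes audio, produces text; downstream agents never touch the audio bytes."""
    env.text = fake_transcribe(env.audio)     # placeholder for a real speech-to-text call
    return env

def fake_transcribe(audio: Optional[bytes]) -> str:
    return "<transcript>" if audio else ""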
3. External System Integration
Your agents aren’t isolated. They orchestrate with:
- Databases (querying, upserting, transactions)
- APIs (REST, GraphQL, webhooks)
- Message Queues (async processing, event handling)
- Analytics Platforms (logging decisions, tracking metrics)
- Legacy Systems (your existing infrastructure)
The workflow engine handles the glue. Agents declare what they need; the orchestrator provides it. Want to log every decision? Declare it once in your workflow. Want to add retry logic? Declarative. Want to add rate limiting? Same story.
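Cross-cutting concerns like retries and logging can be declared once per step and applied by the orchestrator rather than hand-coded inside every agent. A minimal sketch of that idea using a plain decorator; the policy parameters and the ticket_creation step are made up for illustration.

# Sketch: declaring retry + logging once and applying it around any agent (illustrative).
import logging
import time
from functools import wraps
from typing import Any, Callable, Dict

logging.basicConfig(level=logging.INFO)

def with_policies(retries: int = 3, backoff_s: float = 0.5) -> Callable:
    """Wrap an agent callable with retry and decision logging, declared per step."""
    def decorator(agent: Callable[[Dict[str, Any]], Dict[str, Any]]):
        @wraps(agent)
        def wrapped(ctx: Dict[str, Any]) -> Dict[str, Any]:
            for attempt in range(1, retries + 1):
                try:
                    result = agent(ctx)
                    logging.info("step=%s attempt=%d ok", agent.__name__, attempt)
                    return result
                except Exception as exc:          # in practice, catch narrower error types
                    logging.warning("step=%s attempt=%d failed: %s", agent.__name__, attempt, exc)
                    time.sleep(backoff_s * attempt)
            raise RuntimeError(f"{agent.__name__} failed after {retries} attempts")
        return wrapped
    return decorator

@with_policies(retries=2)
def ticket_creation(ctx: Dict[str, Any]) -> Dict[str, Any]:
    # placeholder for a real ticketing API call
    return {**ctx, "ticket_id": "TCK-001"}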
Practical Example: Building a Composable Support Workflow
Let me make this concrete. Here’s a support workflow that demonstrates composition, testing, and multi-modal integration:
The Workflow:
- Customer emails support.
- System parses the email and analyzes sentiment
- Retrieves relevant knowledge base articles
- Routes to automated response or escalation
- If automated: generates a response and replies to the email
- If escalation: creates a ticket and notifies a human agent
- If more clarity is required: places a voice call to the customer to collect the missing details
The Beauty:
Each step is an independent agent. You can:
- Swap the transcription service (Google → AssemblyAI) without touching routing or response generation
- A/B test sentiment models independently (prod model vs. new fine-tune)
- Upgrade the knowledge base (simple search → semantic + BM25 hybrid) without changing response generation
- Test the entire workflow in 100ms by mocking external APIs
- Test specific integration points (e.g., "does the ticket creation work?") in isolation
You’re not maintaining one giant script. You’re composing building blocks.
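Because each step is just a callable with clear inputs and outputs, "does the ticket creation work?" becomes an ordinary unit test with the external API mocked out. A sketch in pytest style; the escalation_agent and ticket client are hypothetical names, not the workflow engine's real interface.

# Sketch: testing one integration point in isolation with a mocked API client.
from typing import Any, Dict
from unittest.mock import MagicMock

def escalation_agent(ctx: Dict[str, Any], ticket_client) -> Dict[str, Any]:
    """Creates a ticket for escalated queries; the client is injected so tests can fake it."""
    ticket = ticket_client.create(subject=ctx["summary"], priority=ctx.get("priority", "normal"))
    return {**ctx, "ticket_id": ticket["id"]}

def test_escalation_creates_ticket():
    fake_client = MagicMock()
    fake_client.create.return_value = {"id": "TCK-42"}

    result = escalation_agent({"summary": "Refund not processed"}, fake_client)

    fake_client.create.assert_called_once_with(subject="Refund not processed", priority="normal")
    assert result["ticket_id"] == "TCK-42"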
Why This Matters Now
Three things converge:
- Specialized Model Era. We’re past "one model for everything." Domain-specific models, small efficient models, and frontier reasoners coexist. Workflows let you pick the right one for each task.
- Real-World Complexity. Production systems aren’t text-in-text-out. They’re polyglot, multi-modal, integrated with databases and APIs. Workflows acknowledge this reality.
- Operational Maturity. Companies building AI systems need to test, debug, monitor, and modify them in production. Declarative, composable workflows make this possible.
The Future: From Monolith to Fabric
In a year, I believe the landscape will look like this:
- Monolithic "prompt chains" will be seen as the bad old days (like goto statements)
- Declarative workflow definition will be standard (like how we don’t hand-code HTTP servers anymore)
- Agent composition will feel as natural as importing libraries
- Workflow testing will have the same maturity as backend testing today
- Multi-modal, multi-agent systems will be the baseline, not the exception
The god model isn’t dead. But it’s no longer the center of gravity. It’s one piece in a much larger toolkit—orchestrated, tested, composed.
The real power of AI isn’t in bigger models. It’s in smarter orchestration. We built Wavefront with exactly this goal in mind.
Enjoyed this article? Star Wavefront on GitHub or join the conversation in the comments. I read and respond to everything.