Most companies are racing to add AI to their products. But here’s what keeps happening: the demo looks incredible… until someone asks a question that’s slightly off-script, and the whole thing falls apart.
The difference between “impressive demo” and “reliable production system” isn’t what most people think. It’s not about using bigger models or spending months fine-tuning.
After experimenting with different approaches, I’ve found a surprisingly simple idea that changes everything: teach the AI how to fetch what it needs before it answers.
It’s about teaching AI systems how to find information instead of expecting them to already know everything.
Let me explain.
The Real Problem: When AI Doesn’t Know What It Doesn’t Know
Imagine you’re building an AI assistant for an e-commerce company. Someone asks:
“How many jackets did we sell last week in New York?”
A typical AI setup has three failure modes:
Mode 1: Pure hallucination
The model just makes up a number. It sounds confident, but it’s completely wrong. If the AI has only been trained on general knowledge, it has no clue about your sales data, so it either invents a figure or dodges the question. Even when connected to real data sources, it often hallucinates table or column names, resulting in failed queries.
AI hallucinates table/column names when querying
Mode 2: Retrieval without execution
The traditional “fix” is Retrieval-Augmented Generation (RAG): you store your company data in a vector database, retrieve the most relevant chunks, and pass them to the model. Here, the AI retrieves a document that mentions “jacket sales” and “New York,” but the numbers are from three weeks ago. It answers with stale data, confidently.
RAG fetches stale data instead of live answers
Mode 3: Failed tool use
You’ve connected the AI to your database. It tries to query it, but hallucinates a table name like jacket_sales_ny that doesn’t exist. The query fails. The user gets an error or a vague apology.
I’ve seen all three happen in production. They’re not edge cases — they’re the norm when you don’t set up the right architecture.
What Actually Works: Concepts + Tools + Orchestration
After building and breaking several systems, I landed on an architecture that actually holds up in production. It has three layers:
1. Learned Concepts (Not Just Documents)
Instead of dumping raw documentation or data into a vector database and hoping the AI figures it out, I store structured instructions that teach the AI how to handle specific types of questions.
Think of it like training a new employee. You don’t just hand them your company’s entire Google Drive and say “figure it out.” You give them:
- Standard operating procedures
- Decision trees for common scenarios
- Clear instructions on where to find what
For our jacket sales example, the system has a learned concept that says:
To answer product sales questions:
1. Check if the question includes: product type, time period, location
2. Query the sales_transactions table
3. Filter by: product_category, date_range, store_location
4. Aggregate using SUM(quantity)
5. Always verify the date range matches what was asked
This isn’t a document the AI has to interpret. It’s a structured recipe it can follow.
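As a rough sketch of what that might look like in code (the field names, the sales_concept variable, and the SQL template below are illustrative assumptions, not a fixed schema):

```python
# A hypothetical "learned concept": a structured recipe the orchestrator can
# hand to the model, instead of free-form documentation it has to interpret.
sales_concept = {
    "name": "product_sales_question",
    "required_slots": ["product_type", "time_period", "location"],
    "data_source": "sales_transactions",        # the table queries must target
    "filters": ["product_category", "date_range", "store_location"],
    "aggregation": "SUM(quantity)",
    "tools": ["run_sql_query"],                 # which tools this concept may use
    "query_template": (
        "SELECT SUM(quantity) AS units_sold "
        "FROM sales_transactions "
        "WHERE product_category = :product_type "
        "AND store_location = :location "
        "AND sale_date BETWEEN :start_date AND :end_date"
    ),
    "checks": ["verify the date range matches what was asked"],
}
```

Because the recipe names the exact table, columns, and aggregation, the model has nothing left to guess at.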
2. Tools That Actually Connect to Live Systems
This is where protocols like MCP (Model Context Protocol) come in. The AI doesn’t just read about your database — it can actually query it.
The key difference: these tools are explicitly available to the AI at runtime. It knows it can:
- Execute SQL queries (with proper permissions)
- Call internal APIs
- Fetch real-time data from services
- Transform and validate the results
The AI becomes less like a chatbot and more like a junior analyst who can actually pull reports.
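To make that concrete, here is a minimal sketch of a tool exposed to the model at runtime. This is not the actual MCP SDK; register_tool, run_sql_query, and the sales.db file are assumptions for illustration, and permissions are enforced outside the model.

```python
import sqlite3

# Hypothetical tool registry: each entry pairs a callable the model may invoke
# with a schema describing when and how to use it.
TOOLS: dict[str, dict] = {}

def register_tool(name: str, description: str, parameters: dict):
    def wrapper(fn):
        TOOLS[name] = {"fn": fn, "description": description, "parameters": parameters}
        return fn
    return wrapper

@register_tool(
    name="run_sql_query",
    description="Execute a read-only SQL query against the sales database.",
    parameters={"query": "string", "params": "object"},
)
def run_sql_query(query: str, params: dict) -> list[dict]:
    # Open the database read-only so the model can fetch data but never change it.
    conn = sqlite3.connect("file:sales.db?mode=ro", uri=True)
    conn.row_factory = sqlite3.Row
    try:
        return [dict(row) for row in conn.execute(query, params).fetchall()]
    finally:
        conn.close()
```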
3. The Orchestrator (The Critical Layer Most People Skip)
Here’s where it all comes together. The orchestrator sits between the user’s question and the AI. Its job is to:
- Analyze the question — What type of query is this?
- Fetch relevant concepts — Pull the learned instructions for this question type
- Provide appropriate tools — Make available only the tools needed for this task
- Let the AI execute — The model uses the concept as a guide and tools to get real data
- Validate the result — Check that the answer actually makes sense
Without orchestration, you’re either feeding the AI everything every time (expensive and slow) or leaving it to guess what it needs (unreliable).
Flow of AI using the learned concepts
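Here is a minimal sketch of that loop, reusing the hypothetical sales_concept and TOOLS registry from the earlier snippets. The classify and call_llm helpers are stand-ins: a real system would use an embedding lookup over concepts and an actual LLM client with function calling.

```python
from datetime import date, timedelta

# Hypothetical orchestrator: analyze the question, fetch the matching concept,
# expose only the tools that concept needs, let the model execute, then validate.
CONCEPTS = {"product_sales_question": sales_concept}

def classify(question: str) -> str:
    # Stand-in for concept retrieval (e.g. an embedding lookup over concept names).
    return "product_sales_question" if "sell" in question.lower() else "general"

def call_llm(concept: dict, tools: dict, question: str) -> list[dict]:
    # Stand-in for a real LLM call with function calling. For the sketch we just
    # follow the concept's recipe directly through the exposed SQL tool.
    start = date.today() - timedelta(days=date.today().weekday() + 7)  # Monday of last week
    params = {
        "product_type": "jacket",                # slots a real model would
        "location": "New York",                  # extract from the question
        "start_date": start.isoformat(),
        "end_date": (start + timedelta(days=6)).isoformat(),
    }
    return tools["run_sql_query"]["fn"](concept["query_template"], params)

def orchestrate(question: str) -> str:
    concept = CONCEPTS.get(classify(question))                       # 1-2. analyze + fetch concept
    if concept is None:
        return "I don't have a procedure for this type of question yet."
    allowed = {n: TOOLS[n] for n in concept["tools"] if n in TOOLS}  # 3. scope the tools
    rows = call_llm(concept, allowed, question)                      # 4. let the model execute
    if not rows or rows[0]["units_sold"] is None:                    # 5. validate the result
        return "No matching sales records found."
    return f"Units sold: {rows[0]['units_sold']}"
```

Calling orchestrate("How many jackets did we sell last week in New York?") then routes through the sales concept, runs the templated query against the live table, and returns a checked number instead of a guess.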
The Intern Analogy
I explain this to non-technical stakeholders like this:
Without this architecture: You hire an intern and immediately ask them complex questions about your business. They either make something up or say “I don’t know” to everything.
With this architecture: You hire an intern and give them:
- A handbook for different types of tasks (learned concepts)
- Access to the tools they need — database credentials, API keys, spreadsheets (tools via MCP)
- A manager who assigns them the right tasks with the right resources (orchestrator)
Now when you ask about jacket sales, they:
- Check their handbook for “how to answer sales questions”
- Use their database access to run the actual query
- Return accurate, current information
Which intern are you going to trust with real work?
📊 The Visual Workflow
Caption: The orchestrator doesn’t just retrieve text. It retrieves instructions (concepts) + tools, then lets the LLM decide how to fetch the answer.
Why This Architecture Matters in Production
After running this approach in production for several months, here’s what changed:
Reliability went up
Instead of hallucinating table names or mixing up data sources, the system follows explicit patterns. When it doesn’t know something, it knows that it doesn’t know.
Costs went down
We’re not stuffing massive context windows with every possible document. We’re providing targeted concepts and letting the AI fetch only what it needs for each query.
Iteration speed increased
Want to connect a new data source? Write a new learned concept and expose the tool. You don’t need to retrain anything or rebuild your entire RAG pipeline.
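For example, wiring in a (hypothetical) live inventory API would just mean registering one more tool and one more concept; the endpoint and names below are illustrative, reusing the register_tool and CONCEPTS sketches from earlier:

```python
import json
import urllib.request

# One more tool: a live inventory lookup against a hypothetical internal API.
@register_tool(
    name="get_inventory",
    description="Fetch current stock levels for a product category and store.",
    parameters={"product_category": "string", "store_location": "string"},
)
def get_inventory(product_category: str, store_location: str) -> dict:
    url = ("https://internal.example.com/inventory"
           f"?category={product_category}&store={store_location}")
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

# One more learned concept that points at the new tool. Nothing is retrained
# and the existing pipeline is untouched; the orchestrator can route to it
# on the very next request.
CONCEPTS["inventory_question"] = {
    "name": "inventory_question",
    "required_slots": ["product_type", "location"],
    "tools": ["get_inventory"],
    "checks": ["confirm the store location exists before answering"],
}
```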
Trust from stakeholders improved
When business users can see that the AI is querying real systems and following documented procedures, they actually start relying on it for decisions.
The Technical Reality
I need to be clear about something: the individual pieces here aren’t revolutionary. Function calling exists. RAG exists. Tool use exists.
What I’m describing is how to compose these pieces into an architecture that’s actually reliable enough for production use.
Most companies I’ve talked to are stuck in one of two places:
- RAG-only systems that can’t access live data and constantly serve stale information
- Tool-use systems that hallucinate function calls because they lack proper context about when and how to use those tools
The orchestration layer — with learned concepts guiding tool selection and execution — is what bridges that gap.
What’s Coming Next
This is Part 1 of a three-part series where I’m documenting this architecture in detail.
Part 2 will show you the actual system design: how embeddings, learned concepts, tool definitions, and the orchestrator fit together. You’ll see the data flows, the decision points, and the failure modes I’ve had to handle.
Part 3 will cover reliability: caching strategies, handling tool failures, monitoring, and how to prevent the system from degrading over time as you add more concepts and tools.
If you’re building production AI systems and dealing with these same problems, follow along. I’m sharing what actually worked after a lot of trial and error.
This is Part 1 of a 3-part series on building production-grade AI systems:
- Part 1: The Missing Piece That Makes AI Actually Work in Production (you are here)
- Part 2: Architecture & Implementation (coming next week)
- Part 3: Making It Reliable at Scale (coming soon)
If you found this useful (or exciting), hit that 👏 Clap button — and follow me for Part 2.