Claude confidently pointed me to a code path in MongoDB’s massive infrastructure codebase. I spent an hour investigating it before realizing something was off. The path worked, technically. But there were better approaches - ones the agent had completely missed.
I was implementing a new communication mode between our control plane and data plane. Enterprise scale, hundreds of thousands of lines of code, the kind of codebase where AI agents supposedly fall apart. I’d asked Claude a simple question: “How can I send this config to the data plane?”
It pointed me to a break-glass path that was meant for emergency scenarios. Technically correct, but not structurally sound, and missed a change that required 4 lines to do the same thing in a much more elegant way.
The model had everything it needed except the one thing that mattered: context to help it determine which approach was actually best.
The Mental Model
PROBABILITY DISTRIBUTION OF OUTCOMES

Without context engineering:
│    ╱╲
│   ╱  ╲
│  ╱    ╲___
└──────────────>
(wide spread, low mean quality)

With context engineering:
│         ╱╲
│         ││
│        ╱  ╲
└──────────────>
(narrow spread, high mean quality)
All AI output is probabilistic. You’re sampling from a distribution of possible responses - some good, some... not quite.
The goal of context engineering is to shift that distribution. Without good context, you get a wide spread, with lower mean quality. With good context, the curve tightens and moves to the right - narrow spread, high mean quality.
There’s no way to guarantee a specific output. Instead, your objective is to increase the probability of a good-enough output.
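To make that concrete, here's a toy sketch - the numbers are invented, nothing is measured - of what shifting the distribution means when you sample many outputs:

```javascript
// Toy illustration only: treat each agent run as a random draw of "quality".
// Means, spreads, and the threshold are made up to show the shape of the idea.
function sampleQuality(mean, spread) {
  // Crude normal-ish sample: sum of 12 uniform draws, centered (Irwin-Hall trick)
  let sum = 0;
  for (let i = 0; i < 12; i++) sum += Math.random();
  return mean + (sum - 6) * spread;
}

function goodEnoughRate(label, mean, spread, threshold = 0.7, runs = 1000) {
  let good = 0;
  for (let i = 0; i < runs; i++) {
    if (sampleQuality(mean, spread) >= threshold) good++;
  }
  console.log(`${label}: ~${Math.round((good / runs) * 100)}% of outputs clear the bar`);
}

goodEnoughRate("Without context engineering", 0.55, 0.20); // wide spread, low mean
goodEnoughRate("With context engineering", 0.85, 0.05);    // narrow spread, high mean
```

Nothing about either run is guaranteed; the only thing that changes is how often the draw lands somewhere acceptable.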
Context Limited
Most developers using AI coding agents (Claude Code, Cursor, Copilot, etc.) treat them like glorified junior engineers, or a better tab autocomplete.
Vague task, simple prompt, hit enter, pray.
What's so powerful about these agents is that they are Swiss Army knives full of tools: grep, file reads, bash commands, web search, examining git history. Additionally, advances in model intelligence have made it possible for them to run for tens of minutes at a time without stopping.
The model capability is already there. The “intelligence” is already there.
What’s missing is the information architecture - the engineering rigor of writing code with these tools. Or, as Simon Willison calls it, Vibe Engineering.
Most agent failures are not model failures anymore, they are context failures
https://www.philschmid.de/context-engineering
In the post-AI world, the bottleneck has shifted: writing code is easy now, but writing good code is still hard. To write good code with agents, the #1 question vibe engineers should be asking is "how well can I curate the information environment that the agent operates in?"
The quality of the tokens output by the agent is almost entirely dependent on the quality of what goes in. Or, garbage in, garbage out.
Most people still view AI as a black box - type something in, get something out. But you have far more control over the output than you realize.
Context engineering, as Shopify CEO Tobi Lutke puts it, is
the art of providing all the context for the task to be plausibly solvable by the LLM.
In other words, the craft of deliberately architecting what and when information enters the AI’s context window, and what stays out, to maximize the probability of getting the output you actually want.
What’s Actually in the Context Window
Source: philschmid.de - Context Engineering
Every interaction uses multiple layers: system prompts, project config, tools, artifacts, and your prompt. The amateur optimizes the last layer. The professional architects all of them. (We’ll dive deep into specific techniques to optimize these layers in later parts.)
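As a rough sketch of those layers - the names mirror the list above, and the token counts are invented for illustration, not measurements from any real session:

```javascript
// Hypothetical breakdown of a single agent turn's context window.
// Layer names follow the list above; token counts are illustrative guesses.
const contextWindow = {
  systemPrompt:  { tokens: 3000,  note: "the harness's own instructions" },
  projectConfig: { tokens: 2000,  note: "CLAUDE.md, team standards, style rules" },
  tools:         { tokens: 12000, note: "tool and MCP schemas, loaded whether or not you use them" },
  artifacts:     { tokens: 8000,  note: "specs, plans, files read into the session" },
  conversation:  { tokens: 20000, note: "everything said and read so far" },
  yourPrompt:    { tokens: 200,   note: "the only layer most people ever think about" },
};

const total = Object.values(contextWindow).reduce((sum, layer) => sum + layer.tokens, 0);
console.log(`~${total} tokens are in play before the model writes a single line of code`);
```

Most of that budget is spent before you type a word, which is exactly why optimizing only the prompt misses the point.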
Next time you give Claude Code a task, run /context first. See what’s actually in the context window, and think about what you need, and what you don’t.
When Context Matters
Not every task needs sophisticated context engineering.
Simple tasks work with simple context - an error message, the relevant file, and the ability to run tests are enough to fix a compile error or a basic unit test. It'll figure it out. (It's pretty dang smart...)
Implementing a feature as part of a larger project? Now context engineering requires deliberate architecture: thought about what to include, and when.
The divide between these two approaches is fuzzy - but as I became more familiar with the practice of context engineering, I began to intuit when to use the low-touch versus the high-touch approach.
An example from KOUCAI 口才 (AI penpals for practicing Chinese):
The Task: “The marketing website (koucai.chat) is making unnecessary API calls to the backend, slowing down page load. Please fix this.”
Naive Approach (What Could Have Happened):
Jump straight to implementation:
- Add a quick if (hostname === 'koucai.chat') return; check
- Gate the API call
- Ship it
This might work. But you’d miss:
- An identical pattern already exists in pages/index.jsx
- A previous commit (cf3f229) partially implemented this exact fix but incompletely
- The race condition between domain detection and initialization
- The loading state edge case that causes infinite spinners
- Test coverage requirements and existing mocking patterns
Result: Ship a “working” fix that breaks in production with infinite loading spinners, requires a hotfix, and doesn’t follow established patterns.
Context-Engineered Approach (What Actually Happened):
Phase 1: Systematic Intelligence Gathering
Instead of coding immediately, Claude dispatched parallel research agents:
- Search task history
- Discovery: pattern for CONDITIONAL_WRAPPER_RENDERING exists
- Discovery: Commit cf3f229 partially implemented this pattern
- Discovery: Pattern was applied to rendering but NOT initialization (the actual bug)
- Codebase Analysis Agent: Map existing domain detection implementations
- Discovery: Pattern exists in 3 files with proven loading state guards
- Discovery: pages/index.jsx has the complete reference implementation
- Git Archaeology Agent: Trace evolution of guest session logic
- Discovery: Recent commit added domain detection but left race condition
- Discovery: Similar race condition fixes exist in commit 744d471
- Risk Analysis Agent: Predict failure modes
- Prediction: tests may fail due to missing window.location mocks
- Result: All 15 tests passed without modification (good prediction!)
Phase 2: Mandatory Architecture Artifacts
Before writing ANY code, Claude created:
- Chain of Thought Analysis: Why does the current implementation exist? What’s the git history? What breaks easily?
- Tree of Thought: Generated 3 fundamentally different solutions:
- Solution A: Sequential useEffect with state flag (proven pattern)
- Solution B: Synchronous detection in useState initializer (SSR risk)
- Solution C: Lift to parent component (over-engineering)
- Winner: Solution A (minimal change, proven pattern, 30min estimate)
- YAGNI (you aren’t gonna need it) Declaration: Explicitly excluded:
- Centralized useIsMarketingDomain hook (future refactor)
- Performance optimization via memoization (unnecessary)
- Loading skeleton UI (overkill for instant detection)
Phase 3: Pattern-Guided Implementation
// Not guessing - copying the proven pattern from pages/index.jsx lines 48-67
const [isDomainDetected, setIsDomainDetected] = useState(false);

useEffect(() => {
  // Detection logic
  setIsDomainDetected(true); // Signal completion
}, []);

useEffect(() => {
  if (!isDomainDetected) return; // Wait for detection
  if (isMarketingDomain) return; // Skip marketing
  // Safe to initialize
}, [isDomainDetected, isMarketingDomain]);
Phase 4: The Context Engineering Payoff
First implementation had a bug (infinite loading spinner). But because Claude had:
- The complete pattern context from pages/index.jsx
- Understanding of the loading state architecture
- Knowledge that isInitializing starts as true
Claude immediately recognized the anti-pattern:
if (isMarketingDomain) return; // ❌ Forgot to clear isInitializing!

// Fixed!
if (isMarketingDomain) {
  setIsInitializing(false); // ✅ Clear blocking state
  return;
}
The Difference
| Naive Approach | Context-Engineered Approach |
|---|---|
| 10 minutes to “working” code | 35 minutes to proven solution |
| Ships with infinite spinner bug | Catches bug immediately via pattern knowledge |
| Doesn’t follow existing patterns | Reuses CONDITIONAL_WRAPPER_RENDERING |
| No learning from past mistakes | Discovered cf3f229 was incomplete |
| Reinvents the wheel | Copies proven pattern from pages/index.jsx |
| Unknown unknowns remain unknown | Parallel agents surface hidden complexity |
Takeaway: Context engineering isn’t about feeding more tokens to the model. It’s about systematically assembling the right context before attempting implementation.
The 25 minutes spent on intelligence gathering prevented:
- Reinventing a solution that already existed
- Missing the edge case that causes infinite loading
- Breaking from established codebase patterns
- Incomplete fixes like the previous commit
The same model, given prompts of similar length, produced completely different outcomes. The difference is that I gave the agent the information it needed to make informed decisions. I architected the context.
The workflow forced Claude to ask: “What similar work exists? What patterns apply? What failed before?” before writing a single line of code. That’s context engineering.
Be intentional about every token of context you give to the model.
When Context Breaks Down
Claude starts the session perfectly. It reads the spec, understands the requirements, makes a sensible plan.
100,000 tokens later, things start to break down. It implements features that aren't in the spec. It stops writing tests, or fakes results to get tests to pass. It explicitly ignores instructions.
This is context rot.
It’s easy to assume more context is always better. It’s not.
As its context length increases, a model’s ability to capture these pairwise relationships gets stretched thin, creating a natural tension between context size and attention focus. Additionally, models develop their attention patterns from training data distributions where shorter sequences are typically more common than longer ones. This means models have less experience with, and fewer specialized parameters for, context-wide dependencies.
https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents
Load your entire codebase, all specs, complete git history, and every document into the context window, and the model’s performance degrades. It can’t sustain the enormous amount of context. The signal-to-noise ratio collapses. The AI starts making spurious connections between unrelated information. It gets distracted. It hedges. Its outputs become vague and uncertain.
That handy 10,000-token MCP server for JIRA? Pretty valuable when doing research, right? (Claude is certainly better than me at JQL...) But it's pure context rot when you're debugging a UI component. Those tokens aren't just wasted - they're actively degrading the model's performance.
Context engineering is just as much about removal as addition.
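In practice, removal means deciding per task which tools earn their token cost. A toy sketch of that judgment call (tool names and token counts are made up):

```javascript
// Toy model of the trade-off: only load a tool's schema when the task can use it.
// Tool names and token counts are invented for illustration.
const toolCatalog = [
  { name: "core tools (grep, read, bash)", tokens: 1500,  usefulFor: ["*"] },
  { name: "JIRA MCP",                      tokens: 10000, usefulFor: ["research", "ticket-triage"] },
  { name: "browser MCP",                   tokens: 6000,  usefulFor: ["ui-debug"] },
];

function toolsFor(taskKind) {
  return toolCatalog.filter(
    (tool) => tool.usefulFor.includes("*") || tool.usefulFor.includes(taskKind)
  );
}

console.log(toolsFor("ui-debug"));  // core tools + browser MCP; the JIRA schemas stay out
console.log(toolsFor("research"));  // here the JIRA MCP earns its 10,000 tokens
```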
The Taste of Curating Context
How do you actually practice context engineering? Here’s my journey:
When I first built KOUCAI 口才, I interacted with Claude in fairly basic ways (implement this component, fix this bug, etc.). Claude would sometimes get it right, sometimes not. I was playing roulette.
As I learned more about AI coding, I developed a more sophisticated workflow with five phases: gather intelligence, plan, execute, validate, review. The intelligence and plan phases use context engineering to pull in relevant information so that the execution phase can make the correct changes.
This is how I approach context engineering. You'll develop your own workflow. The specific details and prompts don't matter for now. What matters is knowing that every context decision is a judgment call by you, the human.
Which patterns matter enough to encode in your system prompt? Which tools are worth the token cost for this task? What historical context helps the agent make informed choices? When does additional context add signal versus noise?
In The New Math of Building with AI, I argued that when action becomes cheap, taste becomes king. Context engineering is how you operationalize that taste. It’s how you apply judgment when working with AI.
Context curation is taste in action.
What This Means for You
Model capabilities are accelerating. Agents are getting smarter and can work for longer stretches. Whatever comes next (Gemini 3, Opus 4.5, etc.) will be smarter, faster, and more capable than anything that came before.
But intelligence without information is squandered.
Engineers who succeed with AI agents aren’t necessarily the ones writing clever prompts. They’re the ones who:
- Build system prompts that encode their team’s standards and patterns
- Create workflows that generate reusable, high-quality context artifacts that compound over time
- Choose MCP tools strategically based on task requirements
- Know when to add context and when to remove it for clarity
- Compress complex information into forms agents can efficiently use
These are all judgment calls, and they require craft to get right.
AI made action cheap. Context engineering is how you apply the judgment that determines whether that action produces something worth shipping.
Start Here
Run /context in your next Claude Code session. Look at what's loaded. For each piece of context, ask: "Is this helping or cluttering?"
This is the craft. The tooling around AI coding is still early - you’ll write CLAUDE.md files and custom slash commands because the platforms haven’t caught up yet. That’ll improve. The skill of knowing what context matters won’t.
Claude Code is waiting for your command. The question is whether you’re ready for it.
— Ben
This is Part 1 of a series on context engineering and building with AI coding agents like Claude Code. In later parts, I will go into the specifics of individual pieces of the context engineering workflow (slash commands, specs/plans, parallel agents, etc.).
Appendix
These are a few articles I enjoy about context engineering. Please read them!
- https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents
- https://www.philschmid.de/context-engineering
- https://github.com/humanlayer/advanced-context-engineering-for-coding-agents/blob/main/ace-fca.md