Claude confidently pointed me to a code path in MongoDB’s massive infrastructure codebase. I spent an hour investigating it before realizing something was off. The path worked, technically. But there were better approaches - ones the agent had completely missed.
I was implementing a new communication mode between our control plane and data plane. Enterprise scale, hundreds of thousands of lines of code, the kind of codebase where AI agents supposedly fall apart. I’d asked Claude a simple question: “How can I send this config to the data plane?”
It pointed me to a break-glass path that was meant for emergency scenarios. Technically correct, but not structurally sound, and missed a change that required 4 lines to do the same thing in a much more elegant way.
The model had everything it needed except the one thing that mattered: context to help it determine which approach was actually best.
The Mental Model
PROBABILITY DISTRIBUTION OF OUTCOMES

Without context engineering:
│    ╱╲
│   ╱  ╲
│  ╱    ╲___
└──────────────>
(wide spread, low mean quality)

With context engineering:
│         ╱╲
│         ││
│        ╱  ╲
└──────────────>
(narrow spread, high mean quality)
All AI output is probabilistic. You’re sampling from a distribution of possible responses - some good, some... not quite.
The goal of context engineering is to shift that distribution. Without good context, you get a wide spread, with lower mean quality. With good context, the curve tightens and moves to the right - narrow spread, high mean quality.
There’s no way to guarantee a specific output. Instead, your objective is to increase the probability of a good-enough output.
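To make that concrete, here's a toy sketch - the numbers are invented, nothing is measured - of what shifting the distribution means when you sample many outputs:

```javascript
// Toy illustration only: treat each agent run as a random draw of "quality".
// Means, spreads, and the threshold are made up to show the shape of the idea.
function sampleQuality(mean, spread) {
  // Crude normal-ish sample: sum of 12 uniform draws, centered (Irwin-Hall trick)
  let sum = 0;
  for (let i = 0; i < 12; i++) sum += Math.random();
  return mean + (sum - 6) * spread;
}

function goodEnoughRate(label, mean, spread, threshold = 0.7, runs = 1000) {
  let good = 0;
  for (let i = 0; i < runs; i++) {
    if (sampleQuality(mean, spread) >= threshold) good++;
  }
  console.log(`${label}: ~${Math.round((good / runs) * 100)}% of outputs clear the bar`);
}

goodEnoughRate("Without context engineering", 0.55, 0.20); // wide spread, low mean
goodEnoughRate("With context engineering", 0.85, 0.05);    // narrow spread, high mean
```

Nothing about either run is guaranteed; the only thing that changes is how often the draw lands somewhere acceptable.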
Context Limited
Most developers using AI coding agents (Claude Code, Cursor, Copilot, etc.) treat them like glorified junior engineers, or a better tab autocomplete.
Vague task, simple prompt, hit enter, pray.
What's so powerful about these agents is that they are Swiss Army knives full of tools: grep, file reads, bash commands, web search, examining git history. Additionally, advances in model intelligence have made it possible for them to run for tens of minutes at a time without stopping.
The model capability is already there. The “intelligence” is already there.
What’s missing is the information architecture - the engineering rigor of writing code with these tools. Or, as Simon Willison calls it, Vibe Engineering.
Most agent failures are not model failures anymore, they are context failures
https://www.philschmid.de/context-engineering
In the post-AI world, the bottleneck has shifted: writing code is easy now, but writing good code is still hard. To write good code with agents, the #1 question vibe engineers should be asking is "how well can I curate the information environment that the agent operates in?"
The quality of the tokens output by the agent is almost entirely dependent on the quality of what goes in. Or, garbage in, garbage out.
Most people still view AI as a black box - type something in, get something out. But you have far more control over the output than you realize.
Context engineering, as Shopify CEO Tobi Lutke puts it, is
the art of providing all the context for the task to be plausibly solvable by the LLM.
In other words, the craft of deliberately architecting what and when information enters the AI’s context window, and what stays out, to maximize the probability of getting the output you actually want.
What’s Actually in the Context Window
Source: philschmid.de - Context Engineering
Every interaction uses multiple layers: system prompts, project config, tools, artifacts, and your prompt. The amateur optimizes the last layer. The professional architects all of them. (We’ll dive deep into specific techniques to optimize these layers in later parts.)
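As a rough sketch of those layers - the names mirror the list above, and the token counts are invented for illustration, not measurements from any real session:

```javascript
// Hypothetical breakdown of a single agent turn's context window.
// Layer names follow the list above; token counts are illustrative guesses.
const contextWindow = {
  systemPrompt:  { tokens: 3000,  note: "the harness's own instructions" },
  projectConfig: { tokens: 2000,  note: "CLAUDE.md, team standards, style rules" },
  tools:         { tokens: 12000, note: "tool and MCP schemas, loaded whether or not you use them" },
  artifacts:     { tokens: 8000,  note: "specs, plans, files read into the session" },
  conversation:  { tokens: 20000, note: "everything said and read so far" },
  yourPrompt:    { tokens: 200,   note: "the only layer most people ever think about" },
};

const total = Object.values(contextWindow).reduce((sum, layer) => sum + layer.tokens, 0);
console.log(`~${total} tokens are in play before the model writes a single line of code`);
```

Most of that budget is spent before you type a word, which is exactly why optimizing only the prompt misses the point.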
Next time you give Claude Code a task, run /context first. See what’s actually in the context window, and think about what you need, and what you don’t.
When Context Matters
Not every task needs sophisticated context engineering.
Simple tasks work with simple context - an error message, the relevant file, and the ability to run tests are enough to fix a compile error or a basic unit test. It'll figure it out. (It's pretty dang smart...)
Implementing a feature as part of a larger project? Now context engineering requires deliberate architecture: thought about what to include, and when.
The divide between these two approaches is fuzzy - but as I became more familiar with the practice of context engineering, I began to intuit when to use the low-touch versus the high-touch approach.
An example from KOUCAI 口才 (AI penpals for practicing Chinese):
The Task: “The marketing website (koucai.chat) is making unnecessary API calls to the backend, slowing down page load. Please fix this.”
Naive Approach (What Could Have Happened):
Jump straight to implementation:
- Add a quick if (hostname === 'koucai.chat') return; check
- Gate the API call
- Ship it
This might work. But you’d miss:
- An identical pattern already exists in pages/index.jsx
- A previous commit (cf3f229) partially implemented this exact fix but incompletely
- The race condition between domain detection and initialization
- The loading state edge case that causes infinite spinners
- Test coverage requirements and existing mocking patterns
Result: Ship a “working” fix that breaks in production with infinite loading spinners, requires a hotfix, and doesn’t follow established patterns.
Context-Engineered Approach (What Actually Happened):
Phase 1: Systematic Intelligence Gathering
Instead of coding immediately, Claude dispatched parallel research agents:
- Search task history
- Discovery: pattern for CONDITIONAL_WRAPPER_RENDERING exists
- Discovery: Commit cf3f229 partially implemented this pattern
- Discovery: Pattern was applied to rendering but NOT initialization (the actual bug)
- Codebase Analysis Agent: Map existing domain detection implementations
- Discovery: Pattern exists in 3 files with proven loading state guards
- Discovery: pages/index.jsx has the complete reference implementation
- Git Archaeology Agent: Trace evolution of guest session logic
- Discovery: Recent commit added domain detection but left race condition
- Discovery: Similar race condition fixes exist in commit 744d471
- Risk Analysis Agent: Predict failure modes
- Prediction: tests may fail due to missing window.location mocks
- Result: All 15 tests passed without modification (good prediction!)
Phase 2: Mandatory Architecture Artifacts
Before writing ANY code, Claude created:
- Chain of Thought Analysis: Why does the current implementation exist? What’s the git history? What breaks easily?
- Tree of Thought: Generated 3 fundamentally different solutions:
- Solution A: Sequential useEffect with state flag (proven pattern)
- Solution B: Synchronous detection in useState initializer (SSR risk)
- Solution C: Lift to parent component (over-engineering)
- Winner: Solution A (minimal change, proven pattern, 30min estimate)
- YAGNI (you aren’t gonna need it) Declaration: Explicitly excluded:
- Centralized useIsMarketingDomain hook (future refactor)
- Performance optimization via memoization (unnecessary)
- Loading skeleton UI (overkill for instant detection)
Phase 3: Pattern-Guided Implementation
// Not guessing - copying the proven pattern from pages/index.jsx lines 48-67
const [isDomainDetected, setIsDomainDetected] = useState(false);

useEffect(() => {
  // Detection logic
  setIsDomainDetected(true); // Signal completion
}, []);

useEffect(() => {
  if (!isDomainDetected) return; // Wait for detection
  if (isMarketingDomain) return; // Skip marketing
  // Safe to initialize
}, [isDomainDetected, isMarketingDomain]);
Phase 4: The Context Engineering Payoff
First implementation had a bug (infinite loading spinner). But because Claude had:
- The complete pattern context from pages/index.jsx
- Understanding of the loading state architecture
- Knowledge that isInitializing starts as true
Claude immediately recognized the anti-pattern:
if (isMarketingDomain) return; // ❌ Forgot to clear isInitializing!

// Fixed!
if (isMarketingDomain) {
  setIsInitializing(false); // ✅ Clear blocking state
  return;
}
The Difference
| Naive Approach | Context-Engineered Approach |
|---|---|
| 10 minutes to “working” code | 35 minutes to proven solution |
| Ships with infinite spinner bug | Catches bug immediately via pattern knowledge |
| Doesn’t follow existing patterns | Reuses CONDITIONAL_WRAPPER_RENDERING |
| No learning from past mistakes | Discovered cf3f229 was incomplete |
| Reinvents the wheel | Copies proven pattern from pages/index.jsx |
| Unknown unknowns remain unknown | Parallel agents surface hidden complexity |
Takeaway: Context engineering isn’t about feeding more tokens to the model. It’s about systematically assembling the right context before attempting implementation.
The 25 minutes spent on intelligence gathering prevented:
- Reinventing a solution that already existed
- Missing the edge case that causes infinite loading
- Breaking from established codebase patterns
- Incomplete fixes like the previous commit
The same model, given prompts of similar length, produced completely different outcomes. The difference is that I gave the agent the information it needed to make informed decisions. I architected the context.
The workflow forced Claude to ask: “What similar work exists? What patterns apply? What failed before?” before writing a single line of code. That’s context engineering.
Be intentional about every token of context you give to the model.
When Context Breaks Down
Claude starts the session perfectly. It reads the spec, understands the requirements, makes a sensible plan.
100,000 tokens later, things start to break down. It implements features that aren't in the spec. It stops writing tests, or fakes results to get tests to pass. It explicitly ignores instructions.
This is context rot.
It’s easy to assume more context is always better. It’s not.
As its context length increases, a model’s ability to capture these pairwise relationships gets stretched thin, creating a natural tension between context size and attention focus. Additionally, models develop their attention patterns from training data distributions where shorter sequences are typically more common than longer ones. This means models have less experience with, and fewer specialized parameters for, context-wide dependencies.
https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents
Load your entire codebase, all specs, complete git history, and every document into the context window, and the model’s performance degrades. It can’t sustain the enormous amount of context. The signal-to-noise ratio collapses. The AI starts making spurious connections between unrelated information. It gets distracted. It hedges. Its outputs become vague and uncertain.
That handy 10,000-token MCP server for JIRA? Pretty valuable when doing research, right? (Claude is certainly better than me at JQL...) But it's pure context rot when you're debugging a UI component. Those tokens aren't just wasted - they're actively degrading the model's performance.
Context engineering is just as much about removal as addition.
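In practice, removal means deciding per task which tools earn their token cost. A toy sketch of that judgment call (tool names and token counts are made up):

```javascript
// Toy model of the trade-off: only load a tool's schema when the task can use it.
// Tool names and token counts are invented for illustration.
const toolCatalog = [
  { name: "core tools (grep, read, bash)", tokens: 1500,  usefulFor: ["*"] },
  { name: "JIRA MCP",                      tokens: 10000, usefulFor: ["research", "ticket-triage"] },
  { name: "browser MCP",                   tokens: 6000,  usefulFor: ["ui-debug"] },
];

function toolsFor(taskKind) {
  return toolCatalog.filter(
    (tool) => tool.usefulFor.includes("*") || tool.usefulFor.includes(taskKind)
  );
}

console.log(toolsFor("ui-debug"));  // core tools + browser MCP; the JIRA schemas stay out
console.log(toolsFor("research"));  // here the JIRA MCP earns its 10,000 tokens
```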
The Taste of Curating Context
How do you actually practice context engineering? Here’s my journey:
When I first built KOUCAI 口才, I interacted with Claude in fairly basic ways (implement this component, fix this bug, etc.). Claude would sometimes get it right, sometimes not. I was playing roulette.
As I learned more about AI coding, I developed a more sophisticated workflow with five phases: gather intelligence, plan, execute, validate, review. The intelligence and plan phases use context engineering to pull in relevant information so that the execution phase can make the correct changes.
This is how I approach context engineering. You'll develop your own workflow. The specific details and prompts don't matter for now. What matters is knowing that every context decision is a judgment call by you, the human.
Which patterns matter enough to encode in your system prompt? Which tools are worth the token cost for this task? What historical context helps the agent make informed choices? When does additional context add signal versus noise?
In The New Math of Building with AI, I argued that when action becomes cheap, taste becomes king. Context engineering is how you operationalize that taste. It’s how you apply judgment when working with AI.
Context curation is taste in action.
What This Means for You
Model capabilities are accelerating. Agents are getting smarter and can work for longer stretches. Whatever comes next (Gemini 3, Opus 4.5, etc.) will be smarter, faster, and more capable than anything that came before.
But intelligence without information is squandered.
Engineers who succeed with AI agents aren’t necessarily the ones writing clever prompts. They’re the ones who:
- Build system prompts that encode their team’s standards and patterns
- Create workflows that generate reusable, high-quality context artifacts that compound over time
- Choose MCP tools strategically based on task requirements
- Know when to add context and when to remove it for clarity
- Compress complex information into forms agents can efficiently use
These are all judgment calls, and they require craft to get right.
AI made action cheap. Context engineering is how you apply the judgment that determines whether that action produces something worth shipping.
Start Here
Run /context in your next Claude Code session. Look at what's loaded. For each piece of context, ask: "Is this helping or cluttering?"
This is the craft. The tooling around AI coding is still early - you’ll write CLAUDE.md files and custom slash commands because the platforms haven’t caught up yet. That’ll improve. The skill of knowing what context matters won’t.
Claude Code is waiting for your command. The question is whether you’re ready for it.
— Ben
This is Part 1 of a series on context engineering and building with AI coding agents like Claude Code. In later parts, I will go into the specifics of individual pieces of the context engineering workflow (slash commands, specs/plans, parallel agents, etc.).
Appendix
These are a few articles I enjoy about context engineering. Please read them!
- https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents
- https://www.philschmid.de/context-engineering
- https://github.com/humanlayer/advanced-context-engineering-for-coding-agents/blob/main/ace-fca.md