Recently I had an idea: what would it be like to interact with my YNAB budget via Claude Code using natural language? I wanted to be able to ask questions like “How much did I spend on groceries last month?” or “What categories am I overspending in?” and get accurate answers without digging through the app.
I found some existing YNAB MCPs, but most were inactive with limited features. This seemed like a good opportunity to learn MCP design from scratch. What followed was a deep dive into context efficiency that changed how I think about building AI tools.
Understanding Model Context Protocols (MCPs)
The Model Context Protocol (MCP) is a standardized way to extend language models with external capabilities. Unlike traditional APIs where you write code to call endpoints, MCP servers allow AI models to discover and use tools autonomously. The protocol defines how models can:
- Call tools - Execute functions that interact with external systems
- Read resources - Access files, databases, or other data sources
- Receive prompts - Get specialized instructions for specific tasks
When you build an MCP server, you’re essentially creating a set of capabilities that any MCP-compatible AI assistant (like Claude) can use. The model sees your tool descriptions, understands what they do, and calls them as needed to fulfill user requests.
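To make this concrete, here's a minimal sketch of what registering a tool looks like with the official MCP Python SDK's FastMCP helper. The tool name and body here are illustrative, not the actual YNAB implementation:

```python
# Minimal MCP server sketch using the official Python SDK (illustrative only).
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("ynab")

@mcp.tool()
async def list_budgets() -> str:
    """Return the budgets available to the connected YNAB account."""
    # A real server would call the YNAB API here and return a JSON string.
    return "[]"

if __name__ == "__main__":
    mcp.run()  # serves over stdio so an MCP client like Claude Code can discover the tool
```

The docstring and type hints become the tool description the model sees—which matters later, because those descriptions consume context too.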
Understanding Context Windows
Before we dive into how to make our MCPs more efficient, it’s important to understand what we’re trying to optimize. When working with LLMs, the context window is the amount of content that the model can “pay attention” to at one time. Each model has a limit to the size of its context window. For example, Claude Sonnet 4.5’s context window is about 200,000 tokens.
What is a Token?
When you send text to an LLM, it doesn’t process words one at a time. Instead, text is broken into tokens—the fundamental units that language models read and generate. A token typically represents 3-4 characters, or roughly 0.75 words in English.
For API responses and JSON data (which is what MCPs work with), tokenization looks like this:
{"name": "Checking"} // ~7 tokens
"transfer_payee_id" // ~5 tokens
{"balance": 125000} // ~6 tokens
The tokenizer breaks JSON into chunks: brackets, keys, values, and punctuation all consume tokens. Every field name has a cost. Field names like "debt_escrow_amounts" and "direct_import_in_error" cost ~4-6 tokens each. When you return an API response with 18 fields per object, you’re paying the token cost for every field name, every time—even when the model doesn’t need them.
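You can sanity-check these per-field estimates yourself with a tokenizer library such as tiktoken (covered in more detail in the measurement section below). The exact counts differ slightly from Claude's own tokenizer, but they're close enough to see the cost of verbose field names:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
for snippet in ['{"name": "Checking"}', '"transfer_payee_id"', '"debt_escrow_amounts"']:
    # Prints the approximate token count for each JSON fragment
    print(snippet, "->", len(enc.encode(snippet)), "tokens")
```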
Why Token Efficiency Matters
Given the limited size of the context window, it’s critical to consider how much you’re placing inside it. Models work best when they have good context, but there’s a balance to strike:
Context limits are hard boundaries: Claude Sonnet 4.5’s 200k token limit sounds generous until you realize a naive MCP returning a year of transactions can consume 746,800 tokens—nearly 4x the entire context window. Your tool call would fail before the model could even process it.
Real sessions are already crowded: In typical Claude Code sessions, MCP tool definitions, system prompts, and memory can consume 50-60% of the context window before you’ve had a single conversation. Every inefficient tool response eats into precious space needed for reasoning and multi-turn conversations.
Noise degrades performance: Providing superfluous data doesn’t just waste tokens—it forces the model to parse irrelevant fields, increasing the chance of errors or confusion. A focused 262-token summary outperforms a noisy 4,890-token dump of raw data.
The goal isn’t to minimize tokens at all costs. It’s to give the model exactly what it needs, nothing more, nothing less. Let’s look at how this plays out in practice.
The Naive Approach: Direct API Wrapping
When I started building the YNAB MCP, I didn’t yet appreciate how much token efficiency would matter. I started with what seemed obvious: create a thin wrapper around the existing YNAB Python SDK. Each MCP tool would correspond to one API endpoint, passing through the full response. This is a common pattern I’ve seen in many MCP implementations.
```python
# Naive approach: Just wrap the SDK
@mcp.tool()
async def get_budget(budget_id: str) -> str:
    response = ynab_client.budgets.get_budget(budget_id)
    return json.dumps(response.data.budget)  # Return everything
```
This approach works, technically. The model gets access to the data. But there’s a critical problem: API responses are designed for applications, not for AI context windows.
Traditional applications can process, filter, and cache data efficiently, so APIs return comprehensive data structures optimized for completeness, not token efficiency. A single endpoint might return thousands of fields because the API designers don’t know which specific fields your application needs.
AI models work differently. Every byte consumes precious context window space—space you could use for reasoning, conversation history, or additional tool calls. When you blindly pass through full API responses, you’re asking the model to pay the “context tax” for data it might not even need. Worse, the model has to analyze and determine which parts of that data are actually relevant—a cognitive load that can lead to errors or missed information.
To put this in perspective: tool results compete with everything else for context space. In a real Claude Code session (visible via /context), I saw the context window at 118k/200k tokens (59%)—before I’d even started a conversation. MCP tool definitions alone consumed 47.9k tokens (24%), system tools used 17.3k tokens (9%), custom agents took 2.4k tokens, and memory files added another 2.3k tokens. That’s 59% of the context window used just by the environment.
A naive MCP that returns 30k tokens for a budget overview would push that to 74% in a single tool call—leaving just 52k tokens for the actual conversation, reasoning, and additional tool calls. Every inefficient tool response eats into the space you need for multi-turn conversations.
This changed how I approached the design: MCPs need to be context-aware intermediaries, not transparent proxies. The question shifted from “How do I expose this API to the model?” to “What does the model actually need to help the user?”
Three Design Principles for Context-Efficient MCPs
One of the first things I wanted to do was check my budget overview - see my accounts, categories, and how I’m tracking for the current month. A straightforward use case that any budgeting tool should support.
My initial thought was to create tools that directly wrapped the YNAB API endpoints. Let’s take the accounts endpoint as an example. Here’s what the API returns:
```json
{
  "data": {
    "accounts": [
      {
        "id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
        "name": "Checking Account",
        "type": "checking",
        "on_budget": true,
        "closed": false,
        "note": "Primary checking",
        "balance": 125000,
        "cleared_balance": 120000,
        "uncleared_balance": 5000,
        "transfer_payee_id": "...",
        "direct_import_linked": true,
        "direct_import_in_error": false,
        "last_reconciled_at": "2025-11-05T18:27:20.140Z",
        "debt_original_balance": 0,
        "debt_interest_rates": {},
        "debt_minimum_payments": {},
        "debt_escrow_amounts": {},
        "deleted": false
      }
      // ... 46 more accounts
    ]
  }
}
```
For my 47 accounts, this API response contains 18 fields per account. Many of these fields are irrelevant for typical budget questions:
- Debt interest rates and minimum payments (only relevant for debt accounts)
- Direct import status (internal system state)
- Cleared vs uncleared balance breakdown (too granular for overview)
- Transfer payee IDs (internal references)
- Last reconciliation date (accounting detail)
When you’re answering budget questions, these internal bookkeeping details just create noise.
A naive wrapper would return all 18 fields × 47 accounts = 9,960 tokens.
But here’s what I actually need to answer questions like “What’s my checking account balance?” or “How much do I have across all accounts?”:
- Account ID (so follow-up tool calls can reference the account)
- Account name
- Account type
- Balance
- Whether it's on-budget
- Whether it's closed
That's it. Just 6 fields. Here's the filtered implementation for my 47 accounts:
```python
# src/ynab_mcp/ynab_client.py
async def get_accounts(self, budget_id: str) -> list[dict[str, Any]]:
    """Get all accounts for a budget."""
    response = self.client.accounts.get_accounts(budget_id)
    accounts = []
    for account in response.data.accounts:
        # Skip deleted accounts entirely
        if account.deleted:
            continue
        # Return only the fields the model actually needs
        accounts.append({
            "id": account.id,
            "name": account.name,
            "type": account.type,
            "on_budget": account.on_budget,
            "closed": account.closed,
            "balance": account.balance / 1000 if account.balance else 0,
        })
    return accounts
```
Filtered approach (6 essential fields): 3,451 tokens for 47 accounts
Reduction: 65.4% by removing 12 unnecessary fields that the model doesn’t need for typical budget questions.
The Full Budget Overview
Of course, checking accounts is just one part of viewing your budget. A complete budget overview requires three tool calls:
Naive approach (no filtering):
- Accounts (all fields): 9,960 tokens
- Categories (all, including hidden): 12,445 tokens
- Monthly summary (estimated): ~8,000 tokens
- Total: ~30,405 tokens
Context-efficient approach:
- Accounts (filtered): 3,451 tokens
- Categories (visible only): 8,620 tokens
- Monthly budget summary: 6,808 tokens
- Total: ~18,879 tokens
Workflow reduction: 38% fewer tokens for the same functionality - a complete picture of my budget for the current month. This leaves plenty of room in Claude’s context window for conversation history, reasoning, and additional tool calls.
But it’s not just about saving tokens. By filtering out unnecessary data, I’m also improving model accuracy. When Claude doesn’t see fields like debt_escrow_amounts or direct_import_in_error, it can’t get confused by them or incorrectly incorporate them into calculations. The model focuses on exactly what matters: account names, balances, and budget status.
The key insight: the model doesn’t need to see all the data to work with it effectively. In fact, it works better with less data. By doing the filtering in the tool layer, I kept the context window lean while maintaining full functionality and improving reliability.
The filtering techniques I’d learned from accounts and categories immediately paid off when I tackled the next challenge: helping Claude categorize uncategorized transactions.
Categorizing Transactions
One of the workflows I wanted help with was taking uncategorized transactions and suggesting categories for them. This would help me ensure that my budget was accurate and up-to-date. To do this I created a tool that would fetch all uncategorized transactions from YNAB and get a list of the categories available in the budget.
On the first pass I noticed something unusual. Claude was recommending categories that were hidden in my budget - old categories I no longer used but hadn’t deleted. I quickly realized that the YNAB REST API doesn’t provide a way to exclude hidden categories in the API call itself. That meant I had two options:
- Include instructions in my tool description telling Claude to ignore hidden categories
- Filter them out in the tool code before returning data to Claude
I chose option 2. Here’s why: every instruction you add to a tool description consumes context window. More importantly, it puts the burden of filtering on the model. This means we’re knowingly giving the model more data than it needs, forcing it to do more work and potentially allowing the model to make mistakes.
It’s worth emphasizing: tool descriptions themselves consume context tokens. In my Claude Code session, MCP tool definitions consumed 47.9k tokens (24% of the context window) before any tools were even called. Every line of documentation, every parameter description, every usage instruction adds up. This creates a tension: you want clear, helpful descriptions, but verbose documentation eats into the space available for actual tool results and conversation.
The solution isn’t to write minimal descriptions—clarity matters. Instead, keep descriptions focused on what the tool does and its parameters, and handle behavior rules (like “ignore hidden items”) in your implementation code rather than in lengthy instructions. Instead, I implemented filtering at the tool layer:
```python
def _filter_categories(
    self, categories: list[dict[str, Any]], include_hidden: bool = False
) -> list[dict[str, Any]]:
    """Filter categories to exclude hidden/deleted ones by default."""
    filtered = []
    for category in categories:
        # Always skip deleted categories
        if category.get("deleted"):
            continue
        # Skip hidden categories unless explicitly included
        if not include_hidden and category.get("hidden"):
            continue
        filtered.append(category)
    return filtered
```
Then I exposed this as a parameter in the tool:
```python
@mcp.tool()
async def get_categories(budget_id: str, include_hidden: bool = False) -> str:
    """Get all categories for a budget.

    Args:
        budget_id: The ID of the budget (use 'last-used' for default budget)
        include_hidden: Include hidden categories and groups (default: False)

    Returns:
        JSON string with category groups and categories
    """
```
This approach gave me the best of both worlds: by default, Claude only sees active categories, but I can still access hidden categories when needed (like when I wanted to identify old balances that needed cleanup). This saved 30.7% of tokens per category list request (from 12,445 to 8,620 tokens by filtering out 69 hidden categories) while improving accuracy.
Design Principle #1: Filter at the Source
Do data filtering in your tool code rather than relying on prompt instructions. This saves tokens and prevents errors.
Historical Spending Analysis
Next, I wanted to be able to ask questions about my historical spending. Questions like “How much did I spend on groceries last month?” or “What categories am I overspending in?” would be really useful.
My first instinct was to create a tool that fetched all transactions for a date range and let Claude analyze them. But I quickly realized this approach had serious problems.
To illustrate, let me show you the real numbers for my 2024 transactions:
- Total transactions in 2024: 3,456
- Fields per transaction: 14 (id, date, amount, memo, account_name, payee_name, category_name, cleared, approved, etc.)
- Average per transaction: ~216 tokens
Now extrapolate this to common queries:
- 1 month of transactions (~284 txns): ~61,368 tokens
- 3 months of transactions (~852 txns): ~184,106 tokens
- 6 months of transactions (~1,704 txns): ~368,213 tokens
- 1 year of transactions (3,456 txns): ~746,800 tokens
The problems with this approach:
- Token usage: A full year query would consume 746,800 tokens - 3.7x the size of Claude Sonnet 4.5's entire 200k context window! You literally couldn't fit a year of transactions in a single request.
- Speed: Transferring and parsing thousands of transaction objects is slow
- Analysis burden: Claude would need to group, sum, and calculate averages on raw data
- Wasted context: Most of those 14 fields per transaction aren’t relevant to “how much did I spend?”
Even a modest 3-month query would consume 184k tokens - using 92% of the available context window just for raw transaction data. This leaves almost no room for Claude to maintain conversation history, reason about the results, or make additional tool calls to answer follow-up questions.
Instead, I realized these calculations could easily be handled in the tool layer. Here’s what the aggregation logic looks like:
```python
async def get_category_spending_summary(
    self,
    budget_id: str,
    category_id: str,
    since_date: str,
    until_date: str,
    include_graph: bool = True,  # not used in this excerpt
) -> dict[str, Any]:
    """Get spending summary for a category over a date range."""
    # Fetch transactions from API (the url/params construction here is illustrative;
    # since_date is passed to the API, so only until_date needs filtering below)
    url = f"budgets/{budget_id}/transactions"
    params = {"since_date": since_date}
    result = await self._make_request_with_retry("get", url, params=params)
    txn_data = result["data"]["transactions"]

    # Aggregate in tool layer
    total_spent = 0
    transaction_count = 0
    monthly_totals = {}

    for txn in txn_data:
        # Filter by category and date range
        if txn.get("category_id") != category_id:
            continue
        if txn["date"] > until_date:
            continue

        # YNAB stores amounts in milliunits (e.g., $125.00 = 125000)
        amount = txn["amount"] / 1000 if txn.get("amount") else 0
        total_spent += amount
        transaction_count += 1

        # Build monthly breakdown
        month_key = txn["date"][:7]  # YYYY-MM
        if month_key not in monthly_totals:
            monthly_totals[month_key] = 0
        monthly_totals[month_key] += amount

    # Calculate average per month
    num_months = len(monthly_totals) if monthly_totals else 1
    average_per_month = total_spent / num_months if num_months > 0 else 0

    # Return only the summary
    return {
        "category_id": category_id,
        "date_range": {"start": since_date, "end": until_date},
        "total_spent": total_spent,
        "transaction_count": transaction_count,
        "average_per_month": average_per_month,
        "monthly_breakdown": [
            {"month": month, "spent": amount}
            for month, amount in sorted(monthly_totals.items())
        ],
    }
```
The impact was dramatic. Let me show you a real example from my implementation:
Scenario: Analyze 6 months of spending for a single category (22 transactions)
- Before (returning raw transactions): 4,890 tokens
- After (pre-aggregated summary): 262 tokens
- Reduction: 94.6%
The aggregated response includes:
- Total spent
- Average per month
- Transaction count
- Monthly breakdown (array of {month, spent} objects)
That’s everything Claude needs to answer questions like “Am I spending more on groceries this year than last year?” without having to receive, parse, and aggregate dozens of individual transaction records.
Design Principle #2: Pre-Aggregate Data
Pre-calculate aggregations, summaries, and statistics in your tool code. Return insights, not raw data. This keeps your context window lean while still giving the model everything it needs to help users.
While filtering and aggregation solved the token efficiency problem, I ran into a different challenge: the API itself had limitations.
Building Tools for Unsupported Actions
While working on the workflow to have Claude help me categorize transactions, I realized I needed a way to split a transaction across multiple categories. For example, a Costco purchase might include $150 of groceries, $50 of household items, and $30 of gas.
Unfortunately, the YNAB API does not provide a way to convert an existing transaction into a split transaction. The API only allows creating NEW transactions with splits. This was a real limitation - but it presented an opportunity to think creatively about tool design.
Instead of telling users “sorry, the API doesn’t support this,” I created a tool that works within the API’s constraints:
```python
@mcp.tool()
async def prepare_split_for_matching(
    budget_id: str,
    transaction_id: str,
    subtransactions: str,
) -> str:
    """Prepare a split transaction to match with an existing imported transaction.

    This tool fetches an existing transaction's details and creates a new UNAPPROVED split
    transaction with the same date, amount, account, and payee. You can then manually match
    them together in the YNAB web or mobile UI.

    Workflow:
    1. This tool fetches the existing transaction details
    2. Creates a new unapproved split transaction with those details
    3. You manually match them in the YNAB UI
    4. YNAB merges them into one split transaction

    Note:
    - The new split is created as UNAPPROVED for manual matching
    - The sum of subtransaction amounts should equal the original transaction amount
    """
```
The implementation:
```python
async def prepare_split_for_matching(
    self,
    budget_id: str,
    transaction_id: str,
    subtransactions: list[dict[str, Any]],
) -> dict[str, Any]:
    # Fetch the original transaction details
    original = await self.get_transaction(budget_id, transaction_id)

    # Create a new split transaction with the same details but unapproved
    new_split = await self.create_split_transaction(
        budget_id=budget_id,
        account_id=original["account_id"],
        date=original["date"],
        amount=original["amount"],
        subtransactions=subtransactions,
        payee_name=original.get("payee_name"),
        memo=original.get("memo"),
        cleared=original.get("cleared", "uncleared"),
        approved=False,  # Key: create as unapproved for matching
    )

    return {
        "original_transaction": original,
        "new_split_transaction": new_split,
        "instructions": (
            "A new unapproved split transaction has been created. "
            "Go to YNAB and manually match these two transactions together. "
            "Look for the match indicator in the YNAB UI."
        ),
    }
```
This solution works because YNAB has a built-in “matching” feature where it can merge a manually-entered transaction with an imported one. By creating the split as unapproved, YNAB’s UI will detect the duplicate and offer to match them. When you accept the match, the imported transaction becomes a proper split transaction.
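For the Costco example from earlier, the call would look something like this. The subtransactions parameter is a JSON string, amounts are in milliunits, YNAB records outflows as negative values, and the IDs here are placeholders:

```python
import json

# Hypothetical invocation for the $230 Costco split described above.
# The three amounts must sum to the original transaction's amount.
subtransactions = json.dumps([
    {"amount": -150_000, "category_id": "<groceries-category-id>"},   # $150 groceries
    {"amount": -50_000, "category_id": "<household-category-id>"},    # $50 household items
    {"amount": -30_000, "category_id": "<gas-category-id>"},          # $30 gas
])

# prepare_split_for_matching(budget_id="last-used",
#                            transaction_id="<imported-txn-id>",
#                            subtransactions=subtransactions)
```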
Is this ideal? No - I’d prefer a direct API endpoint. But it’s a pragmatic solution that works within the constraints of the underlying platform while still providing value to the user.
Design Principle #3: Work Within API Constraints
When an API doesn’t support something directly, look for workflows that combine available operations to achieve the desired outcome.
Real-World Token Reduction Results
Here’s a summary of the optimizations applied across the YNAB MCP, with real measured token counts:
| Tool/Workflow | Naive Approach | Optimized | Reduction | Technique Applied |
|---|---|---|---|---|
| Accounts | 9,960 tokens (18 fields) | 3,451 tokens (6 fields) | 65.4% | Field filtering |
| Categories | 12,445 tokens (all) | 8,620 tokens (visible) | 30.7% | Default filtering + opt-in |
| Budget Overview | ~30,405 tokens | ~18,879 tokens | 38% | Combined filtering |
| Category Spending (6mo) | 4,890 tokens (raw txns) | 262 tokens (summary) | 94.6% | Pre-aggregation |
| Year of Transactions | 746,800 tokens | 262 tokens | 99.96% | Pre-aggregation |
When to Apply Each Technique
Field Filtering (accounts, categories)
- Use when: API returns many fields, but only subset is needed for common queries
- Savings: Moderate (30-65%)
- Complexity: Low - simple field selection
- Example: Remove debt details from non-debt accounts, skip internal IDs like transfer_payee_id, drop reconciliation timestamps
```python
# Instead of returning all 18 fields, return only what matters
return {
    "id": account.id,
    "name": account.name,
    "balance": account.balance / 1000,
    "on_budget": account.on_budget,
}
```
Default Filtering with Parameters (hidden categories)
- Use when: Some data is rarely needed but occasionally useful
- Savings: Moderate (30-40%)
- Complexity: Low - add optional boolean parameter
- Example: Hide deleted/archived items by default, expose via an include_deleted flag
Pre-aggregation (spending analysis)
- Use when: Model would need to compute summaries from raw data
- Savings: High (90-99%)
- Complexity: Medium - requires aggregation logic
- Example: Return monthly totals instead of individual transactions
Creative Workarounds (split transactions)
- Use when: API doesn’t support desired operation directly
- Savings: Enables new functionality (not about tokens)
- Complexity: High - requires understanding API constraints
- Example: Multi-step workflows that achieve goals indirectly
How to Measure Your Own MCP
You might be wondering how I got these specific numbers. All of them come from real measurements. Here's the methodology I used to validate the optimizations, and how you can apply it to your own MCP:
1. Set Up Token Counting
Install tiktoken, OpenAI's tokenizer library. Claude uses its own tokenizer, but tiktoken's counts are close enough for comparing before/after response sizes:
```bash
pip install tiktoken
```
Create a helper function to count tokens:
```python
import tiktoken

def count_tokens(text: str) -> int:
    """Count tokens using tiktoken's cl100k_base encoding."""
    encoding = tiktoken.get_encoding("cl100k_base")
    return len(encoding.encode(text))
```
2. Measure API Responses
Create a script that fetches data both ways and compares:
```python
import json

# Get raw API response
raw_response = api.get_accounts(budget_id)
raw_json = json.dumps(raw_response, indent=2)
raw_tokens = count_tokens(raw_json)

# Get your filtered response
filtered_response = your_mcp_tool.get_accounts(budget_id)
filtered_json = json.dumps(filtered_response, indent=2)
filtered_tokens = count_tokens(filtered_json)

# Compare
reduction = ((raw_tokens - filtered_tokens) / raw_tokens) * 100
print(f"Raw: {raw_tokens:,} tokens")
print(f"Filtered: {filtered_tokens:,} tokens")
print(f"Reduction: {reduction:.1f}%")
```
3. Test Real Workflows
Don’t just measure individual tools - measure complete workflows users will perform:
```python
# Simulate a budget overview workflow
accounts = your_mcp.get_accounts(budget_id)
categories = your_mcp.get_categories(budget_id, include_hidden=False)
summary = your_mcp.get_budget_summary(budget_id, current_month)

total_tokens = (
    count_tokens(json.dumps(accounts)) +
    count_tokens(json.dumps(categories)) +
    count_tokens(json.dumps(summary))
)
print(f"Budget overview workflow: {total_tokens:,} tokens")
```
This revealed that my budget overview workflow uses ~19k tokens - well within Claude’s 200k context window with room to spare.
4. Watch for Runtime Warnings
Claude Code will warn you when tool responses exceed ~10k tokens. If you see these warnings frequently, it’s a signal to investigate:
```
⚠️ Large MCP response (~12.5k tokens), this can fill up context quickly.
```
These warnings helped me identify which tools needed optimization.
5. Validate Correctness
Token reduction means nothing if your tools return incorrect data. Always verify:
- Does the filtered data answer the user’s questions?
- Are calculations accurate? (spot-check aggregations against raw data; see the sketch after this list)
- Does pagination work correctly? (ensure you’re not computing on partial datasets)
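Here's one way to do that spot-check, assuming your client exposes both the aggregated tool and a raw transactions call (get_transactions here stands in for whatever raw fetch your MCP uses):

```python
async def spot_check_category_total(client, budget_id: str, category_id: str) -> None:
    """Verify the pre-aggregated total against a manual sum of the raw transactions."""
    summary = await client.get_category_spending_summary(
        budget_id, category_id, since_date="2025-01-01", until_date="2025-06-30"
    )
    raw = await client.get_transactions(budget_id, since_date="2025-01-01")  # hypothetical raw fetch
    manual_total = sum(
        txn["amount"] / 1000
        for txn in raw
        if txn.get("category_id") == category_id and txn["date"] <= "2025-06-30"
    )
    # Allow for floating-point rounding when comparing
    assert abs(summary["total_spent"] - manual_total) < 0.01, "aggregation drifted from raw data"
```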
The goal isn’t to minimize tokens at all costs - it’s to return exactly what the model needs, nothing more, nothing less.
Limitations and Trade-offs
This context-efficient approach works well for YNAB, but it’s not without limitations. Before applying these patterns to your own MCP, consider these trade-offs:
Pre-aggregation assumes query patterns. If users ask questions that need raw transaction details (like “show me the memo for my largest grocery purchase”), the aggregated data won’t help. You’ll need additional tools that return raw data for those cases.
Filtering loses flexibility. By removing fields, you can’t answer questions that need those fields without making additional API calls. The key is knowing your use cases. For budget analysis and categorization, these trade-offs are worth it. For transaction-level forensics, you might need different tools.
Caching complexity. Pre-computed aggregations need invalidation strategies when data changes. If your underlying data updates frequently, you’ll need to think carefully about cache freshness and when to recompute.
Development overhead. Writing aggregation logic and filtering code is more work than simple pass-through wrappers. You’re trading implementation time for runtime efficiency. For frequently-used tools, this is usually worth it.
The goal isn’t to optimize every tool to the extreme. It’s to identify the high-impact workflows—the ones users will perform repeatedly—and optimize those intelligently.
Key Takeaways
- Context is expensive: In real Claude Code sessions, MCP tool definitions can consume 24% of the 200k context window before any tools are called
- Measure everything: Use tiktoken to count tokens on API responses before and after optimization
- Filter proactively: Removing 12 unnecessary fields from 47 accounts reduced tokens by 65.4%
- Aggregate strategically: Pre-computing spending summaries reduced a 6-month query by 94.6%
- Design for the model: Ask “What does the model need?” not “What does the API provide?”
- Default to minimal data: Return only what’s necessary by default, with optional parameters for edge cases
- Validate with real workflows: Test complete user flows, not just individual tools, to understand cumulative token impact
Conclusion
Building a context-efficient MCP requires a mindset shift: design for what the model needs, not just what the API provides. The three principles I learned—filter at the source, pre-aggregate in the tool layer, and work creatively within API constraints—apply to any MCP wrapping an external API.
Through careful design, the YNAB MCP achieves dramatic efficiency:
- Budget overview: 38% reduction (30k → 19k tokens)
- Spending analysis: 94.6% reduction (4.9k → 262 tokens)
- Category filtering: 30.7% reduction (12.4k → 8.6k tokens)
These aren’t theoretical—they’re real measurements from actual usage. Context is expensive, and every token matters when you’re building tools for multi-turn conversations.
If you’re building an MCP, start by asking: “What does the model actually need to answer the user’s question?” Not “What does the API return?” That mindset shift makes all the difference.
Further Reading
If you want to dive deeper into MCP development and context optimization:
- Model Context Protocol Specification - Official MCP spec and documentation
- Code Execution with MCP - Anthropic’s engineering blog on building with MCP
- Most devs don’t understand how context windows work - Deep dive into context window fundamentals and practical management strategies
- YNAB API Documentation - The API this MCP wraps
- Anthropic’s Prompt Engineering Guide - Understanding context windows and token efficiency
The full YNAB MCP implementation is available at github.com/dgalarza/ynab-mcp-dgalarza if you want to dive deeper into the code. I’d love to hear about the context optimization techniques you’ve discovered in your own MCP projects.