Understanding the New Paradigm in AI Infrastructure
As enterprises race to integrate AI agents into their operations, a critical question keeps surfacing: Can we simply reuse our existing API gateway infrastructure for the Model Context Protocol (MCP)? The short answer might be “maybe,” but the real question is whether you should. Let me explain why this seemingly simple infrastructure decision could make or break your AI integration strategy.
The Challenge We’re Facing
Organizations worldwide are rapidly adopting MCP to connect their services and data to AI models through intelligent agents. However, they’re encountering familiar obstacles that echo the early days of API adoption: how do you secure access, provide proper routing, implement rate limiting, ensure observability, and create effective developer portals?
History taught us harsh lessons about what happens when services are exposed without proper gateway controls. We saw security breaches, performance disasters, and operational chaos. Now, as we stand at the threshold of the AI revolution, we risk repeating those same mistakes if we don’t understand the fundamental differences between traditional APIs and MCP.
The Core Difference: Stateless vs. Stateful
Before diving into infrastructure solutions, we need to grasp a critical distinction. Traditional REST APIs are stateless services that process each request individually and in isolation. They rely heavily on HTTP semantics, with all the information needed for routing, authorization, and policy enforcement living in HTTP headers and URL structures.
Your API gateway makes intelligent decisions by examining:
- HTTP methods (GET, POST, PUT, DELETE)
- URL paths (/users/123/orders)
- Headers (Authorization: Bearer xyz)
- Query parameters (?limit=10&offset=50)
The gateway rarely touches the request body. When it does, it’s typically for minor transformations or extracting specific pieces into headers or metadata. Most importantly, each request stands completely alone, with no session state maintained between calls.
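To make that concrete, here is a minimal sketch of a gateway decision made entirely from the HTTP envelope. The routes, the `X-Role` header, and the role names are hypothetical illustrations, not any particular gateway's API:

```python
# Minimal sketch: a stateless REST authorization decision made purely from
# method, path, and headers. Routes and roles here are illustrative.
from urllib.parse import urlparse

# Each rule keys off method + path prefix only -- no body inspection needed.
ROUTE_POLICIES = {
    ("GET", "/users"): {"roles": {"admin", "support"}},
    ("POST", "/orders"): {"roles": {"admin"}},
}

def authorize(method: str, url: str, headers: dict) -> bool:
    """Decide from the HTTP envelope alone; each request stands by itself."""
    path = urlparse(url).path
    for (m, prefix), policy in ROUTE_POLICIES.items():
        if method == m and path.startswith(prefix):
            return headers.get("X-Role", "") in policy["roles"]
    return False  # deny unmatched routes by default

print(authorize("GET", "/users/123/orders?limit=10", {"X-Role": "support"}))  # True
print(authorize("POST", "/orders", {"X-Role": "support"}))  # False
```

No state survives between calls: the function sees one envelope, makes one decision, and forgets it — exactly the model MCP breaks.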
MCP Flips Everything Upside Down
Remote MCP servers operate on an entirely different paradigm. The protocol follows a multi-step process: first, an MCP client connects to a server with an “initialize” message, negotiating various protocol settings. Second, the server assigns a session ID (via the Mcp-Session-Id header) that coordinates all subsequent interactions for that client. This session maintains critical contextual information:
- Protocol capabilities negotiated between client and server
- Tool results and context from previous calls and responses
- Asynchronous tool call states and streaming updates
- Information requests flowing from server to client
Unlike REST APIs where each request carries complete context in headers, MCP requests contain minimal routing information at the HTTP layer. The entire protocol lives in the HTTP request body. A typical MCP request structure shows this clearly:
```
POST /mcp
Mcp-Session-Id: session_abc123
Content-Type: application/json

{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "params": {
    "name": "database_query",
    "arguments": { /* complex nested structure */ }
  },
  "id": "call_456"
}
```
Everything meaningful exists in the JSON-RPC body: the method type, the specific tool being called, and the parameters. HTTP serves merely as a “dumb” transport layer.
Here’s a Python example showing a basic MCP client-server interaction:
```python
import json
import uuid
from typing import Dict, Any
from datetime import datetime

class MCPSession:
    """Manages a stateful MCP session with context tracking"""

    def __init__(self, session_id: str = None):
        self.session_id = session_id or str(uuid.uuid4())
        self.protocol_version = "2024-11-05"
        self.capabilities = {}
        self.tool_context = {}
        self.created_at = datetime.now()

    def initialize(self, client_capabilities: Dict[str, Any]) -> Dict[str, Any]:
        """Initialize MCP session and negotiate capabilities"""
        self.capabilities = client_capabilities
        return {
            "jsonrpc": "2.0",
            "id": "init_1",
            "result": {
                "protocolVersion": self.protocol_version,
                "capabilities": {
                    "tools": {"supported": True},
                    "resources": {"supported": True},
                    "prompts": {"supported": True}
                },
                "serverInfo": {
                    "name": "enterprise-mcp-server",
                    "version": "1.0.0"
                }
            }
        }

    def call_tool(self, tool_name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
        """Execute a tool call within the session context"""
        call_id = str(uuid.uuid4())
        # Store context for this tool call
        self.tool_context[call_id] = {
            "tool": tool_name,
            "timestamp": datetime.now(),
            "arguments": arguments
        }
        return {
            "jsonrpc": "2.0",
            "method": "tools/call",
            "params": {
                "name": tool_name,
                "arguments": arguments
            },
            "id": call_id
        }

# Example usage
session = MCPSession()
init_response = session.initialize({"tools": True})
print(f"Session ID: {session.session_id}")
print(json.dumps(init_response, indent=2))

# Make a tool call
tool_request = session.call_tool("database_query", {
    "query": "SELECT * FROM users WHERE status='active'",
    "limit": 100
})
print(json.dumps(tool_request, indent=2))
```
The complexity deepens further. MCP servers can initiate communication back to clients through Server-Sent Events (SSE), pushing progress updates, streaming results, or even new requests. This bidirectional, session-aware communication pattern fundamentally differs from the request-response model that API gateways were designed to handle.
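To see what “inspecting individual SSE events” actually requires, here is a small, self-contained sketch — not any particular gateway’s API — that splits a raw SSE stream into discrete events. This is precisely the step a passthrough proxy skips when it blindly forwards bytes; the example stream below is a hypothetical MCP progress notification followed by a result:

```python
import json

def parse_sse(raw: str):
    """Split a raw SSE stream into (event_type, data) pairs.

    Per the SSE format, events are separated by a blank line and each
    line is 'field: value'.
    """
    events = []
    for block in raw.split("\n\n"):
        event_type, data_lines = "message", []
        for line in block.splitlines():
            if line.startswith("event:"):
                event_type = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data_lines.append(line[len("data:"):].strip())
        if data_lines:
            events.append((event_type, "\n".join(data_lines)))
    return events

# A hypothetical stream an MCP server might push mid-session:
raw_stream = (
    "event: message\n"
    'data: {"jsonrpc": "2.0", "method": "notifications/progress", '
    '"params": {"progress": 50}}\n'
    "\n"
    "event: message\n"
    'data: {"jsonrpc": "2.0", "id": "call_456", "result": {"ok": true}}\n'
)

for event_type, data in parse_sse(raw_stream):
    payload = json.loads(data)
    # A protocol-aware gateway could apply policy per event here:
    # track progress, detect errors, or strip sensitive result fields.
    print(event_type, payload.get("method") or payload.get("id"))
```

A gateway that only forwards chunks never reaches the per-event loop at the bottom, which is exactly where mid-stream policy would have to live.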
**Can You Retrofit Your API Gateway?**

Given these fundamental differences, let’s examine whether existing API gateways can handle MCP workloads. There are both similarities and critical gaps. Both protocols operate over HTTP, can leverage JWT/token/OAuth-style security, and technically allow gateways to process request bodies.
Let’s explore the common approaches, from simplest to most complex:
**1. Simple Passthrough Proxy**
At the most basic level, your API gateway can treat MCP requests like any other HTTP POST with a JSON payload. Without understanding JSON-RPC structure or MCP semantics, it can still provide:
- HTTP-level authentication (API keys, OAuth tokens)
- Basic rate limiting per client or IP address
- TLS termination and certificate management
- Request and response logging with metrics
For example, you could validate that requests include a JWT in the HTTP Authorization header and verify it against a trusted identity provider. This represents basic HTTP handling that any API gateway handles well.
However, limitations emerge quickly when dealing with SSE streams. While most modern API gateways can return event streams, they lack the ability to:
- Inspect or filter individual SSE events
- Track progress, detect errors, or measure per-event latency
- Revoke access or apply policies as streams progress
- Maintain session context across multiple SSE events
Think of this like putting a generic reverse proxy in front of a database. You get connection pooling and basic monitoring, but no query-level insights or policies. The moment you need to understand what flows through the proxy, you’ve outgrown this approach.
Here’s a Python example of a simple passthrough proxy for MCP:
```python
from flask import Flask, request, Response, stream_with_context
import requests
import jwt
from datetime import datetime
from functools import wraps

app = Flask(__name__)
JWT_SECRET = "your-secret-key"
MCP_BACKEND_URL = "http://mcp-server:8080"

# Simple rate limiting
request_counts = {}
RATE_LIMIT = 100  # requests per minute

def validate_jwt(f):
    """Decorator for JWT validation"""
    @wraps(f)
    def decorated_function(*args, **kwargs):
        auth_header = request.headers.get('Authorization')
        if not auth_header or not auth_header.startswith('Bearer '):
            return {"error": "Missing or invalid token"}, 401
        try:
            token = auth_header.split(' ')[1]
            payload = jwt.decode(token, JWT_SECRET, algorithms=['HS256'])
            request.user_id = payload.get('user_id')
        except jwt.InvalidTokenError:
            return {"error": "Invalid token"}, 401
        return f(*args, **kwargs)
    return decorated_function

def check_rate_limit(user_id: str) -> bool:
    """Simple rate limiting check"""
    now = datetime.now()
    minute_key = f"{user_id}:{now.strftime('%Y-%m-%d-%H-%M')}"
    if minute_key not in request_counts:
        request_counts[minute_key] = 0
    request_counts[minute_key] += 1
    return request_counts[minute_key] <= RATE_LIMIT

@app.route('/mcp', methods=['POST'])
@validate_jwt
def passthrough_proxy():
    """
    Simple passthrough proxy with:
    - JWT authentication
    - Basic rate limiting
    - Request/response logging
    """
    user_id = request.user_id
    # Rate limiting
    if not check_rate_limit(user_id):
        return {"error": "Rate limit exceeded"}, 429
    # Log the request
    print(f"[{datetime.now()}] User {user_id} - MCP Request")
    # Forward to backend
    headers = {
        'Content-Type': 'application/json',
        'Mcp-Session-Id': request.headers.get('Mcp-Session-Id', '')
    }
    try:
        # Check if this is an SSE stream request
        if request.headers.get('Accept') == 'text/event-stream':
            # Stream the response
            backend_response = requests.post(
                f"{MCP_BACKEND_URL}/mcp",
                json=request.json,
                headers=headers,
                stream=True
            )
            def generate():
                for chunk in backend_response.iter_content(chunk_size=1024):
                    if chunk:
                        yield chunk
            return Response(
                stream_with_context(generate()),
                content_type='text/event-stream'
            )
        else:
            # Regular request-response
            backend_response = requests.post(
                f"{MCP_BACKEND_URL}/mcp",
                json=request.json,
                headers=headers
            )
            # Log the response
            print(f"[{datetime.now()}] User {user_id} - Status {backend_response.status_code}")
            return backend_response.json(), backend_response.status_code
    except Exception as e:
        print(f"[{datetime.now()}] Error: {str(e)}")
        return {"error": "Backend unavailable"}, 503

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8000)
```
**Limitations of this approach:**
- Cannot inspect or filter SSE events
- No tool-level authorization
- Cannot maintain session context
- No protocol-aware routing
**2. Partial Protocol Support**
With sufficient custom development, you can teach your API gateway to parse MCP JSON-RPC payloads and extract meaningful information for policy decisions. Most API gateways support custom body parsing through JavaScript, Lua, or similar scripting mechanisms.
This enables:
- Understanding JSON-RPC request structures
- Tool-level authorization (e.g., “marketing users can’t call database_query”)
- Basic request transformation and validation
However, the reality quickly becomes painful:
Dynamic parsing complexity: MCP tool lists have arbitrary lengths. Your JSONPath expressions become increasingly complex and fragile as you try to handle various scenarios.
Performance overhead: JavaScript or Lua policies run slower than native gateway operations, adding latency to every request.
Maintenance burden: Every new MCP tool may require updating gateway policies. Your infrastructure team becomes tightly coupled to MCP server development, creating organizational bottlenecks.
Limited streaming support: While some gateways handle SSEs, applying policy midstream becomes exponentially more complex.
In practice, you end up building a gateway on top of your existing gateway, constantly fighting to implement features or squeeze out performance improvements.
Here’s a Python example showing protocol-aware parsing with authorization:
```python
import json
from typing import Dict, Any, List, Optional
from enum import Enum

class UserRole(Enum):
    ADMIN = "admin"
    DEVELOPER = "developer"
    MARKETING = "marketing"
    ANALYST = "analyst"

class MCPProtocolGateway:
    """Protocol-aware MCP gateway with tool-level authorization"""

    def __init__(self):
        # Define tool permissions by role
        self.tool_permissions = {
            UserRole.ADMIN: ["*"],  # All tools
            UserRole.DEVELOPER: ["database_query", "api_call", "file_read", "git_operations"],
            UserRole.MARKETING: ["analytics_query", "report_generate", "email_send"],
            UserRole.ANALYST: ["database_query", "analytics_query", "report_generate"]
        }

    def parse_jsonrpc(self, request_body: Dict[str, Any]) -> Optional[Dict[str, Any]]:
        """Parse and validate JSON-RPC structure"""
        try:
            # Validate JSON-RPC format
            if request_body.get("jsonrpc") != "2.0":
                return {"error": "Invalid JSON-RPC version"}
            method = request_body.get("method")
            params = request_body.get("params", {})
            request_id = request_body.get("id")
            return {
                "method": method,
                "params": params,
                "id": request_id
            }
        except Exception as e:
            return {"error": f"Invalid JSON-RPC structure: {str(e)}"}

    def authorize_tool_call(self, user_role: UserRole, tool_name: str) -> bool:
        """Check if user has permission to call the specified tool"""
        allowed_tools = self.tool_permissions.get(user_role, [])
        # Check wildcard permission
        if "*" in allowed_tools:
            return True
        # Check specific tool permission
        return tool_name in allowed_tools

    def filter_tools_list(self, tools: List[Dict], user_role: UserRole) -> List[Dict]:
        """Filter tool list based on user permissions"""
        allowed_tools = self.tool_permissions.get(user_role, [])
        if "*" in allowed_tools:
            return tools
        return [
            tool for tool in tools
            if tool.get("name") in allowed_tools
        ]

    def sanitize_response(self, response: Dict[str, Any], user_role: UserRole) -> Dict[str, Any]:
        """Remove sensitive data from responses based on user clearance"""
        if user_role == UserRole.ADMIN:
            return response  # Admins see everything
        # Remove sensitive fields
        sensitive_fields = ["password", "api_key", "secret", "token", "ssn", "credit_card"]

        def remove_sensitive(obj):
            if isinstance(obj, dict):
                return {
                    k: remove_sensitive(v) for k, v in obj.items()
                    if k.lower() not in sensitive_fields
                }
            elif isinstance(obj, list):
                return [remove_sensitive(item) for item in obj]
            return obj

        return remove_sensitive(response)

    def process_request(self, request_body: Dict[str, Any], user_role: UserRole) -> Dict[str, Any]:
        """Process MCP request with protocol understanding and authorization"""
        # Parse JSON-RPC
        parsed = self.parse_jsonrpc(request_body)
        if "error" in parsed:
            return {"jsonrpc": "2.0", "error": parsed["error"], "id": None}
        method = parsed["method"]
        params = parsed["params"]
        # Handle tools/list request - filter based on permissions
        if method == "tools/list":
            # Assume we have a full tool list
            all_tools = [
                {"name": "database_query", "description": "Query database"},
                {"name": "api_call", "description": "Make API calls"},
                {"name": "analytics_query", "description": "Query analytics"},
                {"name": "email_send", "description": "Send emails"},
                {"name": "git_operations", "description": "Git operations"}
            ]
            filtered_tools = self.filter_tools_list(all_tools, user_role)
            return {
                "jsonrpc": "2.0",
                "id": parsed["id"],
                "result": {"tools": filtered_tools}
            }
        # Handle tools/call request - check authorization
        elif method == "tools/call":
            tool_name = params.get("name")
            if not self.authorize_tool_call(user_role, tool_name):
                return {
                    "jsonrpc": "2.0",
                    "id": parsed["id"],
                    "error": {
                        "code": -32001,
                        "message": f"Unauthorized: User role '{user_role.value}' cannot access tool '{tool_name}'"
                    }
                }
            # Forward to backend (simplified)
            print(f"Authorized tool call: {tool_name} by {user_role.value}")
            return {
                "jsonrpc": "2.0",
                "id": parsed["id"],
                "result": {"status": "authorized", "tool": tool_name}
            }
        return {"jsonrpc": "2.0", "id": parsed["id"], "result": {}}

# Example usage
gateway = MCPProtocolGateway()

# Marketing user tries to call database_query (not allowed)
request1 = {
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {"name": "database_query", "arguments": {}},
    "id": "call_1"
}
result1 = gateway.process_request(request1, UserRole.MARKETING)
print("Marketing user calling database_query:")
print(json.dumps(result1, indent=2))

# Developer user calls database_query (allowed)
result2 = gateway.process_request(request1, UserRole.DEVELOPER)
print("\nDeveloper user calling database_query:")
print(json.dumps(result2, indent=2))

# Marketing user lists available tools (filtered)
request3 = {
    "jsonrpc": "2.0",
    "method": "tools/list",
    "params": {},
    "id": "list_1"
}
result3 = gateway.process_request(request3, UserRole.MARKETING)
print("\nMarketing user tool list:")
print(json.dumps(result3, indent=2))
```
This approach provides better control but becomes increasingly complex and brittle as your MCP implementation evolves.
**3. MCP Brokering**
MCP brokering involves the gateway actively participating in protocol conversations, not just proxying requests but potentially modifying, filtering, or enhancing them based on policy decisions. This becomes critical in enterprise environments where updating all MCP clients simultaneously when servers upgrade to new protocol versions may be impossible.
Brokering enables:
- Version shielding: Protecting MCP clients from breaking changes during server upgrades
- Request filtering: Removing tools from discovery responses based on backward compatibility needs
- Response sanitization: Stripping sensitive data from tool responses based on user clearance levels
- Context injection: Adding enterprise context (user ID, tenant information) to tool calls
- Error handling: Converting MCP protocol errors into enterprise-compliant audit events
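As a sketch of what brokering entails, consider a mediation step that injects tenant context into outgoing tool calls and hides not-yet-compatible tools from discovery responses. The field names, tool names, and version cut-off below are illustrative assumptions, not a real gateway API:

```python
import copy

# Tools hidden from clients pinned to an older protocol version
# (hypothetical tool name, for illustration only).
HIDDEN_FROM_LEGACY_CLIENTS = {"batch_tool_call"}

def broker_request(request: dict, user_id: str, tenant: str) -> dict:
    """Context injection: add enterprise identity to a tools/call body."""
    brokered = copy.deepcopy(request)  # never mutate the original
    if brokered.get("method") == "tools/call":
        args = brokered["params"].setdefault("arguments", {})
        # The backend now sees who is really calling, even though the
        # client never sent this context itself.
        args["_context"] = {"user_id": user_id, "tenant": tenant}
    return brokered

def broker_tools_list(response: dict, client_protocol: str) -> dict:
    """Version shielding: drop tools an older client cannot handle."""
    brokered = copy.deepcopy(response)
    if client_protocol < "2025-03-26":  # illustrative version cut-off
        tools = brokered.get("result", {}).get("tools", [])
        brokered["result"]["tools"] = [
            t for t in tools if t["name"] not in HIDDEN_FROM_LEGACY_CLIENTS
        ]
    return brokered

call = {"jsonrpc": "2.0", "method": "tools/call",
        "params": {"name": "database_query", "arguments": {"limit": 10}},
        "id": "c1"}
print(broker_request(call, "u42", "acme")["params"]["arguments"]["_context"])
```

Notice that both functions have to parse and rewrite the JSON-RPC body itself — the kind of protocol participation a header-oriented gateway was never built for.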
Traditional API gateways struggle here because they lack native JSON-RPC understanding and session-aware policy engines. The gateway needs to truly comprehend the protocol to make intelligent mediation decisions.
**4. MCP Multiplexing**
This is where traditional API gateways hit an insurmountable wall. MCP multiplexing aggregates multiple backend MCP servers into a single logical endpoint—what we call a “virtual MCP.”
Imagine a client connecting to one MCP endpoint but actually accessing tools from multiple backend servers:
- Weather tools from weather-service.internal
- Database tools from analytics-service.internal
- Email tools from notification-service.internal
Instead of AI agents needing to know about and connect to dozens of different MCP servers, they connect to one virtualized endpoint providing a unified interface to all enterprise tools.
Implementing this requires capabilities that traditional API gateways simply don’t possess:
Session fan-out: When clients request “tools/list,” the gateway must query all backend servers and merge results seamlessly.
Request routing: Tool calls must route to the correct backend based on tool names, requiring deep protocol understanding.
Response multiplexing: Streaming responses from multiple backends must merge into a single coherent SSE stream.
State coordination: Session IDs and protocol negotiations must be managed across multiple backend connections.
Error handling: Failures in one backend shouldn’t break the entire virtual session.
This level of protocol-aware aggregation and virtualization exceeds what traditional API gateways were designed to handle. You’d essentially need to rewrite the gateway’s core request-response handling logic to support MCP session semantics.
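A toy sketch of the fan-out and routing steps makes the point. The backend names echo the examples above, but the registry shape and collision-handling scheme are assumptions for illustration, far simpler than a production multiplexer:

```python
# Toy virtual-MCP multiplexer: merges tool lists from several backends
# and remembers which backend owns which tool for later routing.
class VirtualMCP:
    def __init__(self, backends: dict):
        # backends maps a backend name to the tools it advertises.
        self.backends = backends
        self.tool_owner = {}  # tool name -> owning backend

    def list_tools(self) -> list:
        """Session fan-out: merge tools/list results from every backend."""
        merged = []
        for backend, tools in self.backends.items():
            for tool in tools:
                name = tool["name"]
                # Prefix on collision so both tools stay reachable.
                if name in self.tool_owner:
                    name = f"{backend}.{name}"
                self.tool_owner[name] = backend
                merged.append({**tool, "name": name})
        return merged

    def route_call(self, tool_name: str) -> str:
        """Request routing: map a tools/call to the backend that owns it."""
        if tool_name not in self.tool_owner:
            raise KeyError(f"unknown tool: {tool_name}")
        return self.tool_owner[tool_name]

vmcp = VirtualMCP({
    "weather-service": [{"name": "get_forecast"}],
    "analytics-service": [{"name": "database_query"}],
})
tools = vmcp.list_tools()
print([t["name"] for t in tools])         # ['get_forecast', 'database_query']
print(vmcp.route_call("database_query"))  # 'analytics-service'
```

Even this toy version must understand tool names inside JSON-RPC bodies to route at all; the real problem adds per-backend sessions, merged SSE streams, and partial-failure handling on top.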
**Key capabilities a virtual MCP layer must provide:**
- Session fan-out across multiple backends
- Tool aggregation and name collision handling
- Intelligent routing based on tool names
- Async coordination of multiple backend servers
- Error handling for backend failures
This multiplexing pattern requires deep protocol understanding that traditional API gateways cannot provide.
**Enter Agentgateway: Purpose-Built for MCP**
Recognizing these challenges, the open source community created Agentgateway, a Linux Foundation project purpose-built in Rust for AI agent protocols like MCP. Unlike traditional API gateways optimized for stateless REST interactions, Agentgateway natively understands:
- JSON-RPC message structures
- Stateful session mappings
- Bidirectional communication patterns inherent to MCP
This deep protocol awareness allows it to properly multiplex and demultiplex MCP sessions, fan out client requests across multiple backend MCP servers, aggregate tool lists, and maintain the critical two-way session mapping needed when servers initiate messages back to clients.
Rather than fighting against an architecture designed for request-response APIs, Agentgateway’s foundation aligns perfectly with MCP’s session-oriented, streaming communication model. It serves as a native MCP gateway, large language model (LLM) gateway, and agent-to-agent (A2A) proxy, providing the security, observability, and governance capabilities that traditional API gateways cannot deliver.
Key capabilities include:
- MCP multiplexing to federate tools from multiple backend servers
- Fine-grained authorization policies controlling which tools clients can access
- Seamless handling of both stdio and Streamable HTTP transports
- Integration with the CNCF project kgateway as a control plane, enabling Kubernetes-native management using standard Gateway API resources
**The Bottom Line**
While you might be able to force your existing API gateway to handle basic MCP traffic, doing so means accepting significant compromises in functionality, performance, and maintainability. The fundamental architectural differences between stateless REST APIs and stateful MCP sessions require purpose-built infrastructure.
As organizations learned during the API revolution, proper gateway infrastructure isn’t optional—it’s essential for security, scalability, and operational excellence. The same holds true for MCP and AI agent workflows. Invest in the right tools from the start, and you’ll avoid the painful lessons that come from trying to retrofit yesterday’s infrastructure for tomorrow’s challenges.
The question isn’t whether you can use your API gateway for MCP. The question is whether you should build your AI future on infrastructure designed for a fundamentally different paradigm. The answer, increasingly, is a clear no.
I will be at KubeCon + CloudNativeCon North America 2025, exploring the evolving landscape of AI agent infrastructure and the critical role of purpose-built gateways in enterprise AI adoption. Happy to catch up and discuss related topics: AI infrastructure, agentic AI, enterprise AI, and the role of Kubernetes in AI infrastructure.