A comprehensive walkthrough for tracing using ClaudeAgentSDK and Skills.
15 min read · Dec 7, 2025
This guide provides a comprehensive walkthrough for implementing CloudWatch logging in Bedrock AgentCore environments using ClaudeAgentSDK and Skills.
Why Agent Tracing Matters
Effective tracing is critical for AI agents — it enables debugging complex multi-turn conversations, tracking token usage and costs, identifying performance bottlenecks, and ensuring runtime reliability. Without proper tracing, diagnosing issues in production agents becomes nearly impossible.
CloudWatch as a Reliable Fallback
While dedicated observability platforms (like Langfuse or LangSmith) offer richer visualization, CloudWatch provides a convenient, always-available tracing solution that requires no additional infrastructure. Although the log viewing experience is not as polished, CloudWatch is automatically integrated with AgentCore and serves as a dependable last-resort option for monitoring agent and skill behavior.
This guide covers the fundamentals of Skills, their advantages over alternatives like MCP, and provides a complete step-by-step example project.
What we will cover:
- Introduction to Skills
- Skills vs MCP: Understanding the Differences
- Using Skills in ClaudeAgentSDK
- Bedrock AgentCore Environment Specifics
- Logging Strategy for CloudWatch
- Complete Example Project
- Best Practices and Troubleshooting
1. Introduction to Skills
What Are Skills?
Skills are modular capability packages that enhance Claude’s ability to perform specialized tasks. Each Skill consists of:
- Instructions: A SKILL.md file containing procedural knowledge and usage guidelines.
- Scripts: Optional executable code (Python, Bash) that Claude can invoke.
- Resources: Supporting files like templates, reference documentation, and examples.
When Claude encounters a task that matches a Skill’s description, it automatically loads the Skill’s instructions and uses the associated tools to complete the task.
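The discovery convention can be sketched with plain filesystem code. This is a hypothetical illustration of the layout described above (any subdirectory of a skills root that contains a SKILL.md is a candidate Skill), not the SDK's actual internal scanner:

```python
from pathlib import Path

def discover_skills(root: str = ".claude/skills") -> list[str]:
    """List candidate skills: subdirectories that contain a SKILL.md."""
    base = Path(root)
    if not base.is_dir():
        return []
    return sorted(p.name for p in base.iterdir() if (p / "SKILL.md").is_file())
```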
What Problems Do Skills Solve?
Traditional approaches to extending AI agent capabilities have several limitations:
- Prompt Engineering Complexity: Without Skills, developers must craft complex system prompts that include all possible instructions, leading to token bloat and reduced effectiveness.
- Static Capabilities: Agents without Skills have fixed capabilities that cannot be easily extended or customized for specific domains.
- Knowledge Fragmentation: Without a structured approach, domain knowledge gets scattered across multiple prompts, making maintenance difficult.
- Execution Reliability: Pure LLM-generated code can be unreliable for complex operations. Skills allow developers to provide tested, reliable scripts.
Skills address these problems by:
- Modular Loading: Skills are loaded on-demand, keeping the context window efficient.
- Structured Knowledge: Instructions, examples, and references are organized in a standard format.
- Reliable Execution: Pre-written scripts handle complex operations reliably.
- Easy Maintenance: Skills can be updated independently without affecting other capabilities.
2. Skills vs MCP: Understanding the Differences
Model Context Protocol (MCP)
MCP (Model Context Protocol) is a standardized protocol for connecting AI models to external services and data sources.
- Focus: Provides access to external tools and APIs.
- Architecture: Client-server model with JSON-RPC communication.
- Deployment: Requires running MCP servers (local or remote).
- Use Case: Integrating with external services (databases, APIs, SaaS tools).
Skills
Skills are filesystem-based capability packages.
- Focus: Provides procedural knowledge and local execution capabilities.
- Architecture: Filesystem-based, discovered from .claude/skills/ directories.
- Deployment: Packaged with the application, no additional servers needed.
- Use Case: Domain-specific tasks, local file processing, custom workflows.
Comparison at a Glance
| Aspect | Skills | MCP |
|---|---|---|
| Focus | Procedural knowledge, local execution | External tools and APIs |
| Architecture | Filesystem-based (`.claude/skills/`) | Client-server, JSON-RPC |
| Deployment | Packaged with the application | Requires running MCP servers |
| Use Case | Domain tasks, file processing, custom workflows | External services (databases, APIs, SaaS) |
When to Use Each
Use Skills when:
- Processing local files
- Implementing custom domain workflows
- Packaging reusable capabilities
- Minimizing external dependencies
Use MCP when:
- Integrating with external APIs (Jira, Confluence, databases)
- Accessing remote data sources
- Sharing tools across multiple agents
Use Both when:
- Building complex agents that need both local processing and external integration.
- Example: A document agent that processes PDFs locally (Skills) and stores results in Confluence (MCP).
3. Using Skills in ClaudeAgentSDK
Skill Directory Structure
Skills are organized in the .claude/skills/ directory. Your project structure should look like this:
```
project_root/
├── .claude/
│   └── skills/
│       └── my-custom-skill/
│           ├── SKILL.md        # Required: Main skill definition
│           ├── REFERENCE.md    # Optional: Detailed API documentation
│           ├── EXAMPLES.md     # Optional: Usage examples
│           └── scripts/        # Optional: Executable scripts
│               ├── tool_a.py
│               └── tool_b.py
├── src/
│   └── ...
└── requirements.txt
```
SKILL.md Format
The SKILL.md file defines the Skill using YAML frontmatter and Markdown content:
```markdown
---
name: my-custom-skill
description: Brief description of what this skill does and when to use it.
allowed-tools: Bash, Read, Write
---

# My Custom Skill

## Quick Start
Provide a simple example showing how to use the skill.

## Instructions
Step-by-step instructions for Claude on how to perform the task.

## Available Scripts
List and describe the scripts available in this skill.
```
Configuring ClaudeAgentSDK
To enable Skills in your agent, you need to configure the ClaudeAgentOptions.
```python
from claude_agent_sdk import ClaudeAgentOptions, query

options = ClaudeAgentOptions(
    system_prompt="Your agent's system prompt...",
    # Enable Skills discovery from filesystem
    setting_sources=["project", "user"],
    # Include "Skill" in allowed tools
    allowed_tools=[
        "Skill",     # Required for Skills to work
        "Bash",      # Used by Skills to execute scripts
        "Read",      # File reading
        "Write",     # File writing
        "WebFetch",  # HTTP requests
    ],
    permission_mode="bypassPermissions",  # For automated agents
    max_turns=50,
)

# Use the agent
async for message in query(prompt="Analyze this document...", options=options):
    # Process messages
    pass
```
The setting_sources parameter controls where Skills are loaded from:
- “project”: Scans .claude/skills/ in the project root.
- “user”: Scans ~/.claude/skills/ in the user’s home directory.
4. Bedrock AgentCore Environment Specifics
AgentCore Architecture
Amazon Bedrock AgentCore provides a fully managed runtime environment for AI agents. Key characteristics include:
- MicroVM-based: Each agent runs in an isolated microVM for security.
- Container Deployment: Agents are packaged as Docker containers.
- Managed Scaling: Automatic scaling based on demand.
- VPC Integration: Secure deployment within your VPC.
How Skills Execute in AgentCore
In AgentCore, when Claude invokes a Skill, the process flows as follows:
- Claude identifies a matching Skill based on the task.
- Claude loads the Skill’s instructions into context.
- Claude uses the Bash tool to execute scripts.
- Scripts run as subprocess calls within the container.
- Script output is captured and returned to Claude.
This execution model creates a unique logging challenge.
The Logging Challenge
**Problem:** Skills execute as separate processes via Bash, and their stdout is captured by Claude as the tool result (JSON). Logging debug information to stdout would therefore corrupt the JSON result and confuse the agent.

**Solution:** Skills must log to stderr. In the AgentCore environment, stderr flows through to CloudWatch independently, allowing debugging without interfering with the agent's operation.
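The split can be demonstrated with a plain subprocess call, which mirrors how a skill script's output is captured (a minimal sketch, not the Bash tool's actual implementation):

```python
import json
import subprocess
import sys

# A toy "skill script": JSON result on stdout, trace log on stderr.
script = (
    "import sys, json\n"
    "print('[START] doing work', file=sys.stderr)\n"
    "print(json.dumps({'status': 'success'}))\n"
)

proc = subprocess.run([sys.executable, "-c", script],
                      capture_output=True, text=True)

# stdout stays clean JSON for the agent to parse...
result = json.loads(proc.stdout)
# ...while stderr carries the trace logs, which AgentCore forwards to CloudWatch.
print("result:", result)
print("logs:", proc.stderr.strip())
```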
5. Logging Strategy for CloudWatch
Understanding the Log Flow
In AgentCore, CloudWatch captures all output from the container’s stdout and stderr streams. The logging strategy must account for two distinct contexts:
- Agent Process Logs: Python logging from the main agent code.
- Skill Script Logs: Output from subprocess-executed scripts.
Agent Process Logging
The agent process uses Python’s standard logging module, configured to output to stdout.
logging_config.py
```python
import os
import sys
import logging

LOG_LEVEL = os.environ.get("LOG_LEVEL", "INFO").upper()
LOG_FORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def setup_logging() -> None:
    """Configure logging for the entire application."""
    numeric_level = getattr(logging, LOG_LEVEL, logging.INFO)

    # Output to stdout - CloudWatch captures this
    handler = logging.StreamHandler(sys.stdout)
    handler.setLevel(numeric_level)
    handler.setFormatter(logging.Formatter(LOG_FORMAT, DATE_FORMAT))

    root_logger = logging.getLogger()
    root_logger.setLevel(numeric_level)
    root_logger.handlers.clear()
    root_logger.addHandler(handler)


def get_logger(name: str) -> logging.Logger:
    """Get a logger with the specified name."""
    return logging.getLogger(name)
```
Skill Script Logging (Critical Pattern)
For Skill scripts, the logging pattern is different and critical to understand:
- stdout: Reserved for JSON results ONLY (captured by Claude).
- stderr: Used for debug/trace logs (sent to CloudWatch).
Example Skill Script:
```python
#!/usr/bin/env python3
"""Skill script logging pattern.

CRITICAL:
- stdout: JSON results ONLY (captured by Claude)
- stderr: Debug/trace logs (sent to CloudWatch)
"""
import sys
import json
import logging

# Configure logging to stderr (NOT stdout!)
logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - [SKILL:my_tool] - %(levelname)s - %(message)s',
    stream=sys.stderr  # This is the key!
)
logger = logging.getLogger(__name__)


def my_function(arg):
    """Example function with proper logging."""
    logger.info(f"[START] Processing: {arg}")
    try:
        # Do work here
        result = {"status": "success", "data": "..."}
        logger.info("[END] Processing complete")
        return result
    except Exception as e:
        logger.error(f"[ERROR] {str(e)}", exc_info=True)
        return {"error": str(e)}


if __name__ == "__main__":
    # Parse arguments
    arg = sys.argv[1] if len(sys.argv) > 1 else ""

    # Execute and output JSON to stdout
    result = my_function(arg)
    print(json.dumps(result))  # Only JSON goes to stdout
```
Log Format Conventions
Use consistent prefixes to make logs easily searchable in CloudWatch:
- [START]: Beginning of an operation
- [END]: Completion of an operation
- [ERROR]: Error conditions
- [SKILL:name]: Identifies which skill generated the log
- [TOOL_CALL]: Agent tool invocation
- [TOOL_RESULT]: Tool execution result
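One way to apply these prefixes consistently is a small `logging.LoggerAdapter` that stamps the `[SKILL:name]` tag on every record. This is a hypothetical helper, not part of the SDK:

```python
import logging
import sys

class SkillLogger(logging.LoggerAdapter):
    """Prefix every message with [SKILL:<name>] so CloudWatch searches stay simple."""
    def process(self, msg, kwargs):
        return f"[SKILL:{self.extra['skill']}] {msg}", kwargs

# Skill scripts log to stderr, per the pattern above.
logging.basicConfig(stream=sys.stderr, level=logging.INFO,
                    format="%(asctime)s - %(levelname)s - %(message)s")

log = SkillLogger(logging.getLogger("skills"), {"skill": "pdf_tools"})
log.info("[START] Command: extract_text")
```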
CloudWatch Log Groups
In AgentCore, logs generally appear in this location:
- Standard Logs: /aws/bedrock-agentcore/runtimes/<agent_id>/standard-logs
6. Complete Example Project
This section provides a complete, working example that you can use as a template.
Project Structure
```
cloudwatch-logging-agent/
├── .claude/
│   └── skills/
│       └── greeting-skill/
│           ├── SKILL.md
│           └── scripts/
│               └── greeter.py
├── src/
│   ├── __init__.py
│   ├── logging_config.py
│   ├── agent.py
│   └── server.py
├── Dockerfile
├── requirements.txt
└── README.md
```
Step 1: Create the Logging Configuration
src/logging_config.py
```python
"""Centralized Logging Configuration

Key features:
- Outputs to stdout (captured by CloudWatch in AgentCore)
- Consistent format across all modules
- Configurable log level via environment variable
"""
import os
import sys
import logging

LOG_LEVEL = os.environ.get("LOG_LEVEL", "INFO").upper()
LOG_FORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def setup_logging() -> None:
    """
    Configure logging for the entire application.

    Call this once at application startup.
    """
    numeric_level = getattr(logging, LOG_LEVEL, logging.INFO)

    handler = logging.StreamHandler(sys.stdout)
    handler.setLevel(numeric_level)
    handler.setFormatter(logging.Formatter(LOG_FORMAT, DATE_FORMAT))

    root_logger = logging.getLogger()
    root_logger.setLevel(numeric_level)
    root_logger.handlers.clear()
    root_logger.addHandler(handler)

    # Configure uvicorn loggers
    for logger_name in ["uvicorn", "uvicorn.access", "uvicorn.error"]:
        uvicorn_logger = logging.getLogger(logger_name)
        uvicorn_logger.handlers.clear()
        uvicorn_logger.addHandler(handler)
        uvicorn_logger.setLevel(numeric_level)

    logger = logging.getLogger(__name__)
    logger.info(f"Logging configured: level={LOG_LEVEL}, output=stdout")


def get_logger(name: str) -> logging.Logger:
    """Get a logger with the specified name."""
    return logging.getLogger(name)
```
Step 2: Create the Skill
.claude/skills/greeting-skill/SKILL.md
````markdown
---
name: greeting-skill
description: Generate personalized greetings. Use when users ask for greetings or welcome messages.
allowed-tools: Bash
---

# Greeting Skill

Generate personalized greeting messages using Python.

## Quick Start

```bash
python3 .claude/skills/greeting-skill/scripts/greeter.py greet "Alice"
```

## Available Commands

### greet
Generate a personalized greeting.

```bash
python3 .claude/skills/greeting-skill/scripts/greeter.py greet "<name>"
```

### farewell
Generate a farewell message.

```bash
python3 .claude/skills/greeting-skill/scripts/greeter.py farewell "<name>"
```

## Output Format

All commands return JSON:

```json
{"message": "Hello, Alice!", "success": true}
```
````
.claude/skills/greeting-skill/scripts/greeter.py
```python
#!/usr/bin/env python3
"""Greeting Skill Script

IMPORTANT: This script demonstrates proper logging for Skills:
- stdout: JSON results returned to Claude (do NOT print debug info here)
- stderr: Debug/trace logs (safe for debugging, captured by CloudWatch)
"""
import os
import sys
import json
import time
import logging
from datetime import datetime

# Configure logging to stderr
# This is CRITICAL - stdout is reserved for JSON results
logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - [SKILL:greeter] - %(levelname)s - %(message)s',
    stream=sys.stderr
)
logger = logging.getLogger(__name__)


def log_operation_start(operation: str, **kwargs):
    """Log the start of an operation."""
    logger.info(f"[START] Operation: {operation}")
    for key, value in kwargs.items():
        logger.info(f"  {key}: {value}")


def log_operation_end(operation: str, success: bool, elapsed: float):
    """Log the end of an operation."""
    status = "SUCCESS" if success else "FAILED"
    logger.info(f"[END] Operation: {operation} - {status} - {elapsed:.3f}s")


def greet(name: str) -> dict:
    """Generate a greeting message."""
    start_time = time.time()
    log_operation_start("greet", name=name)
    try:
        if not name or not name.strip():
            logger.warning("Empty name provided")
            return {"error": "Name cannot be empty", "success": False}

        hour = datetime.now().hour
        if hour < 12:
            greeting = "Good morning"
        elif hour < 18:
            greeting = "Good afternoon"
        else:
            greeting = "Good evening"

        message = f"{greeting}, {name}! Welcome to the CloudWatch Logging Demo."
        logger.debug(f"Generated greeting: {message}")
        log_operation_end("greet", success=True, elapsed=time.time() - start_time)
        return {"message": message, "success": True}
    except Exception as e:
        logger.error(f"Error generating greeting: {str(e)}", exc_info=True)
        log_operation_end("greet", success=False, elapsed=time.time() - start_time)
        return {"error": str(e), "success": False}


def farewell(name: str) -> dict:
    """Generate a farewell message."""
    start_time = time.time()
    log_operation_start("farewell", name=name)
    try:
        if not name or not name.strip():
            logger.warning("Empty name provided")
            return {"error": "Name cannot be empty", "success": False}

        message = f"Goodbye, {name}! Thank you for using the CloudWatch Logging Demo."
        logger.debug(f"Generated farewell: {message}")
        log_operation_end("farewell", success=True, elapsed=time.time() - start_time)
        return {"message": message, "success": True}
    except Exception as e:
        logger.error(f"Error generating farewell: {str(e)}", exc_info=True)
        log_operation_end("farewell", success=False, elapsed=time.time() - start_time)
        return {"error": str(e), "success": False}


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print(json.dumps({"error": "Usage: greeter.py <command> [args...]"}))
        sys.exit(1)

    command = sys.argv[1]

    if command == "greet":
        if len(sys.argv) < 3:
            print(json.dumps({"error": "Usage: greeter.py greet <name>"}))
            sys.exit(1)
        name = sys.argv[2]
        result = greet(name)
        print(json.dumps(result))
    elif command == "farewell":
        if len(sys.argv) < 3:
            print(json.dumps({"error": "Usage: greeter.py farewell <name>"}))
            sys.exit(1)
        name = sys.argv[2]
        result = farewell(name)
        print(json.dumps(result))
    else:
        print(json.dumps({"error": f"Unknown command: {command}"}))
        sys.exit(1)
```
Step 3: Create the Agent Logic
src/agent.py
```python
"""Core Agent Logic with Comprehensive Logging

This module demonstrates proper logging for:
- Tool calls and results
- Thinking blocks
- Text responses
- Cost tracking
"""
import json

from claude_agent_sdk import (
    AssistantMessage,
    ResultMessage,
    UserMessage,
    TextBlock,
    ThinkingBlock,
    ToolUseBlock,
    ToolResultBlock,
    ClaudeAgentOptions,
    query,
)

from src.logging_config import get_logger

logger = get_logger(__name__)
LOG_SEPARATOR = "-" * 60


def create_agent_options() -> ClaudeAgentOptions:
    """Create configured ClaudeAgentOptions."""
    config = {
        "system_prompt": (
            "You are a helpful assistant with greeting capabilities. "
            "When users ask for greetings, use the greeting-skill. "
            "Always complete the task using the available Skills."
        ),
        "permission_mode": "bypassPermissions",
        "max_turns": 20,
        "setting_sources": ["project", "user"],
        "allowed_tools": [
            "Skill",
            "Bash",
            "Read",
            "Write",
        ],
    }

    logger.info("=" * 60)
    logger.info("Agent Configuration:")
    logger.info(f"  Setting Sources: {config['setting_sources']}")
    logger.info(f"  Allowed Tools: {config['allowed_tools']}")
    logger.info("=" * 60)

    return ClaudeAgentOptions(**config)


def _log_tool_use(block: ToolUseBlock, turn: int) -> None:
    """Log tool use information."""
    logger.info(LOG_SEPARATOR)
    logger.info(f"[TOOL_CALL] Turn {turn}")
    logger.info(f"  Tool: {getattr(block, 'name', 'unknown')}")
    logger.info(f"  ID: {getattr(block, 'id', 'unknown')}")
    try:
        input_str = json.dumps(getattr(block, 'input', {}), ensure_ascii=False)
        logger.info(f"  Input: {input_str}")
    except Exception as e:
        logger.info(f"  Input: <serialization error: {e}>")


def _log_tool_result(block: ToolResultBlock, turn: int) -> None:
    """Log tool result information for tracing."""
    tool_use_id = getattr(block, 'tool_use_id', 'unknown')
    content = getattr(block, 'content', '')
    is_error = getattr(block, 'is_error', False)

    logger.info(LOG_SEPARATOR)
    logger.info(f"[TOOL_RESULT] Turn {turn}")
    logger.info(f"  Tool Use ID: {tool_use_id}")
    logger.info(f"  Is Error: {is_error}")
    # Log full result content (includes skill JSON output)
    content_str = str(content)
    logger.info(f"  Result: {content_str}")


async def process_prompt(prompt: str) -> str:
    """Process a user prompt and return the response."""
    if not prompt or not prompt.strip():
        logger.warning("Received empty prompt")
        return "No text content provided."

    logger.info("=" * 60)
    logger.info("[REQUEST_START] Processing prompt")
    logger.info(f"  Length: {len(prompt)} chars")
    logger.info(f"  Content: {prompt[:200]}...")
    logger.info("=" * 60)

    options = create_agent_options()
    full_response = []
    turn_count = 0
    tool_call_count = 0

    try:
        async for message in query(prompt=prompt, options=options):
            if isinstance(message, AssistantMessage):
                turn_count += 1
                logger.info(f"[TURN {turn_count}] AssistantMessage")
                for block in message.content:
                    if isinstance(block, ToolUseBlock):
                        tool_call_count += 1
                        _log_tool_use(block, turn_count)
                    elif isinstance(block, ToolResultBlock):
                        _log_tool_result(block, turn_count)
                    elif isinstance(block, ThinkingBlock):
                        logger.debug(f"[THINKING] Turn {turn_count}")
                    elif isinstance(block, TextBlock):
                        logger.info(f"[RESPONSE] Turn {turn_count}: {len(block.text)} chars")
                        full_response.append(block.text)

            elif isinstance(message, UserMessage):
                # UserMessage contains tool results after skill/tool execution
                logger.info(f"[TURN {turn_count}] UserMessage (tool results)")
                content = message.content
                # Iterate through content to log each tool result
                if isinstance(content, list):
                    for block in content:
                        if isinstance(block, ToolResultBlock):
                            _log_tool_result(block, turn_count)
                        elif isinstance(block, TextBlock):
                            logger.info(f"[USER_TEXT] Turn {turn_count}: {block.text[:200]}")
                else:
                    logger.info(f"[USER_CONTENT] Turn {turn_count}: {str(content)[:500]}")

            elif isinstance(message, ResultMessage):
                logger.info("=" * 60)
                logger.info("[REQUEST_COMPLETE]")
                logger.info(f"  Turns: {turn_count}")
                logger.info(f"  Tool Calls: {tool_call_count}")
                if message.total_cost_usd > 0:
                    logger.info(f"  Cost: ${message.total_cost_usd:.6f}")
                logger.info("=" * 60)

    except Exception as e:
        logger.error(f"[REQUEST_ERROR] {str(e)}", exc_info=True)
        return f"Error: {str(e)}"

    return "".join(full_response) or "No response generated."
```
Step 4: Create the Server
src/server.py
```python
"""FastAPI Server for AgentCore

Implements the HTTP protocol required by AgentCore runtime.
"""
import uuid
from contextlib import asynccontextmanager

from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

from src.logging_config import setup_logging, get_logger
from src.agent import process_prompt

# Initialize logging first
setup_logging()
logger = get_logger(__name__)


@asynccontextmanager
async def lifespan(app: FastAPI):
    """Application lifecycle manager."""
    logger.info("Server starting up...")
    yield
    logger.info("Server shutting down...")


app = FastAPI(
    title="CloudWatch Logging Demo Agent",
    version="1.0.0",
    lifespan=lifespan
)


@app.get("/ping")
async def ping():
    """Health check endpoint for AgentCore."""
    return {"status": "Healthy"}


@app.post("/invocations")
async def invocations(request: Request):
    """Main processing endpoint for AgentCore."""
    request_id = str(uuid.uuid4())[:8]

    try:
        body = await request.json()
    except Exception as e:
        logger.error(f"[{request_id}] JSON parse error: {e}")
        return JSONResponse({"error": "Invalid JSON"}, status_code=400)

    prompt = body.get("prompt", "")
    if not prompt:
        return JSONResponse({"error": "Missing prompt"}, status_code=400)

    logger.info(f"[{request_id}] Processing request")
    try:
        response = await process_prompt(prompt)
        logger.info(f"[{request_id}] Request completed")
        return JSONResponse({"response": response, "status": "success"})
    except Exception as e:
        logger.error(f"[{request_id}] Error: {e}", exc_info=True)
        return JSONResponse({"error": str(e)}, status_code=500)


if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8080)
```
Step 5: Create requirements.txt
requirements.txt
```
claude-agent-sdk>=0.1.0
fastapi
uvicorn
```
Step 6: Create the Dockerfile
Dockerfile
```dockerfile
FROM ghcr.io/astral-sh/uv:python3.12-bookworm-slim

WORKDIR /app

# Environment variables
ENV UV_SYSTEM_PYTHON=1 \
    UV_COMPILE_BYTECODE=1 \
    PYTHONUNBUFFERED=1 \
    HOME=/home/bedrock_agentcore \
    PYTHONPATH=/app \
    PORT=8080 \
    LOG_LEVEL=INFO \
    # Required for Claude Agent SDK to use Bedrock
    CLAUDE_CODE_USE_BEDROCK=1 \
    ANTHROPIC_MODEL=us.anthropic.claude-sonnet-4-20250514-v1:0

# Install system dependencies
RUN apt-get update && apt-get install -y \
    nodejs \
    npm \
    curl \
    && rm -rf /var/lib/apt/lists/*

# Install Claude Code CLI (required for claude-agent-sdk)
RUN npm install -g @anthropic-ai/claude-code

# Install Python dependencies
COPY requirements.txt requirements.txt
RUN uv pip install -r requirements.txt

# Create non-root user
RUN useradd -m -d /home/bedrock_agentcore -u 1000 bedrock_agentcore
RUN mkdir -p /home/bedrock_agentcore/.config && \
    chown -R bedrock_agentcore:bedrock_agentcore /home/bedrock_agentcore

# Copy application code
COPY . .
RUN chown -R bedrock_agentcore:bedrock_agentcore /app

EXPOSE 8080

HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
    CMD curl -f http://localhost:8080/ping || exit 1

# Run as non-root user
CMD ["runuser", "-u", "bedrock_agentcore", "--", \
     "uvicorn", "src.server:app", \
     "--host", "0.0.0.0", \
     "--port", "8080", \
     "--log-level", "info"]
```
Step 7: Create README.md
README.md
````markdown
# CloudWatch Logging Demo Agent

Demonstrates proper logging patterns for Skills in Bedrock AgentCore.

## Key Concepts

1. **Agent logs**: Use Python logging to stdout
2. **Skill logs**: Use Python logging to stderr (stdout is for JSON results)

## Local Development

```bash
# Create virtual environment
python -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# (Required for Bedrock backend)
export CLAUDE_CODE_USE_BEDROCK=1

# Run server
python -m src.server
```

## Testing

```bash
# Health check
curl http://localhost:8080/ping

# Test greeting
curl -X POST http://localhost:8080/invocations \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Please greet Alice"}'
```

## Deployment to AgentCore

```bash
# Step 1: Configure AgentCore (one-time setup)
agentcore configure -e Dockerfile

# Step 2: Launch the agent
agentcore launch

# Step 3: Test the deployment
agentcore invoke '{"prompt": "Please greet Alice"}'
```

The `agentcore launch` command automatically:

- Builds the Docker image from your Dockerfile
- Pushes it to Amazon ECR
- Deploys to the AgentCore runtime
- Sets up CloudWatch logging automatically

Logs will appear in CloudWatch at:

- `/aws/bedrock-agentcore/runtimes/<agent_id>/standard-logs`
````
7. Best Practices and Troubleshooting
Best Practices
1. **Consistent Log Format.** Use a consistent format across all components: `TIMESTAMP - [COMPONENT:name] - LEVEL - MESSAGE`
```
2025-01-15 10:30:45 - [SKILL:pdf_tools] - INFO - [START] Command: extract_text
2025-01-15 10:30:46 - [SKILL:pdf_tools] - INFO - [END] Command: extract_text - SUCCESS - 1.234s
```
2. **Operation Bracketing.** Always log the start and end of operations:
```python
def my_operation():
    start_time = time.time()
    logger.info("[START] my_operation")
    try:
        # Do work
        result = do_work()
        logger.info(f"[END] my_operation - SUCCESS - {time.time() - start_time:.3f}s")
        return result
    except Exception:
        logger.error(f"[END] my_operation - FAILED - {time.time() - start_time:.3f}s", exc_info=True)
        raise
```
3. **Structured Data in Logs.** Include relevant context in logs for easier debugging:
```python
logger.info(f"[TOOL_CALL] tool={tool_name} input_size={len(input_data)} turn={turn_count}")
```
4. **Log Levels.** Use appropriate log levels:
- DEBUG: Detailed diagnostic information
- INFO: General operational events
- WARNING: Unexpected but handled situations
- ERROR: Errors that need attention
5. **Avoid Sensitive Data.** Never log sensitive information. Always truncate or mask API keys and secrets.
```python
# Bad
logger.info(f"Processing with API key: {api_key}")

# Good
logger.info(f"Processing with API key: {api_key[:4]}***")
```
Troubleshooting
Problem: Skill logs not appearing in CloudWatch
**Symptoms:** Agent logs appear, but the Skill script's internal debug logs (like `[START]`, `[END]`) are missing.

**Cause:** The Skill script is logging to stdout instead of stderr.

**Solution:** Ensure skill scripts log to stderr (not stdout):
```python
# In skill scripts, configure logging to stderr
logging.basicConfig(
    stream=sys.stderr,  # CRITICAL: Must be stderr, not stdout
    level=logging.DEBUG,
    format='%(asctime)s - [SKILL:my_tool] - %(levelname)s - %(message)s'
)
```
Expected CloudWatch output when configured correctly:
```
[TOOL_RESULT] Turn 4
  Tool Use ID: toolu_bdrk_xxx
  Is Error: False
  Result: {"file_path": "/tmp/doc.pdf", "success": true}
2025-12-06 07:06:35 - [SKILL:file_tools] - INFO - [START] Command: download
2025-12-06 07:06:35 - [SKILL:file_tools] - INFO -   url: https://example.com/doc.pdf
2025-12-06 07:06:35 - [SKILL:file_tools] - INFO - Downloading from: https://...
2025-12-06 07:06:35 - [SKILL:file_tools] - INFO - Download complete: 42237 bytes
2025-12-06 07:06:35 - [SKILL:file_tools] - INFO - [END] Command: download - SUCCESS
2025-12-06 07:06:35 - [SKILL:file_tools] - INFO - Elapsed: 0.356s
```
Problem: Logs are truncated
**Symptoms:** Long log messages are cut off in CloudWatch.

**Cause:** CloudWatch enforces a 256 KB limit per log event.

**Solution:** Truncate oversized messages before logging so that each event stays under the limit.
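A minimal truncation helper might look like this. The 8 KB cap is an arbitrary choice, comfortably under the 256 KB per-event limit:

```python
MAX_LOG_BYTES = 8_000  # arbitrary cap, well under CloudWatch's 256 KB per-event limit

def truncate_for_log(message: str, limit: int = MAX_LOG_BYTES) -> str:
    """Return the message unchanged if it fits, otherwise cut it and mark the cut."""
    encoded = message.encode("utf-8")
    if len(encoded) <= limit:
        return message
    # errors="ignore" drops a multi-byte character that may be split at the boundary
    return encoded[:limit].decode("utf-8", errors="ignore") + "...[truncated]"
```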
Problem: Duplicate logs
**Symptoms:** The same log message appears multiple times.

**Cause:** Multiple logging handlers are attached to the same logger.

**Solution:** Clear existing handlers before adding new ones, as `setup_logging` does with `root_logger.handlers.clear()`.
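A sketch of the dedup pattern, mirroring what `setup_logging` does for the root logger:

```python
import logging
import sys

def get_clean_logger(name: str) -> logging.Logger:
    """Return a logger with exactly one stdout handler, however often it is called."""
    logger = logging.getLogger(name)
    logger.handlers.clear()  # drop handlers left over from earlier setup calls
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(logging.Formatter(
        "%(asctime)s - %(name)s - %(levelname)s - %(message)s"))
    logger.addHandler(handler)
    logger.propagate = False  # prevent the root logger from emitting a duplicate
    return logger
```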
Viewing Logs in CloudWatch
Using AWS Console
- Navigate to CloudWatch > Log groups.
- Find your agent’s log group: /aws/bedrock-agentcore/runtimes/<agent_id>/standard-logs.
- Select the log stream for your time range.
- Use CloudWatch Insights for advanced queries.
CloudWatch Insights Queries
Search for specific operations:
```
fields @timestamp, @message
| filter @message like /\[SKILL:pdf_tools\]/
| sort @timestamp desc
| limit 100
```
Find errors:
```
fields @timestamp, @message
| filter @message like /ERROR/
| sort @timestamp desc
| limit 50
```
Calculate operation duration:
```
fields @timestamp, @message
| filter @message like /\[END\]/
| parse @message /elapsed=(?<duration>[0-9.]+)s/
| stats avg(duration), max(duration), min(duration) by bin(1h)
```
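Note that the `parse` step only matches log lines that actually contain an `elapsed=<seconds>s` token; the `[END]` lines emitted by this guide's example scripts log the duration as a bare `1.234s`, so adjust either the log format or the regex to match. The capture itself can be sanity-checked locally:

```python
import re

# The same capture the Insights query uses, checked against a sample log line
# (assumes the script logs an "elapsed=<seconds>s" token).
line = "2025-01-15 10:30:46 - [SKILL:pdf_tools] - INFO - [END] extract_text - SUCCESS - elapsed=1.234s"
match = re.search(r"elapsed=(?P<duration>[0-9.]+)s", line)
print(match.group("duration"))  # -> 1.234
```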
Summary
This guide covered:
- Skills fundamentals: What they are, how they differ from MCP, and when to use each.
- ClaudeAgentSDK configuration: How to enable and use Skills in your agents.
- AgentCore specifics: Understanding the microVM environment and subprocess execution.
- Logging strategy: The critical pattern of stdout for results, stderr for logs.
- Complete example: A working project you can use as a template.
- Best practices: Patterns for maintainable, debuggable logging.
The key insight is that Skills execute as subprocesses in AgentCore, and their stdout is captured by Claude as the tool result. All logging must go to stderr to appear in CloudWatch without corrupting the JSON response.
By following these patterns, you can build observable, debuggable agents with comprehensive logging that works seamlessly in the Bedrock AgentCore environment.