Part 3: Inside an AI Agent - Tools, Memory, Planning & Execution (DevOps Edition)
By Darji, Founder of AgentFlux
Introduction: What Actually Powers an AI Agent?
In Parts 1 and 2, we explored:
- Why DevOps needs AI agents
- What agents are
- How autonomy differs from automation
Now we go under the hood.
This article breaks down the internal components of an AI agent designed for DevOps, SRE, and cloud operations, the kind of agent powering AgentFlux.
We'll explore:
- How agents use tools (kubectl, Terraform, CI APIs…)
- How agents use memory for context
- The planning loop that drives autonomous reasoning
- The execution layer that safely applies fixes
- Real DevOps examples with code
Let's dive into how autonomous engineering actually works.

Section 1 - The Four Core Systems Inside Every AI Agent
At a high level, an AI agent consists of:
Tools → Memory → Planning → Execution
This is the "brain" of an operational agent.
1. Tools: How Agents Interact with Real Systems
Tools are the permissioned interfaces that let the agent take action.
In DevOps, this means giving an agent controlled access to:
Infrastructure Tools
- Terraform
- Pulumi
- AWS CLI
- Azure CLI
- GCP APIs
Kubernetes Tools
- kubectl
- Helm
- ArgoCD
CI/CD Tools
- GitHub Actions API
- GitLab CI
- Jenkins REST API
Observability Tools
- Prometheus
- Grafana
- ELK / OpenSearch
- Datadog
- New Relic
Communication Tools
- Slack
- Teams
- PagerDuty
Without tools, an agent is just a chatbot.
Tools turn intelligence into action.
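One way to picture "controlled access" is an allowlist that sits between the agent and the command line. The sketch below is only an illustration: `Tool`, `ToolRegistry`, and the chosen subcommands are hypothetical names for this example, not AgentFlux internals. It checks a proposed command against the registered tools and rejects anything outside the allowlist before it could ever run.

```python
import shlex
from dataclasses import dataclass, field


@dataclass
class Tool:
    """Hypothetical tool definition: a binary plus the subcommands the agent may run."""
    binary: str
    allowed_subcommands: set = field(default_factory=set)


class ToolRegistry:
    """Maps tool names to definitions and rejects anything outside the allowlist."""

    def __init__(self):
        self._tools = {}

    def register(self, tool):
        self._tools[tool.binary] = tool

    def build_command(self, command_line):
        """Return an argv list if the command is permitted, else raise PermissionError."""
        argv = shlex.split(command_line)
        if not argv or argv[0] not in self._tools:
            raise PermissionError(f"tool not registered: {command_line!r}")
        tool = self._tools[argv[0]]
        if len(argv) < 2 or argv[1] not in tool.allowed_subcommands:
            raise PermissionError(f"subcommand not allowed: {command_line!r}")
        return argv


registry = ToolRegistry()
# Assumption for this example: read-only kubectl, and Terraform limited to planning.
registry.register(Tool("kubectl", {"get", "describe", "logs"}))
registry.register(Tool("terraform", {"plan", "show"}))

print(registry.build_command("kubectl get pods -n prod"))
# ['kubectl', 'get', 'pods', '-n', 'prod']
```

A request such as `kubectl delete pod api-7d8f9` raises `PermissionError` here, because `delete` was never registered. The returned argv list can be handed to `subprocess.run` without going through a shell.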
2. Memory: What an Agent Remembers
Agents need memory to avoid repeating mistakes and to understand context.
Three DevOps memory types:
(1) Short-Term Memory (active context)
Used for:
- Current logs
- Incident timeline
- Current Kubernetes state
- This deployment's failures
Example:
    { "pod": "api-7d8f9", "error": "CrashLoopBackOff", "attempts": 3 }
(2) Long-Term Memory
Stored patterns:
- Past incidents
- Common fixes
- Stable configurations
- Previous agent actions
Example:
    Pattern recognized: Registry token expires every 12 hours.
    Suggested action: Preemptive refresh.
(3) Tool Memory
Execution history:
- What the agent changed
- What commands it ran
- What succeeded/failed
This prevents loops and dangerous retries.
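The three memory types can be sketched as one small structure. This is a minimal illustration under assumptions: `AgentMemory`, `record_action`, `should_retry`, and the `max_failures` limit are hypothetical names and values, and a real agent would persist long-term memory somewhere durable. It shows how execution history can stop the agent from repeating a command that keeps failing.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class AgentMemory:
    short_term: dict = field(default_factory=dict)    # active incident context
    long_term: list = field(default_factory=list)     # patterns learned from past incidents
    tool_history: list = field(default_factory=list)  # (command, succeeded) pairs

    def record_action(self, command, succeeded):
        """Append one executed command and its outcome to tool memory."""
        self.tool_history.append((command, succeeded))

    def should_retry(self, command, max_failures=2):
        """Allow a retry only while the command has failed fewer than max_failures times."""
        failures = Counter(cmd for cmd, ok in self.tool_history if not ok)
        return failures[command] < max_failures


memory = AgentMemory()
memory.short_term.update({"pod": "api-7d8f9", "error": "CrashLoopBackOff", "attempts": 3})
memory.long_term.append({
    "pattern": "Registry token expires every 12 hours",
    "action": "Preemptive refresh",
})

restart = "kubectl rollout restart deployment/api"
memory.record_action(restart, succeeded=False)
print(memory.should_retry(restart))  # True: one failure so far
memory.record_action(restart, succeeded=False)
print(memory.should_retry(restart))  # False: the failure limit is reached
```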
3. Planning: How AI Agents Think Before Acting
Planning is where agents decide:
- What's happening?
- What's the safest fix?
- What tools should be used?
- What are the side effects?
- Should a rollback be prepared?
- Should a human be notified?
Agent Planning Loop
1. Interpret Observations
2. Generate Hypotheses
3. Evaluate Risk
4. Compare Possible Actions
5. Select Best Action
6. Execute or Escalate
Example: Agent Analyzing Kubernetes Errors
Input logs:
    Readiness probe failed after 3s timeout
    CrashLoopBackOff detected
Agent planning output:
    {
      "root_cause": "readiness probe too strict",
      "proposed_actions": [
        "increase timeoutSeconds to 10",
        "restart deployment",
        "validate CPU/memory limits"
      ],
      "risk_score": 0.12
    }
The agent builds a multi-step plan, not just a single action.
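The last step of the loop, execute or escalate, can be sketched as a simple gate on the planning output shown above. This is an illustration under assumptions: the `decide` function and the `RISK_THRESHOLD` value of 0.3 are hypothetical, and a real agent would likely weigh more signals than a single score.

```python
import json

# Hypothetical cutoff: plans scoring above this go to a human instead of running.
RISK_THRESHOLD = 0.3


def decide(plan_json, risk_threshold=RISK_THRESHOLD):
    """Turn a planning output into an execute-or-escalate decision."""
    plan = json.loads(plan_json)
    actions = plan.get("proposed_actions", [])
    # A missing score is treated as maximum risk, so the agent fails safe.
    risk = plan.get("risk_score", 1.0)
    if not actions or risk > risk_threshold:
        return {"decision": "escalate", "actions": [], "risk_score": risk}
    return {"decision": "execute", "actions": actions, "risk_score": risk}


plan_json = """
{
  "root_cause": "readiness probe too strict",
  "proposed_actions": [
    "increase timeoutSeconds to 10",
    "restart deployment",
    "validate CPU/memory limits"
  ],
  "risk_score": 0.12
}
"""

result = decide(plan_json)
print(result["decision"], len(result["actions"]))  # execute 3
```

Treating a missing `risk_score` as maximum risk is a deliberate choice in this sketch: an incomplete plan is escalated instead of executed.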
4. Execution: How Agents Apply Safe, Controlled Fixes
Execution is where the agent acts, using its tools.
Safe execution requires:
- Guardrails
- Permission boundaries
- Rollback strategy
- Validations
- Logging everything
Example: Agent Fixing Terraform Drift
Agent Decision:
    State drift detected: Autoscaler set to min=1, expected min=3
    Apply terraform fix with plan validation.
Agent Action Code:
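    import subprocess

    def terraform_plan():
        # Save the plan so apply runs exactly what was reviewed.
        return subprocess.run(
            ["terraform", "plan", "-input=false", "-no-color", "-out=tfplan"],
            capture_output=True, text=True, check=True)

    def terraform_apply():
        # Applying a saved plan file needs no interactive approval.
        return subprocess.run(["terraform", "apply", "-input=false", "tfplan"], check=True)

    plan = terraform_plan()
    # Apply only when the planned changes mention the autoscaler.
    if "autoscaler" in plan.stdout:
        terraform_apply()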
Execution is ALWAYS followed by:
- Monitoring
- Validation
- Final confirmation
Agents must close the loop.
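Closing the loop can be sketched as a wrapper that applies a fix, polls a health check, and rolls back if the check never passes. The function name `execute_with_validation` and its parameters are hypothetical; the `apply`, `validate`, and `rollback` callables stand in for real tool calls, and the demo uses an in-memory dict so it runs without a cluster.

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent.execution")


def execute_with_validation(apply, validate, rollback, checks=3, interval_s=0.0):
    """Apply a fix, poll a health check, and roll back if it never passes."""
    log.info("applying fix")
    apply()
    for attempt in range(1, checks + 1):
        if validate():
            log.info("validation passed on check %d", attempt)
            return True
        time.sleep(interval_s)
    log.warning("validation failed after %d checks, rolling back", checks)
    rollback()
    return False


# Stand-in state instead of a real deployment (assumption for this example).
state = {"timeout": 3}
ok = execute_with_validation(
    apply=lambda: state.update(timeout=10),
    validate=lambda: state["timeout"] >= 10,
    rollback=lambda: state.update(timeout=3),
)
print(ok, state)  # True {'timeout': 10}
```

Every step is logged, and the function returns a boolean so the caller can decide whether to escalate to a human after a rollback.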
Section 2 - How All the Pieces Fit Together
Here is the full architecture diagram of an AI DevOps agent:
┌──────────────────────────┐
│         OBSERVE          │
│  Logs, Metrics, Events   │
└────────────┬─────────────┘
             ▼
┌──────────────────────────┐
│          MEMORY          │
│   ST, LT, Tool Memory    │
└────────────┬─────────────┘
             ▼
┌──────────────────────────┐
│         PLANNING         │
│    Risk, RCA, Actions    │
└────────────┬─────────────┘
             ▼
┌──────────────────────────┐
│          TOOLS           │
│   Terraform, kubectl…    │
└────────────┬─────────────┘
             ▼
┌──────────────────────────┐
│        EXECUTION         │
│   Apply fix, rollback    │
└──────────────────────────┘
This architecture powers the autonomous behavior described in Parts 1 and 2.
Section 3 - Real Scenario: End-to-End Agent Flow
Scenario:
Production API is failing readiness probes.
Full Agent Flow:
Observe:
    Readiness probe timeout: 3s → failing
    CrashLoopBackOff
Analyze:
    { "root_cause": "probe too strict", "confidence": 0.91 }
Decide:
- Patch deployment
- Increase timeout
- Restart pods
Act:
    kubectl patch deployment api \
      --patch '{"spec":{"template":{"spec":{"containers":[{"name":"api","readinessProbe":{"timeoutSeconds":10}}]}}}}'
Validate:
- Pods stable
- Error logs stopped
- Latency normal
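An agent would usually generate the patch from the Act step instead of hand-writing quoted JSON. The sketch below builds the same patch from a Python dict; the helper names `readiness_timeout_patch` and `kubectl_patch_argv` are hypothetical. It only constructs the command and does not run it.

```python
import json


def readiness_timeout_patch(container, timeout_seconds):
    """Build a patch that sets readinessProbe.timeoutSeconds on one named container."""
    return {"spec": {"template": {"spec": {"containers": [
        {"name": container, "readinessProbe": {"timeoutSeconds": timeout_seconds}}
    ]}}}}


def kubectl_patch_argv(deployment, patch):
    """Return the argv for `kubectl patch deployment`, with the patch serialized as JSON."""
    return ["kubectl", "patch", "deployment", deployment, "--patch", json.dumps(patch)]


argv = kubectl_patch_argv("api", readiness_timeout_patch("api", 10))
print(argv[:5])
print(argv[-1])
```

Passing this argv list to `subprocess.run` avoids shell quoting problems, and the dict can be inspected or logged before anything is applied.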
This is autonomous DevOps in action.
Section 4 - The Future: Multi-Agent DevOps Teams
AgentFlux will evolve into a system where multiple agents collaborate:
Examples:
- Pipeline Agent: detects CI issues
- Kubernetes Agent: heals clusters
- Infra Agent: fixes Terraform drift
- SLO Agent: monitors reliability
- Security Agent: patches vulnerabilities
These agents form a digital DevOps workforce.
Final Thoughts
AI agents don't replace DevOps engineers. They replace:
- Manual debugging
- Noisy alerts
- Repetitive fixes
- Human fatigue
Engineers move towards:
- Architecture
- Reliability engineering
- Agent supervision
- Strategic decision-making
This is the future we're building with AgentFlux.
Next Article: Part 4 - Building a Real DevOps Agent (Step-by-Step Tutorial)
Part 3: Inside an AI Agent - Tools, Memory, Planning & Execution (DevOps Edition) was originally published in Towards AI on Medium.