🚀 Part 3: Inside an AI Agent — Tools, Memory, Planning & Execution (DevOps Edition)
By Darji — Founder of AgentFlux
🔍 Introduction: What Actually Powers an AI Agent?
In Parts 1 and 2, we explored:
- Why DevOps needs AI agents
- What agents are
- How autonomy differs from automation
Now we go under the hood.
This article breaks down the internal components of an AI Agent designed for DevOps, SRE, and cloud operations — the kind of agent powering AgentFlux.
We’ll explore:
- How agents use tools (kubectl, Terraform, CI APIs…)
- How agents use memory for context
- The planning loop that drives autonomous reasoning
- The execution layer that safely applies fixes
- Real DevOps examples with code
Let’s dive into how autonomous engineering actually works.

🧠 Section 1 — The Four Core Systems Inside Every AI Agent
At a high level, an AI agent consists of:
Tools → Memory → Planning → Execution
This is the “brain” of an operational agent.
🔧 1. Tools: How Agents Interact with Real Systems
Tools are the permissioned interfaces through which the agent takes action.
In DevOps, this means giving an agent controlled access to:
🛠️ Infrastructure Tools
- Terraform
- Pulumi
- AWS CLI
- Azure CLI
- GCP APIs
☸️ Kubernetes Tools
- kubectl
- Helm
- ArgoCD
🔁 CI/CD Tools
- GitHub Actions API
- GitLab CI
- Jenkins REST API
📊 Observability Tools
- Prometheus
- Grafana
- ELK / OpenSearch
- Datadog
- New Relic
💬 Communication Tools
- Slack
- Teams
- PagerDuty
If an agent doesn’t have tools, it becomes just a chatbot.
Tools turn intelligence → action.
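One way to picture "controlled access" is an allowlist that sits between the agent and the real CLI. The sketch below is illustrative only: `Tool`, `ToolRegistry`, and `build_command` are hypothetical names, not AgentFlux's API, and the code only builds a command, it never runs one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    """Hypothetical tool definition: a binary plus the subcommands the agent may use."""
    name: str
    binary: str
    allowed_subcommands: frozenset


class ToolRegistry:
    """Holds the tools an agent is permitted to call (illustrative only)."""

    def __init__(self):
        self._tools = {}

    def register(self, tool: Tool) -> None:
        self._tools[tool.name] = tool

    def build_command(self, name: str, subcommand: str, *args: str) -> list:
        """Return an argv list if the call is permitted, otherwise raise PermissionError."""
        tool = self._tools.get(name)
        if tool is None:
            raise PermissionError(f"unknown tool: {name}")
        if subcommand not in tool.allowed_subcommands:
            raise PermissionError(f"{name} may not run '{subcommand}'")
        return [tool.binary, subcommand, *args]


registry = ToolRegistry()
# Read-only access to kubectl: the agent can inspect, but not delete or patch.
registry.register(Tool("kubectl", "kubectl", frozenset({"get", "describe", "logs"})))

print(registry.build_command("kubectl", "get", "pods", "-n", "prod"))
```

Anything outside the allowlist, such as `kubectl delete`, raises `PermissionError` before a command is ever built.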
🧠 2. Memory: What an Agent Remembers
Agents need memory to avoid repeating mistakes and to understand context.
Three DevOps memory types:
1️⃣ Short-Term Memory (active context)
Used for:
- Current logs
- Incident timeline
- Current Kubernetes state
- This deployment’s failures
Example:
{ "pod": "api-7d8f9", "error": "CrashLoopBackOff", "attempts": 3 }
2️⃣ Long-Term Memory
Stored patterns:
- Past incidents
- Common fixes
- Stable configurations
- Previous agent actions
Example:
Pattern recognized: Registry token expires every 12 hours. Suggested action: Preemptive refresh.
3️⃣ Tool Memory
Execution history:
- What the agent changed
- What commands it ran
- What succeeded/failed
This prevents loops and dangerous retries.
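The three memory types can be sketched as one small container. This is a minimal illustration under assumptions: `AgentMemory`, its field names, and the `max_failures` threshold are hypothetical, and a real agent would likely persist long-term memory in a database rather than a dict.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class AgentMemory:
    """Hypothetical container for short-term, long-term, and tool memory."""
    short_term: deque = field(default_factory=lambda: deque(maxlen=50))  # recent observations
    long_term: dict = field(default_factory=dict)     # error pattern -> known fix
    tool_history: list = field(default_factory=list)  # (command, succeeded) pairs

    def observe(self, event: dict) -> None:
        self.short_term.append(event)

    def learn(self, pattern: str, fix: str) -> None:
        self.long_term[pattern] = fix

    def record(self, command: str, succeeded: bool) -> None:
        self.tool_history.append((command, succeeded))

    def should_retry(self, command: str, max_failures: int = 2) -> bool:
        """Refuse to repeat a command that has already failed max_failures times."""
        failures = sum(1 for cmd, ok in self.tool_history if cmd == command and not ok)
        return failures < max_failures


memory = AgentMemory()
memory.observe({"pod": "api-7d8f9", "error": "CrashLoopBackOff", "attempts": 3})
memory.learn("registry token expired", "refresh registry token")

restart = "kubectl rollout restart deployment/api"
memory.record(restart, succeeded=False)
memory.record(restart, succeeded=False)

print(memory.should_retry(restart))  # False: two failures already recorded
```

The `should_retry` check is what turns tool memory into loop prevention: the agent consults its own history before repeating an action.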
🧩 3. Planning: How AI Agents Think Before Acting
Planning is where agents decide:
- What’s happening?
- What’s the safest fix?
- What tools should be used?
- What are the side effects?
- Should a rollback be prepared?
- Should a human be notified?
Agent Planning Loop
1. Interpret Observations
2. Generate Hypotheses
3. Evaluate Risk
4. Compare Possible Actions
5. Select Best Action
6. Execute or Escalate
🔍 Example: Agent Analyzing Kubernetes Errors
Input logs:
Readiness probe failed after 3s timeout
CrashLoopBackOff detected
Agent planning output:
{
  "root_cause": "readiness probe too strict",
  "proposed_actions": [
    "increase timeoutSeconds to 10",
    "restart deployment",
    "validate CPU/memory limits"
  ],
  "risk_score": 0.12
}
The agent builds a multi-step plan, not just a single action.
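The "compare, select, then execute or escalate" steps of the loop can be sketched as a small selection function. Everything here is an assumption for illustration: `CandidateAction`, `select_action`, the `max_risk` budget of 0.3, and every score except the 0.12 risk taken from the example above.

```python
from dataclasses import dataclass


@dataclass
class CandidateAction:
    description: str
    risk: float        # 0.0 (safe) to 1.0 (dangerous); how it is scored is out of scope here
    confidence: float  # how likely the action addresses the hypothesis


def select_action(candidates, max_risk=0.3):
    """Pick the most confident action within the risk budget, or None to escalate."""
    within_budget = [c for c in candidates if c.risk <= max_risk]
    if not within_budget:
        return None
    return max(within_budget, key=lambda c: c.confidence)


candidates = [
    CandidateAction("increase timeoutSeconds to 10", risk=0.12, confidence=0.9),
    CandidateAction("restart deployment", risk=0.25, confidence=0.5),
    CandidateAction("delete and recreate namespace", risk=0.95, confidence=0.6),
]

choice = select_action(candidates)
print(choice.description if choice else "escalate to a human")
```

Returning `None` rather than the least-bad option is a deliberate choice in this sketch: when nothing fits the risk budget, the safe default is to hand the decision to a human.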
⚙️ 4. Execution: How Agents Apply Safe, Controlled Fixes
Execution is where the agent acts, using the tools it has been granted.
Safe execution requires:
✔️ Guardrails
✔️ Permission boundaries
✔️ Rollback strategy
✔️ Validations
✔️ Logging everything
Example: Agent Fixing a Terraform Drift
Agent Decision:
State drift detected: Autoscaler set to min=1, expected min=3
Apply terraform fix with plan validation.
Agent Action Code:
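import subprocess

def terraform_plan():
    # Save the plan to a file so that apply runs exactly what was reviewed.
    return subprocess.run(
        ["terraform", "plan", "-no-color", "-out=tfplan"],
        capture_output=True, text=True, check=True,
    )

def terraform_apply():
    # Applying a saved plan file does not prompt for approval.
    return subprocess.run(["terraform", "apply", "tfplan"], check=True)

plan = terraform_plan()
# Apply only if the plan mentions the drifted autoscaler resource.
if "autoscaler" in plan.stdout:
    terraform_apply()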
Execution is ALWAYS followed by:
- Monitoring
- Validation
- Final confirmation
Agents must close the loop.
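"Closing the loop" can be sketched as a wrapper that applies a change, validates it, and rolls back on failure. This is a minimal sketch, not a definitive implementation: `execute_with_guardrails` and its `apply`, `validate`, and `rollback` hooks are hypothetical, and a plain dict stands in for the real system.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent.execution")


def execute_with_guardrails(apply, validate, rollback, allowed=True):
    """Apply a change, validate it, and roll back if anything goes wrong.

    apply, validate, and rollback are caller-supplied callables (hypothetical hooks).
    Returns "applied", "rolled_back", or "blocked".
    """
    if not allowed:
        log.warning("action blocked by permission boundary")
        return "blocked"
    try:
        log.info("applying change")
        apply()
        ok = validate()
    except Exception:
        log.exception("apply or validation raised")
        ok = False
    if ok:
        log.info("validation passed")
        return "applied"
    log.error("validation failed, rolling back")
    rollback()
    return "rolled_back"


# Simulated state standing in for a real system.
state = {"min_replicas": 1}

result = execute_with_guardrails(
    apply=lambda: state.update(min_replicas=3),
    validate=lambda: state["min_replicas"] == 3,
    rollback=lambda: state.update(min_replicas=1),
)
print(result, state)
```

Every path through the function is logged, and the only way to return "applied" is for validation to pass.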
🧠 Section 2 — How All the Pieces Fit Together
Here is the full architecture diagram of an AI DevOps agent:
┌──────────────────────────┐
│         OBSERVE          │
│  Logs, Metrics, Events   │
└──────────────┬───────────┘
               │
   ┌───────────▼───────────┐
   │        MEMORY         │
   │  ST, LT, Tool Memory  │
   └───────────┬───────────┘
               │
   ┌───────────▼───────────┐
   │       PLANNING        │
   │  Risk, RCA, Actions   │
   └───────────┬───────────┘
               │
   ┌───────────▼───────────┐
   │         TOOLS         │
   │  Terraform, kubectl…  │
   └───────────┬───────────┘
               │
   ┌───────────▼───────────┐
   │       EXECUTION       │
   │  Apply fix, rollback  │
   └───────────────────────┘
This architecture powers the autonomous behavior described in Parts 1 and 2.
🧪 Section 3 — Real Scenario: End-to-End Agent Flow
Scenario:
Production API is failing readiness probes.
Full Agent Flow:
Observe:
Readiness probe timeout: 3s → failing
CrashLoopBackOff
Analyze:
{ "root_cause": "probe too strict", "confidence": 0.91 }
Decide:
- Patch deployment
- Increase timeout
- Restart pods
Act:
kubectl patch deployment api \
  --patch '{"spec":{"template":{"spec":{"containers":[{"name":"api","readinessProbe":{"timeoutSeconds":10}}]}}}}'
Validate:
- Pods stable
- Error logs stopped
- Latency normal
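When an agent issues the patch from code rather than a shell, building the payload with `json.dumps` avoids hand-quoting nested JSON. The helper names `build_probe_patch` and `build_patch_command` below are hypothetical; the sketch only constructs the command and leaves running it to the caller.

```python
import json


def build_probe_patch(container: str, timeout_seconds: int) -> str:
    """Return a patch body that sets one container's readiness probe timeout."""
    patch = {
        "spec": {"template": {"spec": {"containers": [
            {"name": container, "readinessProbe": {"timeoutSeconds": timeout_seconds}}
        ]}}}
    }
    return json.dumps(patch)


def build_patch_command(deployment: str, container: str, timeout_seconds: int) -> list:
    """Return the kubectl argv as a list, which avoids shell quoting issues."""
    return ["kubectl", "patch", "deployment", deployment,
            "--patch", build_probe_patch(container, timeout_seconds)]


cmd = build_patch_command("api", "api", 10)
print(cmd)
# To run it for real: subprocess.run(cmd, check=True)
```

Passing the command as a list to `subprocess.run` means the JSON is delivered to kubectl as a single argument, with no shell involved.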
This is autonomous DevOps in action.
🔮 Section 4 — The Future: Multi-Agent DevOps Teams
AgentFlux will evolve into a system where multiple agents collaborate:
Examples:
- Pipeline Agent → detects CI issues
- Kubernetes Agent → heals clusters
- Infra Agent → fixes Terraform drift
- SLO Agent → monitors reliability
- Security Agent → patches vulnerabilities
These agents form a digital DevOps workforce.
🧵 Final Thoughts
AI agents don’t replace DevOps engineers. They replace:
- Manual debugging
- Noisy alerts
- Repetitive fixes
- Human fatigue
Engineers move towards:
✨ Architecture ✨ Reliability engineering ✨ Agent supervision ✨ Strategic decision-making
This is the future we’re building with AgentFlux.
📌 Next Article: Part 4 — Building a Real DevOps Agent (Step-by-Step Tutorial)
🚀 Part 3: Inside an AI Agent — Tools, Memory, Planning & Execution (DevOps Edition) was originally published in Towards AI on Medium.