February 2, 2026 · 7 minute read
A friend dropped this in our Slack the other day about Steve Yegge’s Gas Town:
I think Gas Town is directionally correct but you have to admit also unhinged in a bunch of ways. I can’t help but be attracted to the absolute gonzo yolo mentality that birthed it, but I wouldn’t want to live there.
It’s been rattling around in my head since. He’s right on both counts. And I say that as someone who actually lives there.
The main thing I keep noticing isn’t the tools. It’s where the work goes.
Over the last six months living in Gas Town, the biggest shift hasn’t been speed, it’s attention. This post is about how that attention moves upstream into design and guardrails, and how the feedback loops change once you stop reading every line.
I Live in Gas Town
What Gas Town Is
For the past six months I’ve been experimenting with Gas Town and other ways to orchestrate AI-driven development, both for work and personal projects. Gas Town is Yegge’s multi-agent orchestration system for Claude Code. It coordinates 20-30 agents working on tasks simultaneously. The terminology is Mad Max-themed: a Mayor agent you talk to, Rigs for projects, Polecats as ephemeral workers. It’s a machine for spending hundreds of dollars a day on Claude API calls while you focus on feeding it design work.
The pitch: agents lose context on restart, so Gas Town persists work in git-backed hooks. You tell the Mayor what you want, and it orchestrates implementation across multiple agents while you go do something else. Throughput at the speed of thought.
Why I Live There
Is it gonzo? Absolutely. Would I want to live there? I already do.
The Trade: Distance Is the Gain
The trade is simple. AI-assisted coding has increased both my productivity and the distance between me and the code. These aren’t separate effects. The productivity gain is the distance. You can’t have one without the other.
I no longer read every line that gets written. I don’t even read most lines. That used to bother me until I realized the work had moved.
Two Modes of Attention
Looking back at how I produce code now, it breaks into two distinct modes:
Autopilot Mode (70%)
Small features, bug fixes, routine changes. I describe what I want, Claude implements it, CI runs, it merges. I give little attention to implementation details.
This sounds reckless until you consider what’s actually in place. Unit tests with decent coverage. Linters catching style issues. Formatters keeping things consistent. Type checking catching obvious mistakes. CI gates preventing broken code from merging. Pre-commit hooks running gitleaks for secrets.
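To make that concrete, here’s roughly what the gate looks like if I sketch it as a single script. The specific tools (ruff, mypy, pytest, gitleaks) are stand-ins for whatever a given repo actually enforces; my real setup is spread across pre-commit hooks and CI config rather than one file.

```python
#!/usr/bin/env python3
"""Minimal sketch of the local gate that makes autopilot mode tolerable.

The tool list is illustrative (ruff, mypy, pytest, gitleaks); swap in
whatever your stack uses. The point is that nothing merges without
passing deterministic checks like these.
"""
import subprocess
import sys

CHECKS = [
    ["ruff", "check", "."],                  # lint
    ["ruff", "format", "--check", "."],      # formatting
    ["mypy", "."],                           # type checking
    ["pytest", "-q"],                        # unit tests
    ["gitleaks", "detect", "--no-banner"],   # secrets scan
]

def main() -> int:
    failed = []
    for cmd in CHECKS:
        print(f"==> {' '.join(cmd)}")
        if subprocess.run(cmd).returncode != 0:
            failed.append(cmd[0])
    if failed:
        print(f"Gate failed: {', '.join(failed)}")
        return 1
    print("Gate passed.")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```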
Some of these workflows have solidified enough that I’ve codified them into Claude commands, skills, and custom prompts. Others are still experimental, patterns I use daily but haven’t formalized yet. As I notice approaches working reliably, I encode them. The codified workflows compound: each one means more work I can safely put on autopilot.
Deliberate Mode (30%)
New authentication systems. Terraform infrastructure. Database schema changes. Architectural decisions that will shape hundreds of subsequent changes.
This is where I slow down. A lot. I brainstorm with Claude before any code gets written. We discuss tradeoffs. I sketch designs. I ask Claude to poke holes in my thinking. Only then does implementation happen, and I read that code carefully before merging.
The goal isn’t to review every line. It’s to make sure the patterns and abstractions are right. If the foundation is solid, all the autopilot work that lands on top of it will be decent. If the foundation is broken, no amount of careful review on individual features will save you.
Failure Rate Snapshot
Across both modes, failures still show up. I pulled numbers from my logs to keep myself honest. Those are below.
A Quick Analysis
Numbers From the Logs
After writing the brain dump above, I looked at my own session logs for January 4, 2026 through February 3, 2026 and pulled some basic numbers. I wanted a reality check before making claims.
- Sessions scanned: 1,130 total, 1,058 Claude and 72 Codex
- User prompts: 37,562
- Median session length was under a minute. P90 was around 25 minutes. P95 was about 100 minutes.
- Median prompts per session was 3. P90 was about 63.
Failure signals showed up in about 16% of sessions. When failures happened, they were mostly CI related. About three quarters of failure prompts mentioned CI. Fewer than one in ten mentioned pre-commit. Manual review was a small slice.
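For transparency, those numbers came from a quick pass over my session logs. The sketch below shows the shape of that pass; the log format (one JSON session record per line with timestamps and prompt text) is an assumption for illustration, not my actual file layout, and the keyword-based failure detection is deliberately crude.

```python
"""Rough sketch of the log analysis behind the numbers above.

Assumes a hypothetical JSONL export: one session per line with
"started_at", "ended_at", and "prompts" (a list of user prompt strings).
My real logs are messier; this only shows the shape of the analysis.
"""
import json
from datetime import datetime
from statistics import median, quantiles

def minutes(start: str, end: str) -> float:
    return (datetime.fromisoformat(end) - datetime.fromisoformat(start)).total_seconds() / 60

with open("sessions.jsonl") as f:
    sessions = [json.loads(line) for line in f]

durations = [minutes(s["started_at"], s["ended_at"]) for s in sessions]
prompt_counts = [len(s["prompts"]) for s in sessions]

print("sessions:", len(sessions))
print("median minutes:", median(durations))
print("p90 minutes:", quantiles(durations, n=10)[-1])
print("p95 minutes:", quantiles(durations, n=20)[-1])
print("median prompts:", median(prompt_counts))

# Crude failure-signal detection: which sessions mention a gate failing?
FAILURE_WORDS = ("ci failed", "pre-commit", "test failure", "pipeline failed")
failed = [s for s in sessions
          if any(w in p.lower() for p in s["prompts"] for w in FAILURE_WORDS)]
print("sessions with failure signals:", len(failed), f"({len(failed) / len(sessions):.0%})")
```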
What the Numbers Point To
That points to two things. The tightest feedback loop is pre-commit, and agents fix those errors quickly. The weaker loop is CI, which fires later and is less visible unless I am actively watching the pull request. I think that is where the next productivity jump is. I need an automated handoff that watches CI until merge and reacts when it breaks. That connects directly to Steve Yegge’s Gas Town merge queue and refinery idea. If I can automate that step reliably, the whole system gets smoother.
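A minimal version of that handoff could be a watcher that polls a pull request until its checks settle and hands off when something breaks. The sketch below assumes the GitHub CLI (`gh`) is installed and authenticated; `dispatch_fixer` is a hypothetical hook where an agent session (or I) would be pointed at the failing check. This is nowhere near Gas Town’s refinery, just the smallest loop that closes the gap.

```python
"""Tiny CI watcher: poll a PR with the GitHub CLI until it merges or a check fails.

`gh pr view --json state,statusCheckRollup` is real gh usage;
`dispatch_fixer` is a placeholder for handing the failure to an agent.
"""
import json
import subprocess
import time

def pr_status(number: int) -> dict:
    out = subprocess.run(
        ["gh", "pr", "view", str(number), "--json", "state,statusCheckRollup"],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)

def dispatch_fixer(number: int, failed: list[str]) -> None:
    # Placeholder: kick off an agent session (or ping a human) with the failing checks.
    print(f"PR #{number} needs attention, failing checks: {failed}")

def watch(number: int, interval: int = 120) -> None:
    while True:
        status = pr_status(number)
        if status["state"] == "MERGED":
            print(f"PR #{number} merged, done.")
            return
        failed = [c.get("name") or c.get("context", "?")
                  for c in status.get("statusCheckRollup") or []
                  if c.get("conclusion") == "FAILURE" or c.get("state") == "FAILURE"]
        if failed:
            dispatch_fixer(number, failed)
            return
        time.sleep(interval)
```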
I’m still figuring out how to run workflows reliably. In any reasonably large project there’s a chain from inspiration to requirements to design to implementation to landing. Landing means tests, lint, CI, deploys, and the integration environment still behaving. That’s where things go sideways. A lot of my learning over the last few months has been about prompting and about leaning on deterministic tools (linters, tests, CI) so agents keep moving in the right direction.
Failure Modes & the Learning Budget
I discover new failure modes constantly. Claude gets confused by monorepos with multiple package.json files. It invents API endpoints that don’t exist. It confidently refactors code in ways that technically work but violate conventions I never explicitly stated.
Each failure teaches me something: a new instruction to add to CLAUDE.md, a test case I should’ve written, a pattern I should’ve established earlier. This is the learning budget, the cost of pushing boundaries in a space that’s still evolving rapidly.
I’m excited by it, and I’m not pretending I’ve figured it out. I’m sharing where I am right now, six months into living in a full-time multi-agent, herd-the-agents workflow. Gas Town is just one influence.
Feedback Loops
Deterministic vs. Fuzzy Loops
What actually keeps quality stable is feedback loops that build confidence.
Some loops are deterministic: tests, linters, type checks, CI gates, pre-commit hooks. Other loops are fuzzier: prompt constraints, checklists, review rituals, and the explicit constraints I hand to agents. The confidence comes from overlap. When multiple signals agree, I move faster. When they disagree, I slow down and tighten the loop.
If your loops are thin (no tests, no types, weak CI), you cannot put anything on autopilot. You end up reviewing every line because that is your only quality gate.
What I Want to Measure Next
There is also an open question I want to dig into later: observability for these loops. I want metrics on which loop actually caught the failure.
- How many times did the agent hit a unit test failure in pre-commit?
- How many times did CI fail a pull request because earlier stages missed it?
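I don’t have this instrumentation yet, but the counting itself is simple once each failure event is tagged with the loop that caught it. A sketch, assuming a hypothetical event log that nothing emits today; producing that log is the actual open problem:

```python
"""Sketch of the loop observability I want: count failures by which gate caught them.

Assumes a hypothetical events.jsonl where each record looks like
{"ts": "...", "kind": "failure", "caught_by": "pre-commit" | "ci" | "review"}.
"""
import json
from collections import Counter

with open("events.jsonl") as f:
    events = [json.loads(line) for line in f]

caught = Counter(e["caught_by"] for e in events if e["kind"] == "failure")
total = sum(caught.values())
for gate, count in caught.most_common():
    print(f"{gate:>12}: {count:4d}  ({count / total:.0%})")
```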
If you have ideas here or have built something along these lines, let me know. I’m curious to learn.
Operating Loops I Use
This gets concrete in the day-to-day. A few operating loops have become routine:
- Recon pass. I ask an agent to scan a neglected area, map what’s off, and propose a small set of changes. Sometimes it turns into a feature, often it’s just debt.
- Targeted sweep. Pick one pattern that’s been bothering me and update it everywhere in one pass.
- Codify the pain. If I ask for the same thing twice, I turn it into a skill or a command. If it’s deterministic, I drop to a script.
- Guardrail tuning. When something fails, I tighten prompts or CI gates before the next run.
These aren’t planned work. They’re opportunistic and show up when I have spare cycles.
I’ve also noticed a new loop: solve the thing, then codify it, then move on. If a task repeats twice, I start thinking about the skill or command that would make it deterministic. That impulse didn’t exist for me before. Now it shows up almost automatically.
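As an illustration of the “drop to a script” end of that loop, here’s the kind of check that starts as a repeated prompt and ends as ten deterministic lines. The specific check (keeping one dependency’s version consistent across a monorepo’s package.json files) is a made-up example of the pattern, not something lifted from my repos.

```python
"""Illustrative 'codify the pain' script: a repeated ask turned into a deterministic check.

Walks a monorepo, collects the pinned version of one dependency from every
package.json, and fails if the versions disagree. The dependency name is arbitrary.
"""
import json
import sys
from pathlib import Path

DEP = "typescript"

versions = {}
for pkg in Path(".").rglob("package.json"):
    if "node_modules" in pkg.parts:
        continue
    data = json.loads(pkg.read_text())
    deps = {**data.get("dependencies", {}), **data.get("devDependencies", {})}
    if DEP in deps:
        versions[str(pkg)] = deps[DEP]

if len(set(versions.values())) > 1:
    print(f"Inconsistent {DEP} versions:")
    for path, ver in sorted(versions.items()):
        print(f"  {ver:>10}  {path}")
    sys.exit(1)
print(f"{DEP} is consistent across {len(versions)} packages.")
```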
What Changed for Me
The New Relationship with the Codebase
It’s a different relationship with a codebase. Less intimate, more managerial. I know what’s there at an architectural level, but I don’t have line-by-line familiarity anymore. And I’m okay with that.
If anything here sticks for me, it’s where the work went: up front into design and guardrails, then back into review loops. Everything else is still moving.
Delegation Isn’t Reckless (If Guardrails Exist)
Gas Town does feel gonzo. But the underlying intuition, that we can trust automated systems more and human review less, isn’t actually crazy. We’ve been moving in that direction for decades. Compilers, type checkers, linters, CI pipelines, code review tools. AI agents are just the next layer.
Delegating to autopilot isn’t reckless if you’ve built the infrastructure to catch falls. It’s just acknowledging where the actual value of human attention lies.
Cheers.