Nov 2, 2025
If you look at the agent landscape today, one of the most prominent success stories has been software coding agents, particularly since the quantum leap that came with CLI coding agents, starting with Claude Code in early 2025. While people were certainly using LLMs for various coding tasks before these CLI agents, the new form factor offered a compelling set of capabilities that simply weren’t available before. They can write complex code, debug tricky issues, refactor entire codebases, and even deploy applications.
But in this piece I want to convince you that coding agents are actually outliers when it comes to designing and building AI agents. The very characteristics that make coding a perfect domain for AI agents—deterministic outputs, clear success and failure states, rich tooling, immediate feedback loops—are precisely what makes them unrepresentative of most other agent domains.
The success of coding agents has created a template that everyone seems eager to copy. But this template is misleading. What works for coding agents doesn’t necessarily work elsewhere, and understanding why is crucial for anyone trying to build AI agents in other domains.
Building Your Own Tools
The people who build AI models are themselves programmers and computer scientists. Programming is how they build their products. This creates a unique dynamic that doesn’t exist in any other domain.
When researchers at the major model providers improve their models’ coding abilities, they are also the first in line to use those improvements. The same (or very adjacent) teams build the agentic scaffolds around these models, and they use these tools to write the very code that improves the models. They experience every frustration, every limitation, and every breakthrough firsthand. This isn’t just dogfooding; it’s a tight feedback loop where the builders’ own productivity depends on the quality of what they’re building, and it feeds straight back into model development. No other domain has this level of builder-user overlap, and it shows in the results.
Beyond this human feedback loop, there’s the matter of training data. Programming is very well represented in the datasets used to train large language models. Code repositories, technical documentation, tutorials, blog posts—the internet is awash with programming content.
But it’s not just the code itself. The entire ecosystem of software development is documented online: testing practices, CI/CD pipelines, debugging strategies, architectural patterns, code reviews, post-mortems. This world knowledge makes coding agents not just good at writing code—they understand the full software development lifecycle because all of this knowledge is in their training data. They know that after writing code, you should execute tests. They know what test-driven development means. They know how to set up deployment pipelines. They understand version control. This depth of representation is unique to programming.
Compare this to other highly specialized technical domains: materials science, drug discovery, energy, manufacturing, and so on. Unlike programming, there is very little public world knowledge explaining and documenting the day-to-day work of professionals in these fields.
Agents Versus Workflows
Slight tangent: we need to clarify the distinction between workflows and agents.
Workflows are predetermined sequences of steps designed to accomplish specific goals. When you use a workflow, you know exactly what steps will be taken and in what order. You might use AI or LLMs to help execute individual steps, but the sequence itself is fixed. The model isn’t deciding what to do next; it’s just helping you do it better. Workflows are perfect for repetitive, well-defined tasks where consistency and predictability matter more than adaptability.
Agents, on the other hand, perform reasoning and planning. They look at a problem, consider various approaches, and generate their own sequences of steps to solve it. They adapt their approach based on intermediate results. When something doesn’t work, they try something else. They might ask for clarification or suggest alternative strategies. True agents don’t just execute; they plan and adapt.
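To make the contrast concrete, here is a minimal sketch in Python. The `call_llm` helper, the support-ticket example, and the tool-choice protocol are invented stand-ins for illustration, not any particular framework’s API:

```python
def call_llm(prompt: str) -> str:
    """Stand-in for a real model call (OpenAI, Anthropic, etc.)."""
    raise NotImplementedError

# Workflow: the sequence of steps is fixed in code. The LLM helps
# execute individual steps, but never decides what happens next.
def handle_ticket_workflow(ticket: str) -> str:
    category = call_llm(f"Classify this support ticket: {ticket}")
    summary = call_llm(f"Summarize this {category} ticket: {ticket}")
    return call_llm(f"Draft a reply based on this summary: {summary}")

# Agent: the model chooses the next step, observes the result,
# and adapts until it decides the task is done.
def agent_loop(task: str, tools: dict, max_steps: int = 10) -> str:
    history = [f"Task: {task}"]
    for _ in range(max_steps):
        decision = call_llm(
            "Given the history below, reply with 'tool_name: argument' "
            f"(tools: {list(tools)}) or 'FINISH: answer'.\n" + "\n".join(history)
        )
        name, _, arg = (part.strip() for part in decision.partition(":"))
        if name == "FINISH":
            return arg
        history.append(f"{decision} -> {tools[name](arg)}")
    return "Stopped after max_steps without finishing."
```

Note the structural difference: in the workflow, the control flow lives in your code; in the agent loop, it lives in the model’s outputs.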
Coding agents are firmly on the agentic side of this spectrum. In fact, most coding agents have a special affordance for “plan mode”. The extensive programming-related world knowledge baked into the models lets them look at every unique programming problem they are given and improvise a high-level plan to tackle it.
However, most domains that people are trying to apply agent architectures to actually just need better workflows. Not every problem benefits from dynamic planning and reasoning. Sometimes, what you need is a well-designed sequence of steps executed consistently. (What precisely is the role of an LLM in such a workflow? That is a topic for another post.)
The Tacit Knowledge Problem
The real reason coding agents are outliers becomes clear when you look at other specialized domains. In fields outside programming, most of the knowledge needed to operate effectively isn’t written down anywhere, public or not. It’s tacit knowledge, locked inside the heads of experts and professionals. They notice subtle patterns, remember similar cases, and apply intuition. This knowledge is contextual, situational, and often very hard to fully articulate.
Building agents for these domains requires solving this tacit knowledge problem. You need to somehow extract knowledge that experts themselves often can’t articulate, codify it in a way that preserves its contextual nuance, and then ground an AI model on it. This is fundamentally harder than leveraging the already-documented world of programming knowledge.
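To make “extract, codify, ground” concrete, here is a minimal sketch of the simplest version: heuristics elicited from an expert are written down as plain rules and injected into the model’s context. The legal-review rules are invented for illustration, and `call_llm` is again a hypothetical stand-in for a real model call:

```python
def call_llm(prompt: str) -> str:
    """Stand-in for a real model call."""
    raise NotImplementedError

# Tacit knowledge, elicited from a (hypothetical) senior contracts lawyer
# and codified as explicit rules the model can be grounded on.
EXPERT_HEURISTICS = [
    "Flag any liability cap below twelve months of fees.",
    "Indemnification without a carve-out for gross negligence is unusual.",
    "In our jurisdiction, non-competes longer than 18 months rarely hold up.",
]

def review_clause(clause: str) -> str:
    rules = "\n".join(f"- {rule}" for rule in EXPERT_HEURISTICS)
    prompt = (
        "You are assisting a contracts lawyer. Apply these house rules, "
        "which encode how our senior reviewers actually work:\n"
        f"{rules}\n\nClause to review:\n{clause}"
    )
    return call_llm(prompt)
```

The hard part, of course, is not this code; it is getting the rules out of the expert’s head in the first place.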
Codifying this tacit knowledge and these workflows, and then grounding a model on them, is exactly the core playbook for the “AI Agent for X” startups (where X could be lawyers, accountants, and so on). This is not easy. You either need to be a domain expert yourself, or embed with domain experts for an extended period of time. There are no shortcuts. (Some ways to do that are yet another topic for another post.)
Why Success Doesn’t Translate
The success of coding agents has created unrealistic expectations for AI agents in other domains. We see these impressive coding demonstrations—an agent building an entire web app from a prompt, fixing complex bugs, refactoring legacy code—and assume similar breakthroughs are just around the corner for other fields.
But the conditions that enable coding agents are unique, not universal. Programming has deterministic outcomes: code either works or it doesn’t. It has immediate feedback: run the code and its tests and see what happens. It has rich tooling: debuggers, profilers, test suites. It has clear success metrics: tests pass, performance improves, bugs are fixed. And most importantly, the people building the models are programmers and use the models for programming.
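Consider what that feedback signal looks like in practice. A minimal sketch, assuming a repository with a pytest suite: the agent edits code, runs the tests, and reads a single exit code to learn whether the edit worked.

```python
import subprocess

def tests_pass(repo_dir: str) -> bool:
    """Run the test suite; pytest exits with code 0 only if every test passes."""
    result = subprocess.run(
        ["python", "-m", "pytest", "-q"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
    )
    return result.returncode == 0
```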
Most other domains lack these characteristics. Success in marketing is measured in customer engagement months later. Legal advice quality might only become clear after a case concludes years hence. Medical treatment effectiveness varies by patient in ways that are hard to predict. Design quality is subjective and context-dependent.
This doesn’t mean we can’t build useful AI agents for these domains. But it does mean we need to adjust our expectations and our approaches. The coding agent template is something to aspire towards, but not a given in other domains.