In previous posts I’ve written about Dijkstra’s Ghost and Ephemeral Editable Specifications (aka Extract, Edit, Apply), touching on the topics of Natural Language Programming and the role of Specifications in AI-native programming.
Today I’d like to step back and address an underlying question: what kind of programming is Natural Language Programming? And how does it relate to Specification-oriented programming? What kinds of natural language programming are viable and what maybe aren’t? What are the costs and tradeoffs? What are the limits of applicability, and how can we teach our students what natural language programming is/isn’t good for?
First – and this is fundamental – I want to emphasise that I see Natural Language Programming as more akin to constraint programming than to traditional, precise imperative/functional/object programming. That is, well-written, clear natural language programming does things such as:
- shapes possible outcomes
- lays down goals
- indicates paths forward
- channels towards success
- makes the intent, bounds and purpose clear

I’m not just saying this is how LLMs are guided. I’m saying that this is more intrinsic to how natural language works. Words are imprecise, yes. But they are also constraints that shape and guide.
For me the most interesting uses of natural language programming are for highly ambiguous orchestration programming. Let’s take an example of a natural language program in this space:
Concierge Robot, get me to the airport on time for every flight I need to take this year
This is undoubtedly a program, as it can be submitted to an automated system (an LLM with tools, obviously) and, if the system is equipped with the right capabilities such as the ability to book an Uber, look up flight schedules, look up traffic conditions, access your calendar, listen to events, ask you questions, read your text messages, watch the clock and so on, then the system will produce useful and reliable results. (If you’re interested in other programs somewhat akin to this, and more relevant to developers, you might like to look at the CI Doctor or Daily Test Improver or Daily Adhoc QA or PR Fix workflows, all of which are hostable as interpreted natural language programs in GitHub Actions through GitHub Agentic Workflows, the interpretation being done by a coding agent equipped with an LLM and tools.)
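To make the “LLM with tools” framing a little more concrete, here is a minimal sketch of the kind of interpretation loop involved. Everything in it is an assumption for illustration: the tool names, the canned plan and the `call_llm` stub are stand-ins, not the API of GitHub Agentic Workflows or any other real system.

```python
# A minimal, hypothetical sketch of an "LLM with tools" loop that could
# interpret the Concierge Robot program. The tool stubs, the canned plan
# and `call_llm` are illustrative stand-ins, not a real product's API.

NL_PROGRAM = ("Concierge Robot, get me to the airport on time "
              "for every flight I need to take this year")

# Tools the interpreter is allowed to call (stubbed out for illustration).
def read_calendar():
    return [{"flight": "XY123", "departs": "2025-06-01T09:00"}]

def check_traffic(destination):
    return {"destination": destination, "delay_minutes": 20}

def book_ride(pickup_time, destination):
    return {"booked": True, "pickup_time": pickup_time, "destination": destination}

TOOLS = {"read_calendar": read_calendar,
         "check_traffic": check_traffic,
         "book_ride": book_ride}

def call_llm(program, observations):
    """Stand-in for a real model call. A real interpreter would send the
    natural language program plus the observations so far to an LLM and
    get back the next action; here we just replay a canned plan."""
    plan = [
        {"tool": "read_calendar", "args": {}},
        {"tool": "check_traffic", "args": {"destination": "airport"}},
        {"tool": "book_ride",
         "args": {"pickup_time": "2025-06-01T06:30", "destination": "airport"}},
        {"done": True},
    ]
    return plan[len(observations)]

def run(program):
    """Interpret the program: repeatedly ask the model what to do next,
    execute the chosen tool, and feed the result back as an observation."""
    observations = []
    while True:
        decision = call_llm(program, observations)
        if decision.get("done"):
            return observations
        result = TOOLS[decision["tool"]](**decision["args"])
        observations.append(result)

print(run(NL_PROGRAM))
```

The point of the sketch is that the natural language program constrains and guides the loop without ever dictating the exact sequence of tool calls; deciding that sequence, as conditions change, is the interpreter’s job.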
Now, a natural language program like this is highly ambiguous. As I argued previously, computer scientists love precision, and worship it as the very definition of programming. For certain tribes within the broad church of computer science, if you even go near natural language programming, Dijkstra’s Ghost will visit you and haunt your dreams. You will have committed not just The Great Heresy of Imprecision, but also apostasy from The Symbolic Supremacy – the apostasy of not finding symbolic and logic-based solutions to information representation, of abandoning the tools of logic, proof and mathematics.
Further, for people considering “Specification-oriented programming” (more on that below), that ambiguity can be seen as an inherent problem. If not approached carefully there is a risk of falling into a problematic dynamic where advocates argue that “natural language can be precise” or “it’s possible to do regular precise programming in natural language”. They might, for example, argue that the spec just needs to be “fleshed out” and “made more detailed” until it is really, really precise. While this is true, the risk is that it needs to be done to the point where any utility gained from natural language disappears.
However, I would step back, and argue that considering ambiguity to be an inherent problem is mistaken. A natural language program like the one above is not designed to be precise. Its very power and appeal is in its ambiguity – that is, what we call ambiguity is actually often generality. In the example above, we don’t know a priori what times the flights are, what airports they’re at, we don’t know if they’ll be cancelled, we don’t know what the traffic conditions will be, we can’t tell whether a train or car or bus is needed. A huge, complex decision-making process must be performed. The incredible power of modern AI systems is that they can perform this decision-making, taking in a huge range of inputs and dealing with many information sources, and pretty reasonable decision-making logic comes out. That is, as written the program is very general, and indeed part of that generality is robustness: it’s able to cope with endless side conditions, error conditions, text messages from your Uber driver, whatever. That generality comes through what many humans would consider to be its ambiguity or imprecision.
Now, should this ambiguity be reduced? On the whole, no! And again no! Adding more detail to the specification – to reduce or eliminate the ambiguity of the natural language program – would along most dimensions (transport, traffic, cancellation, weather etc.) almost certainly be unhelpful, unless the human has a particular preference one way or another. Further, compilation of natural language programs is of little intrinsic use unless necessary for execution: the human derives no intrinsic utility from concretizing the program into a set of possible sequences, decision graphs, transport choices, timetable searches, reactions to events, traffic algorithms, rerouting criteria, programmatic code and so on. That’s the robot’s job, and probably best interpreted on-demand given how fast things can change. Generating code for this natural language program via compilation is almost certainly useless and/or impossible: at best a bit of pre-research may be useful, prebaking a set of searches, tools and possibilities as common paths, getting ready for the task of being a good Concierge Robot. But the program is not one that benefits from being compiled to code, unless you really need to for cost or efficiency.
So, in practice, most ambiguity in natural language programs will actually be genuinely useful generality. However, it is also possible that the natural language program is ambiguous in some ways which it might be genuinely important to clarify up front. For example, in the example above, the program doesn’t say anything about “budget” – and it’s not clear that information on that will be available in the tools available to the robot (Concierge Robots aren’t generally privy to the financial position of their users). Because budget fundamentally affects critical decision-making – and this can be understood ahead of time – adding or negotiating new constraints about budget would be a useful elaboration step for the natural language program. Adding information about budget would make the NL program better, more reliable in all circumstances.
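For instance, one possible elaboration might look like this (the specific figure is purely illustrative, not something the original program implies):

Concierge Robot, get me to the airport on time for every flight I need to take this year; prefer the cheapest option that still gets me there on time, and check with me before spending more than $100 on ground transport for any single trip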
So what we generally call “ambiguity” comes in two very different flavours:
- genuinely useful generality
- missing useful specificity

Focusing on the first category, we must say this loud and clear: **ambiguity in natural language programming is not always a problem, it’s often an opportunity**. Indeed generality through useful ambiguity is the very essence of natural language programming. Benefiting from that ambiguity today carries the high costs of LLM agentic interpretation. But it’s darn useful when the costs work out ok.
This means natural language programming is a vastly different thing to regular, precise programming. The art is often in specifying less, not more. Natural language programs can be made worse by specifying more.
Given this, my position is that natural language programming should not be used to achieve the same goals – the same kinds of programming – as traditional, precise programming languages. If you’re trying to achieve the same thing with specs or natural language as with traditional, precise programming languages, at the same level of precision, then it’s like fitting a square peg into a round hole. It’s very likely a category mistake.
For me, natural language programming is most useful in three situations:
- **Orchestration workflows.** This uses interpreted natural language as a constraint language for very high value orchestration flows, for example the GitHub Agentic Workflows we’re working on. Consider the Daily Test Coverage Improver workflow program. It’s written in natural language. There’s huge value in navigating ambiguity there. And $10/run is actually worth it.
- **Guided generation.** This uses natural language specifications as semi-ephemeral inputs for compilation or code generation, for example a Product Requirements Document that guides an entire phase of application creation, but may later be archived. An example is the specifications fed into App Dev toolchains such as GitHub Spark, Lovable, Nectry or Cogna. Another example is the Spec Kit toolchain for Specification-oriented programming. The spec is then usually paired with its elaboration (“shadow spec”), and also the ultimate generated code and the tool that created it. In most of these approaches the specification may later be discarded in long-term maintenance. This means the specification has a longish life, but not necessarily a permanent one. These tools and methodologies seem most useful for coding domains where there is an awful lot of boilerplate and where the instability of guided generation is minimal and can be tolerated in return for the productivity benefits. If guided generation requires adding unproductive detail to the spec, that is a problem.
- **Task-oriented programming.** This uses natural language for highly ephemeral tasks for traditional code change and generation, i.e. vibe coding.

These usage points may change as costs for interpreting natural language reduce, and reliability of interpretation increases.
Natural Language Programming and Specification-oriented Programming are both very powerful and very real. People who argue that these things are not real, or do not exist, are simply in denial, stuck with an ideological view of programming that means programming has to be precise – and probably these people have little to no experience of constraint programming. However Natural Language Programming is not precise programming, and mistaking it for precise programming is a category mistake. Natural Language Programming is better seen as a form of constraint programming. It will be a hugely important part of the working lives of millions of developers and many others over the coming years.
p.s. Some of the concepts in this blog post were inspired by Chris Bora’s Intent Science. Read carefully, it’s a truly remarkable and unique manifesto suited to our times.