Vibe Coding
I have been teaching myself to vibe code.
Back in 2009 I posted a simple Mandelbrot fractal viewer on the web: a single HTML file with inline JavaScript. Just 329 lines of code, each pixel a tiny table cell. Click to zoom. Watch it iterate. That was about it!
I wondered whether improving the page could raise it in the Google rankings, so I have been using coding LMs to make a number of improvements.
Two Kinds of Vibe Coding
There are two kinds of vibe coding. The first is when you delegate little tasks to a coding LM while keeping yourself, the human "real programmer," fully informed and in control.
The second kind is the one that interests me: using a coding agent to build towers of complexity that go beyond what you have time to understand in any detail. I want to understand what it means to cede cognitive control to an AI. My friend David Maymudes has been building some serious software that way, and he compares this kind of vibe coding to managing a big software team. It is like the stories you have heard of whole startups being created entirely by one person vibe coding.
When my students describe their own vibe coding, it sounds like the first kind. That is also how I started with my Mandelbrot project, pasting little algorithm nuggets into my code while I edited each function, making all the key decisions using my own human judgement.
But in the last couple of weeks I have put on the blinders. I have resolved to stop looking at the code in detail. Instead, I am pushing the agent to write a ton of code and make its own decisions; to hell with the complexity. I have been experimenting with the second kind of vibe.
It is working surprisingly well, and I have been thinking about my experience handing the reins to an AI agent. The workflow may presage the use of generative AI across other industries. And I have arrived at two basic rules for vibe coders.
Unleashing Claude on my Webpage
The last version of the webpage written without LLM assistance was 780 lines; you can see its 37 functions diagrammed below. It is a nice, elegant piece of code, but pretty simplistic as a fractal implementation.
A key litmus test for a fractal viewer is how deep and fast it goes. By these measures, my human-written program was amateurish. Here is a picture of the output of the 780-line version at 0.4061675367769961+0.1457621521782999i, after running for 30 minutes zoomed in by 15 orders of magnitude. It is badly pixelated, because the 10^15 scale exceeds the limits of JavaScript's 64-bit double-precision floating point numbers. And it is pretty slow: if you click below, you can see that it takes several minutes just to get the first pixels, working on the single main browser thread and pausing whenever you switch to a different tab.
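To see the precision wall concretely, here is a sketch of my own (the exact pixel spacing depends on the canvas size, so the offset below is illustrative). At a 10^15 zoom, adjacent pixels are separated by far less than the gap between representable 64-bit doubles near 0.4, so their coordinates collapse to the same number:

```js
// Near x ≈ 0.4, adjacent 64-bit doubles are about 5.6e-17 apart.
const x = 0.4061675367769961;

// At a 10^15 zoom, neighboring pixels differ by roughly 5e-18
// (illustrative figure), which is below one double-precision step,
// so the offset is simply absorbed:
console.log(x + 5e-18 === x); // true: both pixels map to the same number

// Many adjacent pixels therefore compute the identical point,
// producing the blocky pixelation seen above.
```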
Compare that to how the LLM-assisted version renders the following image, after just one minute of work, at the same location and zoom level:
The LLM version is much faster because it uses the GPU (if your web browser allows it). But it plays many more tricks than just moving the calculation from CPU to GPU, because although the GPU is hundreds of times faster than a CPU, its 7-digit fp32 arithmetic is also millions of times coarser than the CPU's 15-digit fp64. So the LLM-generated program deals with this by implementing perturbation algorithms that split the work between CPU and GPU, representing numbers as (z + d·2^s), where z is a sparse high-resolution vector computed on the (slow but precise) CPU and d and s are dense near-zero low-resolution vectors computed on the (fast but imprecise) GPU.
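Here is a minimal sketch of the standard perturbation recurrence, to make the idea concrete. This is my own illustration, not the page's actual code: both halves run in plain JavaScript here, whereas the real program would run the delta loop in a GPU shader and use extended-precision arithmetic for the reference orbit.

```js
// Minimal sketch of the perturbation idea (illustration only).
// One reference orbit is iterated at high precision; each pixel then
// iterates only its tiny offset from that orbit, which stays near zero
// and so survives low-precision (GPU-style) arithmetic.

// Reference orbit Z_{n+1} = Z_n^2 + C for the center point C.
function referenceOrbit(cr, ci, maxIter) {
  const orbit = [[0, 0]];
  let zr = 0, zi = 0;
  for (let n = 0; n < maxIter; n++) {
    const t = zr * zr - zi * zi + cr; // real code would use extended precision here
    zi = 2 * zr * zi + ci;
    zr = t;
    orbit.push([zr, zi]);
  }
  return orbit; // Z_0 .. Z_maxIter
}

// Per-pixel delta iteration: d_{n+1} = 2·Z_n·d_n + d_n^2 + dc.
// Every quantity stays small, so fp32 on a GPU would be accurate enough.
function iterateDelta(orbit, dcr, dci, maxIter) {
  let dr = 0, di = 0;
  for (let n = 0; n < maxIter; n++) {
    const [Zr, Zi] = orbit[n];
    const t = 2 * (Zr * dr - Zi * di) + (dr * dr - di * di) + dcr;
    di = 2 * (Zr * di + Zi * dr) + 2 * dr * di + dci;
    dr = t;
    const [Zr1, Zi1] = orbit[n + 1];         // full value = reference + delta
    const zr = Zr1 + dr, zi = Zi1 + di;
    if (zr * zr + zi * zi > 4) return n + 1; // escaped
  }
  return maxIter; // presumed inside the set
}
```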
There are multiple ways to implement perturbation algorithms, so the LLM code implements and benchmarks nine alternative approaches, selecting different algorithms at different zoom levels and compute availability to follow the Pareto frontier of the time/resolution tradeoff. Backing the algorithms, it has written quad-double precision arithmetic accurate to 60+ digits, an adaptive float32+logscale numeric complex representation, GPU buffer management, and a task scheduler that can serialize and migrate long-running CPU tasks between WebWorker threads. It has also added many other UI details I asked for, like a minimal MP4 encoder for recording smooth movies and a cache to reduce recalculation when using the browser's forward/back history. The little webpage includes implementations of Horner's algorithm for stable polynomial evaluation, Fibonacci series for aperiodic periodicity checks, Catmull-Rom splines for smooth animations, continued fractions for pretty ratios, spatial hashing for finding nearby points, an algorithm for rebasing iterated perturbations that it found in a 2021 forum post, and a novel algorithm for fast orbit detection it developed based on my suggestion. All with detailed documentation and a search-engine-optimized user interface internationalized into the eleven most commonly read languages on the Internet. That last part, with all the translations to Chinese and Arabic, took Claude just a few minutes while I was cooking breakfast.
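To give a flavor of how selection along the Pareto frontier might work, here is a hypothetical sketch with invented renderer names and numbers, not Claude's actual code: benchmark each candidate, then pick the fastest one whose precision covers the current zoom.

```js
// Hypothetical sketch of Pareto-style renderer selection (invented names).
// Each renderer advertises the deepest zoom (in decimal digits) that its
// numeric representation can handle correctly.
const renderers = [
  { name: 'fp32-gpu',        maxZoomDigits: 7  },
  { name: 'perturb-gpu',     maxZoomDigits: 30 },
  { name: 'quad-double-cpu', maxZoomDigits: 60 },
];

// msPerFrame would come from benchmarking each renderer on this machine.
function pickRenderer(zoomDigits, msPerFrame) {
  // Keep only renderers precise enough for the current zoom...
  const usable = renderers.filter(r => r.maxZoomDigits >= zoomDigits);
  // ...and among those, choose the fastest: a point on the Pareto frontier.
  usable.sort((a, b) => msPerFrame[a.name] - msPerFrame[b.name]);
  return usable[0];
}

// Example: at a 10^20 zoom, fp32 alone is too coarse, so perturbation
// wins if it benchmarks faster than quad-double arithmetic on the CPU.
const choice = pickRenderer(20, { 'fp32-gpu': 2, 'perturb-gpu': 15, 'quad-double-cpu': 400 });
console.log(choice.name); // 'perturb-gpu'
```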
The cost of this performance? A large increase in complexity. Empowered to make direct changes in the project, Claude Code has now made several hundred commits, expanding the tiny one-page HTML file to more than 13,600 lines of code, defining 30 classes, 2 mixins, 342 methods, and 159 functions.
That brings me to the rules for getting an LLM agent to work effectively: David’s two rules for vibe coding. They are simple rules.
Rule 1: Automate tests
If you just ask the agent to solve a problem, it will run around for a few minutes and come back with a rough solution. Then you test it, find it doesn’t work, tell it so, and it runs around again for another five minutes. Repeat.
This workflow turns you into the manual tester. Maybe the least interesting job on the planet. Not a good use of precious human brain cells.
But if you get the agent to write a good automated test first, something changes. After it runs around for five minutes, it remembers to check its own work. It sees how it got things wrong. It goes back and tries again. Now it can extend its horizon to 30 minutes of autonomous work. By the time it comes to bother you, the result is much more promising.
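For example, a self-checkable test might pin down known behavior of the escape-time calculation. This is my own illustration using Node's built-in test runner, with a hypothetical module layout, not a test from the actual project:

```js
// Illustrative example of the kind of test an agent can run itself.
import test from 'node:test';
import assert from 'node:assert/strict';
// Hypothetical module exporting the functions sketched earlier.
import { referenceOrbit, iterateDelta } from './mandelbrot.js';

test('point inside the main cardioid never escapes', () => {
  const orbit = referenceOrbit(0, 0, 100); // reference orbit at the origin
  assert.equal(iterateDelta(orbit, -0.1, 0.1, 100), 100);
});

test('point far outside the set escapes almost immediately', () => {
  const orbit = referenceOrbit(0, 0, 100);
  assert.ok(iterateDelta(orbit, 2.0, 2.0, 100) < 3);
});
```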
Rule 2: Test the tests
But after a while, you realize the 30-minute interrupts are only a bit better than the 5-minute ones. The agent is good at finding holes in your tests. It produces stupid solutions that don’t do what you want but still pass, because the tests were not actually good enough.
So: test the tests.
Testing tests is the kind of thankless metaprogramming that a development manager spends all their time on to make their team productive. For example: fuzz testing to discover new problems that need systematic tests; code coverage to reveal what code exists but remains untested; frameworks that make code easier to test, benchmark, and troubleshoot; hypothesis-driven testing that forces the agent to form a theory about what might be wrong and then write tests that chase it down. This type of programming is the sort of painful chore that can unlock productivity in a software development team, and it works very well when vibe coding too.
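As a concrete illustration of the fuzzing idea (again my own sketch, with hypothetical function names): throw random points at the fast path and cross-check it against a slow, trusted reference.

```js
// Illustrative fuzz test (not from the project): compare a fast path
// against a slow, trusted reference on random inputs, and record any
// disagreement as a new, reproducible test case.
import { escapeTimeFast, escapeTimeReference } from './mandelbrot.js'; // hypothetical

function fuzzEscapeTime(trials = 1000) {
  const failures = [];
  for (let i = 0; i < trials; i++) {
    // Random point in the interesting region of the complex plane.
    const cr = Math.random() * 4 - 2;
    const ci = Math.random() * 4 - 2;
    const fast = escapeTimeFast(cr, ci, 256);
    const slow = escapeTimeReference(cr, ci, 256);
    if (fast !== slow) failures.push({ cr, ci, fast, slow });
  }
  return failures; // each entry can become a permanent regression test
}
```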
Interestingly, it can be hard to get a coding agent to understand why it is spending so much effort testing tests. For example, when getting Claude Code to construct a reliable code coverage framework, I gave it the mission of debugging why its initial attempt had produced the unbelievable (and untrue) assertion that 100% of lines were covered by tests. Claude understood what it was trying to do at first, but when the going got tough, it kept giving up, saying "we don’t need to do anything here; I just noticed, code coverage is already 100%!" Maybe testing its tests of the tests is too meta, just at the edge of its ability to follow.
But once you get the metaprogramming right, you can reach a kind of vibe coding nirvana. Because then, as a human, you can look at code again! Instead of facing thousands of jumbled lines vomited up by the agent, you have maps of the code, informed by code coverage, benchmarks, test harnesses, and other instrumentation. That lets you skip the 99% of routine code and focus your attention on the 1% that is most interesting: the weird cases, the edge cases, the stuff that might deserve to be changed. That is a good use of precious human brain cells.
One limitation of this vibe approach is that tests catch bugs but not bloat. After developing comprehensive testing, I did find it helpful to make one human pass over the code, looking for opportunities to make it more symmetric (so that near-duplication becomes more obvious) and removing some confusing code that was leading the agent astray. That opened the way for larger-scale vibe-coded refactoring that improved the elegance of the most intricate part of the code.
The two rules are not just coding hacks. They also reveal a path for keeping humans informed enough to remain in control.
Trucks and Pedestrians
My experience vibe coding reminds me of the difference between walking and driving a truck. Highway driving is a new skill, but with a truck you can haul a lot more stuff faster than you could dream of on foot. Vibe working with AI gets you out of the business of actual intellectual hauling. Instead it gets you into the business of taking care of the machine.
Working effectively with AI is much more abstract than traditional office work, because it demands that we build up meta-cognitive infrastructure, like the 422 automated tests and code coverage tools that I needed to effectively steer the development of my single webpage.
Reshaping the global economy around AI reminds me of the construction of the American interstate highway system. The speeding and scaling of cognition seems likely to lead to economy-wide boosts in "intellectual mobility," and a whole new culture with the equivalent of roadside service stations and even suburban flight. But it also strikes me that we do not want to live in a world where all decisions are made by large-scale AI, any more than we would want to live in a world where everyone gets everywhere in a truck. Our modern streets are already congested with dangerous vehicles, and I am not sure that is giving us the best life.
I like walking to work.
As AI edges humans out of the business of thinking, I think we need to be wary of losing something essential about being human. By making our world more complex (twenty times more lines of code!), we risk losing our ability to understand the world, in ways that dull our capacity to make good decisions and even to recognize what it is that we want. I hope we can build metacognitive infrastructure that keeps our human minds informed. But as we build increasingly powerful abstractions, it will be both crucial and difficult to keep asking: Do we want this?
If you would like a sense of the structure and volume of code produced by vibe coding, you can scroll through the vibe-coded visualization of the evolution of the code through git commits. Or compare the code before (pre-LLM code repo here) and after (current code repo). In particular, read the agent’s documentation of the development infrastructure it built for this little one-webpage project. That kind of tooling will be familiar to anybody who has worked on a large engineering team. And it is the kind of work needed to support human comprehension of complexity in the age of LLM agents.
Posted by David at December 16, 2025 11:15 AM