I find myself using LLMs for coding in 4 specific ways:
- Finding information
- Rubber ducking
- Generating snippets of code or documentation
- Having it work for hours on a big task with minimal intervention
I find the first 3 to be very useful, especially now that web search is dead. But they are similar in that the LLM is on a tight leash, and everything still ultimately flows through my brain. Number 4 is different though; it gives Claude a lot of leeway to go do whatever it wants, without my babysitting.
For a long time I’ve found the Claude-take-the-wheel style of vibe coding to be an incredible waste of my time and sanity. For every “oh wow it worked” moment there were far too many “oh wow I just spent an hour sifting through garbage”. Even when it accomplishes the task, the LLM always injects entropy into the code. Do it enough and this eventually ends in an incoherent fever dream of code slop. The software equivalent of “man who is sitting on a couch but somehow also is the couch”. I found this video does a nice job demonstrating the anti-memetic nonsense you end up with.
Recently though, I started to worry that I had written it off too early. Some of my friends (whose opinions I trust) seemed to be getting better results, and I can’t help but think back to the early days of Google search, where some people seemed to “get it” and others didn’t. Clearly it can do impressive things; it’s just a matter of
- raising the odds of success, and
- lowering the risk of wasted time on my part (I am pointedly ignoring the actual cost of token usage for now).

So I committed to a full week of heavy Claude Code usage, and set out to have it solve some major to-do items I had been putting off for months.
The specifics don’t matter too much here, but for context, some of what I had it do:
- Research all the available on-device speech-to-text models with permissive licences
- Demo the transcription speed of each one on an Android device attached to the PC
- Write a C wrapper for the best one (Moonshine) and build an embeddable dynamic library
- Build this for iOS, Android, Linux, and macOS, and integrate it with my app code using the FFI
- Build a Nim wrapper for the fdk-aac library
- Integrate it with miniaudio, so I can play AAC audio and pipe the audio into Moonshine
Plus many other tasks around wiring these things up and getting them running. Collectively I would estimate that these things would have taken me a month, and much of it would have been painful, tedious work.
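To make the shape of that work a bit more concrete, here’s a rough sketch of the kind of surface the speech-to-text wrapper needed to expose. Everything in it (the names, the opaque struct, the float-PCM-in / text-out contract) is my own illustrative guess for this post, not the actual interface from the project:

```c
/* moonshine_wrapper.h — hypothetical surface for an embeddable
 * speech-to-text wrapper built as a dynamic library.
 * Names and signatures are illustrative, not the real project's API. */
#ifndef MOONSHINE_WRAPPER_H
#define MOONSHINE_WRAPPER_H

#include <stddef.h>

#ifdef __cplusplus
extern "C" {
#endif

typedef struct ms_ctx ms_ctx;   /* opaque handle owned by the library */

/* Load model weights from a directory; returns NULL on failure. */
ms_ctx *ms_create(const char *model_dir);

/* Transcribe mono float32 PCM at the given sample rate.
 * Returns a heap-allocated UTF-8 string the caller releases with ms_free_text. */
char *ms_transcribe(ms_ctx *ctx,
                    const float *samples,
                    size_t sample_count,
                    int sample_rate);

void ms_free_text(char *text);
void ms_destroy(ms_ctx *ctx);

#ifdef __cplusplus
}
#endif

#endif /* MOONSHINE_WRAPPER_H */
```

Keeping the boundary that small, an opaque handle and a few plain C functions, is also what makes it practical to call through the FFI from the app code on every platform without the app knowing anything about the model underneath.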
Despite a rocky start (I nearly gave up on day 1), I ended up very happy with the results. I landed some solid new features, and my code is not shit (at least no more than it was). So here are some of my findings and bits of advice for how to drive this thing.
- Every task needs a clear entry and exit point. e.g. “Run the program with ./run_program.sh, and look for ‘module loaded successfully’ in the log”. Don’t let it just crawl through the code and decide when it thinks it’s done.
- Put your time into the setup process and the review process, but not in between. Trying to steer Claude while it’s working means you’re investing time into an ephemeral state that you will as likely as not throw out later. If it goes wrong, just /clear, update the starting prompt, and go again. Once it goes wrong the context is usually too polluted anyway.
- Always protect your own work with source control. That includes the work spent writing a prompt. It should always be trivial to wipe everything out, make some changes, and send Claude off again.
- Keep the intersection of your code and Claude’s code as minimal as possible. For example, if I want it to write a new miniaudio decoder at aac_decoder.c, that’s the only file it’s allowed to touch. It might generate lots of tests and docs, but those go into claude_tests and claude_docs, never into the actual test or docs directories. It might seem unintuitive, since you often do want testing and documentation for a new feature, but those things are first-order tasks that should be worked on directly. If you want tests, toss out all the garbage and have Claude write a couple simple tests that you can actually review. (There’s a sketch of what that single touch-point can look like just after this list.)
- Observe a real result before you even look at the code. If you’re working on, say, an image processing feature, check the output image before reviewing anything. Seeing something actually work means you (probably) have correct code, even if it’s encased in slop. But if there’s no observable result, you’re risking your time sifting through code that could be nonsense.
- Constrain the context and look for references. For example, the prompt may take the form of a document like this: “Refer to file1.c, file2.c, and project_description.md. The miniaudio source is at ./external/miniaudio. The FDK library is at ./external/fdk-aac. We’re going to be writing an integration similar to ./src/opus_decoder.c. The task is to...” The less “wander around and pull random stuff into the context” you can have it do, the better.
- Set up minimal test projects. LLMs are pretty good at extracting something out of something else, so use a prompt like “Use <full_project_source> as a reference and create a minimal project that demonstrates feature X”. Then have it extend feature X without all the extra source clouding up its context.
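On the “minimal intersection” point above: the only thing Claude and I really need to agree on is one small boundary. A hypothetical version of that boundary for the AAC work might look like the header below; every name in it is made up for illustration, and the real file would additionally plug into miniaudio’s decoder hooks:

```c
/* aac_decoder.h — hypothetical boundary between my code and Claude's.
 * Claude owns aac_decoder.c; nothing of mine depends on its internals. */
#ifndef AAC_DECODER_H
#define AAC_DECODER_H

#include <stddef.h>
#include <stdint.h>

typedef struct aac_decoder aac_decoder;  /* opaque; defined in aac_decoder.c */

/* Open a decoder over an in-memory AAC stream. Returns NULL on failure. */
aac_decoder *aac_decoder_open(const uint8_t *data, size_t size);

/* Decode up to frame_count interleaved int16 PCM frames.
 * Returns the number of frames actually written. */
size_t aac_decoder_read(aac_decoder *dec, int16_t *out, size_t frame_count);

int aac_decoder_channels(const aac_decoder *dec);
int aac_decoder_sample_rate(const aac_decoder *dec);
void aac_decoder_close(aac_decoder *dec);

#endif /* AAC_DECODER_H */
```

Because something like that is the whole contract, reviewing Claude’s output mostly means reviewing one file against one header, and the claude_tests and claude_docs piles can be thrown away without touching anything I rely on.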
Metaphorically, I think about any job given to Claude as having 3 dimensions. There’s the breadth of the task (roughly how many lines of code it will touch), the depth of the task (the complexity, the layers of abstraction needed, the decision making involved, etc.), and the time spent working on it. Those three axes define a cube, and the size of the cube is how much entropy I’m shoving into the project. Something like “Update all the imports to use the new source structure” is broad (will touch almost all the files) and potentially long-running, but it’s conceptually very simple, so the volume is low. “Simplify the codebase and create clean lines of abstraction” is conceptually deep, broad, and will take a long time. Huge entropy cube. So the idea is to shrink the cube when possible, and to only deal with the section of the cube you need (like aac_decoder.c but not ./aac_decoder_tests and ./aac_decoder_project_milestones).
Ultimately, I walked away from my week of heavy Claude usage without any kind of polarized opinion. It’s not a terrifying new intelligence machine on the cusp of AGI, but it’s also not useless grift. In certain contexts it’s a powerful tool that speeds up software development. It also has the potential to be a huge time sink and can absolutely ruin code. But after putting some time into it, my intuition has gotten much better about when to use it, how to use it, and when to leave it alone. My feeling is that an inexperienced developer is at risk of overusing it, but an experienced developer may be at risk of underusing it. I was the latter, and I’m very happy to have changed that.