19 December 2025·1502 words·8 mins
10 years ago I was really into privacy and encryption, and I wanted to run my own VPN server. At the time, OpenVPN was the standard open-source VPN software, and that’s how openvpn-install came to be.
Then WireGuard came along, and I created another script: wireguard-install. Over the years, these projects have been quite popular: currently 15k and 10k stars respectively, tens of thousands of repo visits per month!
The backlog problem #

The thing is, all these people have many different use cases and encounter edge cases that are not handled. Over the years, this means a lot of issues and pull requests.
But the more they pile up, the harder they get to manage. I tried to keep good triage hygiene: closing low-effort issues, converting general support questions to GitHub Discussions, adding labels, and so on.
The problem with actually tackling the backlog is that it takes a lot of time. I tried a few times but never saw the end of it.
For each issue, you have to:
- understand if it's relevant:
  - maybe I merged something that fixed the issue since it was opened?
  - maybe I procrastinated so much that the OpenVPN or OS version this happened on isn't even available anymore?
- for a feature, understand if it's doable, if it's in scope, if it's even possible
And for PRs, I need to do all of the above, and on top of that:
- test the changes manually. That means spinning up a VM in a cloud provider, running the install script, usually across different distros.
- rebase, fix conflicts, etc.
Which is a lot! But this time, it was going to be different.
Adding tests for velocity #
One of the first things I did was to plan for future velocity. My only automated validation on PRs was basically shellcheck linting. Back in 2020, I felt the need for actual testing, and I set up a GitHub Actions workflow to run the script on a few DigitalOcean droplets. This was helpful to catch some errors, but it only ran on master, because I couldn't run it on PRs from forks for safety reasons (and cost).
I didn't think I would be able to run that in GitHub Actions itself due to Docker limitations (OpenVPN requires the TUN module, IP routing, iptables, etc.). But with Claude, I was able to experiment and get a full end-to-end testing workflow working in Docker! Turns out it was possible all along; I had just convinced myself it wasn't. With LLMs, I can explore ideas and throw them out if they don't work, which lets me be more exploratory and ambitious. (Debugging CI is certainly not what I want to spend my free time on.)
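For reference, those Docker limitations are mostly about privileges, and a container can be granted what OpenVPN needs with standard Docker flags. A minimal sketch; the image and script names here are placeholders, not the project's actual ones:

```shell
# Hypothetical invocation: expose a TUN device, grant NET_ADMIN for
# iptables/routing, and enable IP forwarding inside the container.
docker run --rm \
  --cap-add=NET_ADMIN \
  --device /dev/net/tun \
  --sysctl net.ipv4.ip_forward=1 \
  my-openvpn-test-image ./run-server-tests.sh
```

This is a command-line fragment rather than a runnable script; the flags themselves (`--cap-add`, `--device`, `--sysctl`) are standard `docker run` options.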
I ended up with all the supported distributions tested, plus extra scenarios, for a total of 30+ tests in the matrix. They're also pretty fast: the whole matrix takes a handful of minutes to run.
For each test, I orchestrate a server container and a client container that follow a linear script while occasionally waiting on each other through file presence, file contents, etc. I'm able to test all the features, from the initial installation to client connection, revocation, and so on. It's not super sexy, as it's a bunch of bash assertions with grep and ping! But it's been tremendously helpful, and allowed me to iterate much faster on the rest of the backlog in a safer way.
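The synchronization pattern is simple enough to sketch. These helpers are hypothetical (the real harness's function names may differ), but they show the file-presence waiting and grep-based assertions described above:

```shell
#!/bin/sh
# Hypothetical helpers in the spirit of the harness described above.

# Block until a marker file appears, or give up after a timeout (seconds).
wait_for_file() {
  path=$1; timeout=${2:-30}; elapsed=0
  until [ -e "$path" ]; do
    sleep 1
    elapsed=$((elapsed + 1))
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "timed out waiting for $path" >&2
      return 1
    fi
  done
}

# Fail loudly if a file does not contain the expected pattern.
assert_contains() {
  grep -q "$2" "$1" || { echo "ASSERT FAILED: '$2' not in $1" >&2; return 1; }
}

# Example: the server container signals readiness through a shared volume,
# and the client container waits on it before connecting.
SYNC_DIR=$(mktemp -d)                           # stands in for the shared volume
echo "server ready" > "$SYNC_DIR/server.ready"  # server side
wait_for_file "$SYNC_DIR/server.ready" 5        # client side
assert_contains "$SYNC_DIR/server.ready" "server ready"
```

The nice property of this approach is that each side stays a plain linear script: all the coordination lives in a couple of tiny helpers.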
I still tested manually on actual cloud VMs from time to time, which caught a few more bugs, but I can skip that step most of the time now.
Crushing the backlog: bugs and features #
Armed with my testing setup and Claude, I was able to fix a long list of long-standing issues and missing features:
- proper logging and output (colors, with stdout/stderr redirected to log files to keep the terminal clean)
- early failures when something goes wrong to avoid leaving the system in an inconsistent state
- proper CLI interfaces instead of the messy env vars-based non-interactive mode
- support for 100% of the features in non-interactive mode
- proper IPv6 support and custom addressing for IPv4 as well
- nftables and firewalld support on top of iptables
- modern features since OpenVPN 2.4 (TLS 1.3, tls-crypt-v2, new ciphers, DCO kernel module, fingerprint instead of PKI)
- proper revocation support, client name reuse, client disconnect, client & certificate status
- automation of the easy-rsa dependency updates (and its hash). The one we were using had a CVE ☹️
- more robust Unbound setup
- more distro support: Arch Linux, openSUSE
- many small quality-of-life improvements, proper validation, etc
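To illustrate the "fail early, keep output clean" items above, here is a minimal sketch of the pattern, not the script's actual code; all names in it are made up:

```shell
#!/bin/bash
# Minimal sketch of the fail-early + clean-logging pattern (names are made up).
set -euo pipefail            # abort on errors, unset variables, pipe failures

LOG_FILE=$(mktemp)           # hypothetical log destination

log() {
  # Short user-facing line to stdout, timestamped detail to the log file.
  echo "$*"
  echo "$(date -u '+%Y-%m-%dT%H:%M:%SZ') $*" >> "$LOG_FILE"
}

fail() {
  # Abort immediately rather than leave the system half-configured.
  echo "ERROR: $*" >&2
  exit 1
}

command -v grep >/dev/null 2>&1 || fail "grep is required"
log "dependency check passed"
```

Checking preconditions up front and aborting on the first error is what prevents the half-installed states that generated so many of the original issues.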
Claude Code was able to help me implement all of this.
For big features, drafting implementation plans (with or without plan mode, which didn’t seem to impact much) and giving access to the OpenVPN and easy-rsa source code is very helpful: it can see when a feature was added or deprecated and correct itself on usage. It can use the gh CLI to search for it, but cloning it locally works better.
It also helped me rebase PRs from external contributors that I wanted to merge. With the big refactors I did, they needed a lot of work, which would have taken a lot of time manually. The good thing is that after rebasing and fixing, I magically had tests in the PRs! It was also nice to add missing tests, docs, etc.
The GitHub pulse of ~10 days!
It took some work, but I was able to go from 150 issues and 50 pull requests to… 0.
Did I slopify my project? #
Definitely not! I’m not saying one or two bugs didn’t slip by me, but overall, the project is in a much better state now, especially considering the reworked architecture, the fixed bugs, the additional tests, etc.
It still required a lot of work and a vision: LLMs didn't do it on their own. I carefully reviewed every change until it matched what I had in mind. I just didn't have to get my hands dirty with bash syntax, which is why the result is that much better!
Claude Code + Opus 4.5 are clearly world-class right now, and they produce very little slop. Most of the time, it didn’t need that much correction on the implementation itself. It was really useful as a thinking partner to iterate on an idea.
Thoughts on my workflow #
Overall, it still took me a few dozen hours to get through it.
It would have taken me at least twice as much time without Claude. As a matter of fact, I probably wouldn’t have done it at all.
CLI agent vs IDE agent #
I used to be a big user of GitHub Copilot inside of VS Code: the chat panel right alongside the code is very handy, and it’s nice to specify the exact context I’m referring to.
But with CLI agents, I’m able to run many of them in a few Ghostty tabs.
With Git worktrees, I can run multiple agents on different branches in the same repo. Super handy when they don’t overlap much.
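The worktree setup is essentially a one-liner per agent. A throwaway demo (branch and directory names are invented):

```shell
# Throwaway demo: one repo, one extra worktree per agent branch.
tmp=$(mktemp -d) && cd "$tmp"
git init -q main-repo && cd main-repo
git -c user.email=demo@example.com -c user.name=demo \
  commit -q --allow-empty -m "init"

# Each agent works in its own checkout on its own branch,
# all sharing a single object store.
git worktree add -b fix-ipv6 ../agent-ipv6
git worktree add -b add-nftables ../agent-nftables

git worktree list    # three checkouts, three branches
```

Because worktrees share one object store, there's no duplicated clone, and branches stay visible across all checkouts.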
And Copilot used to be dramatically slow, especially for file edits (see this blog post of mine from just a few months ago). It's much better now, but CLI agents are even faster.
Claude Code vs other agents #
While every lab has its own agent (even Mistral just came out with its own vibe), claude-code is probably the leader right now in terms of capabilities and features, even if open-source alternatives are catching up.
By the way, ironically enough the Copilot CLI is way behind the curve.
Opus 4.5 vs other models #
Opus 4.5 is very smart. Being able to search the web and other repos is also a big plus. Opus picks up on the design decisions of the project, and when implementing, will get it right most of the time. This + local tests + gh CLI = lots of time saved.
It looks like some other models may support web search, but it’s opt-in only for codex, for example. 🤔
To be honest, I could replace Opus 4.5 with a placeholder <current SOTA model>, because this is probably going to change in a month. But Opus 4.5 has really been a leap forward. It’s smart, gets the context quickly, and has a very good vibe. GPT-5.x models are smart but very “cold”.
Review bot #
I also liked having the Copilot review bot leave comments on PRs. It was like a second layer of security, even though the hit rate is more or less 50%. I still routinely review the comments myself and tell Claude Code to use gh to get the comments and think about it, then resolve them.
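That prompt usually boils down to standard `gh` commands like these (the PR number is a placeholder; `gh api` fills in `{owner}/{repo}` from the current repository):

```shell
# 123 is a placeholder PR number.
gh pr view 123 --comments                   # conversation-level comments
gh api 'repos/{owner}/{repo}/pulls/123/comments' \
  --jq '.[] | {path, body}'                 # inline review comments
```

This is a CLI fragment (it needs an authenticated `gh` and a real PR), but both subcommands and flags are standard `gh` usage.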
Onward #
I'm happy I took the time to take care of this project. It's now in a much better state for future users, and I learned a lot along the way.
I have so many ideas to build, but I will take the time to take care of wireguard-install as well!