22 Dec, 2025
Pierce Freeman wrote a blog post, Go ahead, self-host Postgres, which sparked lively discussions on Hacker News and Lobste.rs.
Pierce says:
I’d argue self-hosting is the right choice for basically everyone, with the few exceptions at both ends of the extreme.
I have a different opinion based on my experience as a manager of engineering teams across multiple companies. My teams have owned lots of databases, including lots of Postgres. As the manager, I was responsible for staffing the team with the right people and skillsets to keep everything humming along, which has shaped my perspective.
In my view, it’s not about the burden on you to get something like Postgres running, keep it reasonably maintained, automate deployment and maintenance, get minimum-viable backups working, etc. It’s not about you, the individual. Rather, it’s about the enduring team, business continuity, and the cost of downtime.
Sure, you, the individual, got everything running perfectly, and all your cool scripts mean that you, the individual, spend just minutes per month on maintenance. But what happens when you’re not around? Suppose you get hit by a bus, leave the company, go camping off the grid, or just forget to bring your work laptop out to date night when you’re on call.
At any sizable business that lasts long enough, there will come a day when someone else needs to figure out your shit, perhaps under duress during a production outage, with angry customers piling up and senior leadership breathing down their necks. What happens then?
- Is your system well documented?
- How confident are you in your monitoring system?
- Is alerting set up? Who do the alerts route to?
- Did you build a solid backup and restore procedure? When was the last time you tested it? (See the sketch after this list for what I mean by a test.)
- When was the last time your peers tested it? Without your help? Could they run it in an emergency?
- Have you tested it on every instance/cluster? (Self-managers tend to have more pets than cattle, in my experience.)
- Have you thought through whether your RTO and RPO numbers (recovery time and recovery point objectives) are sufficient?
- Do you have redundancy across regions, and do you have a hardened failover procedure?
- When was the last time your peers tested that?
- What if a procedure fails halfway through? How many teammates are competent enough to dig into the internals of a live system?
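To make the backup question concrete: by "tested," I mean a drill a teammate could run end to end without you in the room. A minimal sketch of that kind of restore test, in Python, might look like this; the backup directory, dump format, scratch database name, and sanity query are all made-up placeholders, not a prescription.

```python
#!/usr/bin/env python3
"""Restore-drill sketch: restore the newest logical dump into a scratch
database and run a trivial sanity query. Illustrative only; the paths,
names, and dump format (pg_dump -Fc) are assumptions."""

import subprocess
from pathlib import Path

BACKUP_DIR = Path("/var/backups/postgres")  # assumed dump location
SCRATCH_DB = "restore_drill"                # throwaway database for the test


def run(cmd: list[str]) -> None:
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)


def main() -> None:
    dumps = sorted(BACKUP_DIR.glob("*.dump"))
    if not dumps:
        raise SystemExit("no dumps found -- the drill has already failed")
    latest = dumps[-1]

    # Recreate the scratch database and restore the newest dump into it.
    run(["dropdb", "--if-exists", SCRATCH_DB])
    run(["createdb", SCRATCH_DB])
    run(["pg_restore", "--no-owner", "--dbname", SCRATCH_DB, str(latest)])

    # Sanity check: the restored database should answer a trivial query.
    run(["psql", "--dbname", SCRATCH_DB, "-c",
         "SELECT count(*) FROM pg_catalog.pg_tables;"])
    print(f"restore drill passed for {latest.name}")


if __name__ == "__main__":
    main()
```

The script itself is trivial. The point is whether anyone besides its author has run it lately, and whether it lives somewhere a panicking on-call engineer can find it.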
The list goes on and on, and it scales in length with the complexity and criticality of your rig.
Do you have read replicas? More docs, more playbooks, more testing.
Some sort of HA? More.
Are you running PgBouncer? More.
Are you managing clusters of instances and replication? More.
Do you have multiple data centers for redundancy? More, my friend! More!
Next topic: When was the last time you tried to hire someone with strong expertise in low-level database administration? For me, it wasn’t very long ago. My company was well known, had a good culture, paid above market rate ($150-300K depending on seniority and location), and it was still really fucking hard to find good people. It was similarly difficult to train up junior members of the team because of the sheer complexity and the cost of mistakes. That scared me perhaps more than anything else.
Folks are quick to point out that managed services don’t magically solve all these problems. There’s still downtime and complexity, and you still need to hire engineers to manage stuff. Sure, yep, of course. But managed services do solve or at least mitigate a lot of problems:
- Backup and restore procedures are hardened and documented.
- Replication is hardened and documented.
- You get automated zonal failover for high availability out of the box.
- Regional failover procedures are probably at least documented.
- You get logging, monitoring, and alerting out of the box.
- Scaling becomes a mostly solved problem. Capacity is almost always available and procedures should be easy to grok.
- For example, getting more disk space when you need it is just a checkbox (or a boolean flag; in Cloud SQL, anyway), as the sketch after this list shows. The best and brightest on my last team spent several years trying to solve that problem on-prem.
- Your cloud provider will probably force you to keep your instances on a relatively modern version of Postgres through maintenance windows and extra cost for ancient versions. That’s a blessing disguised as an inconvenience. In my experience with self-managed systems, these upgrade tasks rarely get prioritized until it’s a quasi-emergency. And because they’re rarely prioritized, most people aren’t confident in executing them safely.
- Hiring people who can figure out Cloud SQL or RDS is a helluva lot easier than hiring people who can figure out the DIY Postgres cluster that ex-employee Joe duct-taped together 8 years ago using Python 2.6 scripts that he ran from his laptop.
- You have an around-the-clock professional support team. Sure, those teams are imperfect. Sometimes they outright suck. But they’re something. And behind those support teams are probably sufficiently staffed, quite talented engineering teams that often help with escalations.
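About that disk-space checkbox: the programmatic equivalent is roughly a one-field patch against the Cloud SQL Admin API. The sketch below uses placeholder project and instance names and assumes google-api-python-client with application default credentials; treat it as an illustration, not a runbook.

```python
# Rough sketch: enable automatic storage growth on a Cloud SQL instance
# through the Cloud SQL Admin API. "my-project" and "my-instance" are
# placeholders; assumes google-api-python-client is installed and
# application default credentials are configured.
from googleapiclient import discovery

sqladmin = discovery.build("sqladmin", "v1beta4")

sqladmin.instances().patch(
    project="my-project",
    instance="my-instance",
    body={"settings": {"storageAutoResize": True}},
).execute()
```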
This debate will go on endlessly. I’m biased, clearly, and I don’t claim to be correct in any absolute sense. There are far too many factors and nuances at play. But I do know that, for my day job, I want to stay well away from infrastructure teams who are self-managing this stuff. Been there, done that, have the scars, don’t want any more. 😅