If your organization is standardizing on Kubernetes, this question shows up fast:
“Should PostgreSQL run on Kubernetes too?”
The worst answers are the confident ones:
- “Yes, because everything else is on Kubernetes.”
- “No, because databases are special.”
Both are lazy. The right answer depends on what you’re optimizing for: delivery velocity, platform consistency, latency predictability, operational risk, compliance constraints, and, most importantly, who is on-call when things go sideways.
I have seen PostgreSQL run very well on Kubernetes. I’ve also seen teams pay a high “complexity tax” for benefits they never actually used. This post is an attempt to give you a technical evaluation you can use to make a decision that fits your environment.
Start with the real question: are you running a database, or building a database platform?
This is the cleanest framing I have found:
- Running a database: You have a small number of production clusters that are business-critical. You want predictable performance, understandable failure modes, straightforward upgrades, and clean runbooks.
- Building a database platform: You want self-service provisioning, standardized guardrails, GitOps workflows, multi-tenancy controls, and a repeatable API so teams can spin up PostgreSQL clusters without opening tickets.
Kubernetes shines in the second world. VMs shine in the first.
Yes, you can do either on either platform. But the default fit differs.
A neutral comparison model: 6 dimensions that actually matter
Here is a practical rubric you can use in architecture reviews. Score each platform against the six dimensions from the introduction:
- Delivery velocity (provisioning speed, self-service)
- Platform consistency (one deployment model across workloads)
- Latency predictability (storage and network jitter)
- Operational risk (failure modes, debugging surface)
- Compliance constraints (isolation, auditability)
- On-call ownership (who debugs it at 2am, and with what expertise)
If you want a quick decision shortcut:
If your main goal is self-service and standardization, Kubernetes is compelling. If your main goal is predictable performance and a lower operational surface area, VMs or bare metal are compelling.
What Kubernetes adds (and why it’s both good and risky)
Kubernetes wasn’t designed primarily for databases. It was designed for scheduling workloads, handling health checks, rolling updates, and service discovery. PostgreSQL can run well there, but you typically stack multiple control layers:
- Stateful identity and scheduling
- Persistent volumes
- CSI/storage drivers
- Operators for lifecycle management
- Sidecars for backups/metrics/log shipping
That’s not inherently bad. It’s powerful. But each layer is another thing to understand, upgrade, monitor, and debug. There is also an ‘agony of choice’ when selecting an operator for lifecycle management: quite a few are available, and none are perfect.
The biggest Kubernetes “gotcha” for PostgreSQL isn’t that it doesn’t work. It’s that when something goes wrong, the failure analysis can shift from “what is Postgres doing?” to “which Kubernetes subsystem is influencing Postgres right now?”
A very common pattern: a performance incident that starts as “write latency spiked” turns out to be tied to eviction behavior, scheduling pressure, or storage-layer hiccups. Those are solvable problems, but only if you already have deep Kubernetes operational maturity.
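To make “is Kubernetes throttling me?” checkable rather than speculative, one quick place to look is the cgroup v2 `cpu.stat` file for the Postgres container. A minimal sketch (the helper name is mine; the field names match the kernel’s `cpu.stat` format):

```python
# Rough throttling check from cgroup v2 stats. On a node, this file lives
# under /sys/fs/cgroup/<pod-cgroup>/cpu.stat; cgroup v1 exposes the same
# counters via cpu.cfs_* files instead.

def throttle_ratio(cpu_stat_text: str) -> float:
    """Fraction of scheduler periods in which the cgroup was throttled."""
    stats = {}
    for line in cpu_stat_text.splitlines():
        key, _, value = line.partition(" ")
        if value.strip().isdigit():
            stats[key] = int(value)
    periods = stats.get("nr_periods", 0)
    if periods == 0:
        return 0.0
    return stats["nr_throttled"] / periods

# Illustrative sample, not real data:
sample = "usage_usec 8000000\nnr_periods 1000\nnr_throttled 240\nthrottled_usec 900000\n"
print(f"{throttle_ratio(sample):.0%} of periods throttled")  # 24% of periods throttled
```

A sustained ratio well above a few percent during incidents is a strong hint that limits, not Postgres, are the story.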
What VMs give you (and what they don’t)
VMs are boring in the best way: fewer abstraction layers between PostgreSQL and the hardware.
That usually means:
- More predictable latency (especially disk + network)
- Easier kernel-level tuning (huge pages, I/O scheduler, NUMA considerations)
- Simpler operational failure analysis (“the host is slow” is a real thing you can measure and act on)
- More straightforward incident response for teams that already have VM/host tooling
But VMs aren’t “free” either. The cost shows up in different places:
- Slower provisioning and less self-service
- More configuration drift risk (“snowflake servers”)
- More manual day-2 operations unless you build good automation
- Higher discipline required for patching, backups, and failover testing
The platform might be simpler; the process still needs maturity.
The performance reality: storage and network decide more than “K8s vs VM”
Most “Postgres on Kubernetes is slow” stories are really one of these:
- The storage class wasn’t suited for database workloads.
- CPU throttling or noisy neighbor effects were introduced through cgroups / limits / oversubscription.
- Network paths became less predictable (overlay, MTU issues, cross-zone routing).
- Failover / restart behavior wasn’t tested under real load.
Storage: the durability and jitter problem
PostgreSQL is very sensitive to storage behavior because it relies heavily on fsync semantics, WAL throughput, and predictable latency for sync writes. On bare metal or a well-provisioned VM, you can often get very stable performance by:
- Using fast SSD/NVMe
- Separating WAL and data volumes when appropriate
- Benchmarking with fio and Postgres tools (pg_test_fsync) before you commit to architecture
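Whichever tool produces the latency samples, summarize jitter rather than averages; a tail-to-median ratio tells you more about sync-write stability than mean IOPS does. A hedged sketch with made-up numbers:

```python
# Summarize sync-write latency samples (e.g. collected via fio or
# pg_test_fsync runs). The sample profiles below are illustrative only.
import statistics

def latency_summary(samples_ms):
    s = sorted(samples_ms)
    def pct(p):
        return s[min(len(s) - 1, int(p / 100 * len(s)))]
    return {
        "p50_ms": pct(50),
        "p99_ms": pct(99),
        "jitter_ratio": pct(99) / pct(50),  # a large ratio suggests unstable storage
        "mean_ms": statistics.fmean(s),
    }

# A stable NVMe-like profile vs. a spiky network-attached volume:
stable = [0.4] * 95 + [0.6] * 5
spiky = [0.8] * 90 + [12.0] * 10
print(latency_summary(stable)["jitter_ratio"])  # 1.49...
print(latency_summary(spiky)["jitter_ratio"])   # 15.0
```

Two volumes with identical averages can behave completely differently under WAL pressure; the ratio is what surfaces that.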
On Kubernetes, you can do this too, but you must be intentional:
- Prefer storage classes built for sustained IOPS and latency stability (not just “it supports PVCs”)
- Validate snapshot/restore behavior end-to-end (because snapshots that exist but can’t restore correctly are theatre)
- Consider dedicated node pools and careful volume placement if you’re chasing low jitter
Network: the “multi-region makes everything harder” lesson
Replication lag is a good example of why network matters more than platform ideology. In one benchmark study [1] (single-region vs multi-region), single-region replication lag averaged a few milliseconds, while multi-region averaged tens of milliseconds with occasional spikes under load. The big takeaway: geography and network dominate lag behavior far more than whether you run inside a pod or on a VM.
So if your decision is driven by “we want multi-region active-active,” focus on replication architecture and network reality first. Kubernetes won’t save you from physics.
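To make lag concrete: PostgreSQL reports WAL positions as LSNs, and the byte distance between the primary’s current LSN and a standby’s replay LSN is the lag. A small sketch of the same arithmetic `pg_wal_lsn_diff()` performs server-side:

```python
# An LSN prints as "XXX/YYYYYYYY": two hex numbers forming the high and
# low 32 bits of a 64-bit WAL position. Lag in bytes is a subtraction.

def lsn_to_int(lsn: str) -> int:
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def lag_bytes(primary_lsn: str, replica_lsn: str) -> int:
    return lsn_to_int(primary_lsn) - lsn_to_int(replica_lsn)

# e.g. pg_current_wal_lsn() on the primary vs. replay_lsn from
# pg_stat_replication (values here are illustrative):
print(lag_bytes("16/B374D848", "16/B3740000"))  # 55368
```

Bytes of lag per second, tracked over time, is the number to alert on; it is independent of whether the standby sits in a pod or on a VM.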
Reliability and HA: Kubernetes gives you rescheduling, not correctness
A controversial statement that’s still true:
Kubernetes gives you rescheduling. PostgreSQL needs correctness.
If a Postgres pod dies, Kubernetes will restart it. Great. But high availability for PostgreSQL is about:
- avoiding split brain
- promoting the right node at the right time
- fencing the old primary
- ensuring replicas are consistent
- ensuring client traffic shifts cleanly
- ensuring backups and restore paths are proven
Kubernetes can help you automate that with mature operators. VMs can help you automate it with mature HA tooling (Patroni/repmgr + a DCS + load balancers, etc.). In both cases, correctness comes from your HA design, your fencing strategy, and your tests, not from the platform’s marketing.
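The heart of that correctness, on either platform, is a leader lease with self-fencing: a primary that can no longer prove it holds the lease must stop accepting writes before anyone else is promoted. A toy sketch of that rule (all names are illustrative; this is the shape of the logic tools like Patroni implement, not a production HA manager):

```python
# Minimal "promote only while holding the DCS leader lease" sketch.

class LeaderLease:
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self.acquired_at = None

    def acquire(self, now: float) -> None:
        self.acquired_at = now

    def is_valid(self, now: float) -> bool:
        return self.acquired_at is not None and (now - self.acquired_at) < self.ttl

def may_serve_writes(lease: LeaderLease, now: float) -> bool:
    # If the lease has expired, the node demotes itself *before* the DCS
    # can elect a new primary. That self-fencing step is what prevents
    # two writable primaries (split brain).
    return lease.is_valid(now)

lease = LeaderLease(ttl_seconds=30)
lease.acquire(now=100.0)
print(may_serve_writes(lease, now=110.0))  # True: lease still valid
print(may_serve_writes(lease, now=140.0))  # False: expired, stop accepting writes
```

The TTL is the core trade-off: shorter means faster failover but more false demotions under network blips.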
When Kubernetes is a strong fit for PostgreSQL
Kubernetes becomes a very rational choice when:
1. You already run a mature Kubernetes platform
- You have stable storage classes
- You have strong observability
- You have SREs who understand scheduling, disruption, and capacity planning
2. You want an internal “Postgres-as-a-service” model
- Developers request databases via a ticket/API and get guardrails by default
- Standardized backups, monitoring, parameter baselines, and security policies
3. You need many isolated Postgres clusters
- Multi-tenant environments where per-tenant isolation is valuable
- Frequent creation/destruction of clusters (CI, preview environments, ephemeral staging)
4. Your org operates with GitOps discipline
- Declarative config changes
- Reviewable diffs
- Automated drift detection
In these cases, the platform benefits can outweigh the complexity, because you’re actually using the platform benefits.
When VMs are a stronger fit
VMs tend to be the better choice when:
1. Your Postgres cluster is “crown jewel” infrastructure
- Latency-sensitive OLTP
- Predictable I/O behavior matters more than provisioning speed
2. You don’t have Kubernetes specialists on-call
- The fastest path to reliability is fewer moving parts, not more automation
3. You’re running a small number of large databases
- Dedicated instances, tuned for workload
- Scaling is mostly vertical and carefully planned
4. You need tight control over kernel + host settings
- NUMA behavior, huge pages, I/O scheduling, direct-attached NVMe, etc.
If you’re in this world, “boring infrastructure” is a feature.
Two reference architectures you can copy
Option A: Kubernetes with an operator (platform-oriented)
Key design choices:
- Use a mature Postgres operator for day-2 operations (backups, failover, upgrades)
- Use dedicated node pools for Postgres
- Use pod anti-affinity so replicas land on different nodes
- Use PodDisruptionBudgets so maintenance doesn’t take you down
- Keep backups off-cluster (object storage) and run restore drills
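As a rough illustration, the anti-affinity and disruption-budget pieces might look like this (all labels and names are placeholders; most operators generate equivalents for you):

```yaml
# Keep at least 2 of 3 pods up during voluntary disruptions (node drains).
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: postgres-main-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: postgres
      cluster: main
---
# In the pod template: force each replica onto a different node.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: postgres
            cluster: main
        topologyKey: kubernetes.io/hostname
```

The point of `minAvailable: 2` with three pods is that routine maintenance can evict at most one replica at a time, which keeps quorum intact.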
And your operator-managed cluster spec should include:
- explicit resource requests
- storage class selection
- monitoring enablement
- backup configuration
- replication settings
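As one concrete, hedged example of that spec, here is the shape using CloudNativePG’s `Cluster` resource. Field names follow its CRD, but verify them against the operator and version you actually deploy; storage class and bucket path are placeholders:

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: main-db
spec:
  instances: 3                       # one primary, two replicas
  minSyncReplicas: 1                 # replication settings live here too
  maxSyncReplicas: 1
  resources:
    requests:
      cpu: "4"
      memory: 16Gi
  storage:
    storageClass: fast-nvme          # placeholder: a DB-grade storage class
    size: 500Gi
  monitoring:
    enablePodMonitor: true
  backup:
    barmanObjectStore:
      destinationPath: s3://backups/main-db   # off-cluster object storage
  postgresql:
    parameters:
      max_connections: "200"
      shared_buffers: 4GB
```

Whatever operator you pick, the value is that this entire lifecycle is declarative and reviewable, which is exactly the GitOps fit described above.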
Option B: VMs with Patroni (database-runbook oriented)
Key design choices:
- 3-node cluster (1 primary, 2 replicas)
- Patroni for HA with a DCS (etcd/Consul)
- HAProxy for routing writes to primary and reads to replicas (optional)
- PgBouncer for connection pooling
- pgBackRest (or similar) for backups and PITR
- Monitoring stack: node metrics + Postgres metrics + log analysis
This model is widely understood, auditable, and tends to fail in more predictable ways.
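For orientation, a trimmed sketch of a per-node `patroni.yml` for that design (hosts, addresses, and credentials are placeholders; consult Patroni’s documentation for the full schema):

```yaml
scope: main-db                  # cluster name, shared by all three nodes
name: pg-node-1
restapi:
  listen: 0.0.0.0:8008
  connect_address: 10.0.0.11:8008
etcd3:
  hosts: 10.0.0.21:2379,10.0.0.22:2379,10.0.0.23:2379
bootstrap:
  dcs:
    ttl: 30                     # leader lease TTL: failover speed vs. flap risk
    loop_wait: 10
    retry_timeout: 10
    postgresql:
      use_pg_rewind: true       # lets a demoted primary rejoin as a replica
postgresql:
  listen: 0.0.0.0:5432
  connect_address: 10.0.0.11:5432
  data_dir: /var/lib/postgresql/16/main
  authentication:
    replication:
      username: replicator
      password: change-me       # placeholder; use a secret store
```

Every line here is greppable on a host during an incident, which is a big part of why this model fails in predictable ways.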
Common gotchas (the ones that create 2am incidents)
Kubernetes gotchas
1. CPU limits causing throttling
You can meet “CPU request” but still get throttled under burst if limits are too tight.
2. Pod evictions during load
Especially if PDBs, priorities, and eviction policies aren’t designed for stateful workloads.
3. Storage that looks fast on paper but has latency spikes
Sustained performance is what matters, not peak IOPS marketing.
4. Backups that exist but restores that fail
Test restores on a schedule as a drill, not during an incident.
5. Operator upgrades as a hidden dependency
Your database lifecycle now depends on the operator lifecycle.
VM gotchas
1. Unvalidated failover
You “have HA” but haven’t practiced it under load with real application behavior.
2. Backup confidence without restore drills
The only backup that matters is the one you restored successfully.
3. Configuration drift
Two replicas that aren’t actually identical are a slow-motion outage.
4. Noisy neighbor on shared hypervisors
“It’s on a VM” doesn’t mean you own the underlying contention story.
5. OS patching and reboots without a runbook
Routine maintenance becomes risky without clear procedures.
The punchline: choose the platform that matches your org’s operating model
My take is simple:
- Kubernetes is excellent when you’re building a database platform.
- VMs are excellent when you’re running a database.
Both can be production-grade. Both can be disasters. The difference is whether your organization is set up to operate the platform you choose.
If you want one practical recommendation that avoids regret, this is it:
Run dev/test Postgres on Kubernetes if it helps delivery speed. Run production Postgres where you can guarantee predictable storage, clear failure modes, and strong operational ownership. That might be Kubernetes, or it might not.
Related
[1] Benchmark Study on Replication Lag in PostgreSQL using Single Region and Multi-Region Architectures
[2] Is Your PostgreSQL Deployment Production Grade?
[4] Clustering in PostgreSQL: Because One Database Server is Never Enough (and neither is two)
[5] Database in Kubernetes: Is that a good idea?
[6] Databases on K8s — Really? [Part 1] [Part 2] [Part 3] [Part 4]