PostgreSQL is highly suitable for powering critical applications in all industries. While PostgreSQL offers good performance, there are factors not many users are aware of which nevertheless play a key role for efficiency and speed in general. Most people understand that more CPUs, better storage, more RAM and the like will speed things up. But what about something that is equally important?
We are of course talking about “latency”.
What does latency mean and why does it matter?
The time a database needs to execute a query is only a fraction of the time the application needs to actually receive an answer. The following image shows why:

When the client application sends a request, it will ask the driver to send a message over the wire to PostgreSQL (a), which then executes the query (b) and sends the result set back to the application (c). The core question now is: how relevant are (a) and (c) compared to (b)? Let us find out. First, we will initialize a simple test database with pgbench. For the sake of this test, a small database is totally sufficient:
cybertec$ pgbench -i blog
dropping old tables...
NOTICE: table "pgbench_accounts" does not exist, skipping
NOTICE: table "pgbench_branches" does not exist, skipping
NOTICE: table "pgbench_history" does not exist, skipping
NOTICE: table "pgbench_tellers" does not exist, skipping
creating tables...
generating data (client-side)...
vacuuming...
creating primary keys...
done in 0.19 s (drop tables 0.00 s, create tables 0.02 s, client-side generate 0.13 s, vacuum 0.02 s, primary keys 0.02 s).

Now, let us run a first simple test: a single connection over a UNIX socket running a read-only workload for 20 seconds:
cybertec$ pgbench -c 1 -T 20 -S blog
pgbench (17.5)
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
maximum number of tries: 1
duration: 20 s
number of transactions actually processed: 1035095
number of failed transactions: 0 (0.000%)
latency average = 0.019 ms
initial connection time = 2.777 ms
tps = 51751.287839 (without initial connection time)

There are two important numbers here:
- average latency: 0.019 ms
- transactions per second (tps): 51751
For a single connection, this is a decent result.
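As a side note: when -h is omitted, pgbench (like psql) connects through the local UNIX socket. If you want to make this explicit, you can pass the socket directory as the host. A quick sketch - the directory below is merely the common default on Debian-style systems and may differ on your installation:

# show where the server actually puts its socket
psql -c "SHOW unix_socket_directories;" blog

# run the same read-only test, naming the socket directory explicitly
pgbench -c 1 -T 20 -S -h /var/run/postgresql blog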
In the next step, we run the very same query. BUT: this time we won’t use a UNIX socket but a TCP connection to “localhost” (so not even a remote connection):
cybertec$ pgbench -c 1 -T 20 -S blog -h localhost
pgbench (17.5)
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
maximum number of tries: 1
duration: 20 s
number of transactions actually processed: 583505
number of failed transactions: 0 (0.000%)
latency average = 0.034 ms
initial connection time = 3.290 ms
tps = 29173.916752 (without initial connection time)

The result is already drastically different. Here are the relevant numbers:
- average latency: 0.034 ms
- transactions per second (tps): 29173
Wow, throughput has dropped by a stunning 44%. The following picture illustrates this:

What is especially interesting to see is that the latency has just gone up from 0.019 to 0.034 milliseconds, which might look like nothing. The key here is that the query is so incredibly fast that even this tiny amount of latency makes a huge difference. The execution plan shows why this is the case here:
blog=# explain analyze SELECT * FROM pgbench_accounts WHERE aid = 434232;
                                 QUERY PLAN
------------------------------------------------------------
 Index Scan using pgbench_accounts_pkey on pgbench_accounts  (cost=0.29..8.31 rows=1 width=97) (actual time=0.015..0.016 rows=0 loops=1)
   Index Cond: (aid = 434232)
 Planning Time: 0.227 ms
 Execution Time: 0.047 ms
(4 rows)

The relevant number in this plan is “0.016” - this is the time the index scan actually needs to find the row inside the table. Now compare this to the additional network latency to see why such a tiny amount of overhead matters so much.
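Keep in mind that EXPLAIN ANALYZE only reports the time spent inside the server. To see the full round trip as the client experiences it, you can switch on psql’s \timing option and compare the two numbers yourself - a quick sketch (the reported times will of course depend on your setup):

blog=# \timing on
Timing is on.
blog=# SELECT * FROM pgbench_accounts WHERE aid = 434232;

The “Time:” line psql prints after the query includes the network round trip and everything else outside the server, so it is directly comparable to the latency figures pgbench reports, while “Execution Time” above covers only step (b).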
Latency on real networks
Usually, the application and the database are not on the same machine. Before we run the test, it makes sense to take a look at the output of traceroute:
different_box$ traceroute 10.1.139.53
traceroute to 10.1.139.53 (10.1.139.53), 30 hops max, 60 byte packets
 1  _gateway (10.0.0.1)  0.212 ms  0.355 ms  0.378 ms
 2  cybertec (10.1.139.53)  0.630 ms  0.619 ms  *

You can see that the route from the machine running pgbench to the actual database box is really not long. In fact, it is just one hop across an internal network from one machine to another - nothing more.
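A plain ping gives a quick estimate of the raw network round-trip time on this route, which is a hard lower bound for any single query - the address is of course the one from the example above:

different_box$ ping -c 10 10.1.139.53

The rtt min/avg/max summary at the end tells you how much every single round trip costs on the wire, no matter how fast the query itself is.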
The following output shows what happens when we run the same test again:
different_box$ pgbench -h 10.1.139.53 -S -c 1 -T 20 blog
pgbench (17.5)
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
maximum number of tries: 1
duration: 20 s
number of transactions actually processed: 47540
number of failed transactions: 0 (0.000%)
latency average = 0.420 ms
initial connection time = 9.727 ms
tps = 2378.123901 (without initial connection time)

Here are the relevant numbers:
- average latency: 0.420 ms
- transactions per second (tps): 2378
Phew, even a latency of just 0.420 milliseconds has already collapsed our throughput from over 50,000 TPS down to 2,378. Of course, this is all going through a single connection, but it is important to understand why this happens: with one connection, every transaction has to wait for its own round trip, so 0.420 ms per query caps throughput at roughly 1000 / 0.420 ≈ 2,380 transactions per second - exactly what we measured. Those 0.4 milliseconds we spend on the wire are an eternity compared to the time needed to just read the index (0.016 milliseconds according to the output of explain analyze).
The following image displays the throughput:

You can safely assume that those throughput numbers will be even worse, and significantly so, if you add additional network layers. Especially in cloud computing this is a major issue. Every HAProxy instance, every network hop, every router, every firewall rule adds latency and slows your application down. The more overhead your network has, the less relevant execution time becomes (for very short queries, of course).
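If you run a pooler or proxy such as pgbouncer in front of PostgreSQL, you can quantify its cost the same way: run the identical pgbench invocation once against the database port and once against the proxy port and compare the latency averages. A sketch - the port numbers are just the usual defaults and an assumption about your setup:

# directly against PostgreSQL
pgbench -h 10.1.139.53 -p 5432 -S -c 1 -T 20 blog

# through a pgbouncer instance running on the database host (default port 6432)
pgbench -h 10.1.139.53 -p 6432 -S -c 1 -T 20 blog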
I assume that most of my readers are now in shock and are wondering what can be done about it.
Concurrency: A way out?
The last sections have demonstrated a worst-case scenario, which is most relevant if a single connection is sending queries back and forth between the application and the database. However, on busy systems there is usually more than a single user. If we add concurrent connections, we can still achieve reasonable results:
cybertec$ pgbench -c 4 -j 4 -T 20 -S blog -h localhost
pgbench (17.5)
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 1
query mode: simple
number of clients: 4
number of threads: 4
maximum number of tries: 1
duration: 20 s
number of transactions actually processed: 1639827
number of failed transactions: 0 (0.000%)
latency average = 0.049 ms
initial connection time = 5.637 ms
tps = 82007.653121 (without initial connection time)

Extracting the two relevant numbers will again show us what really matters:
- average latency: 0.049 ms
- transactions per second (tps): 82007
With 4 concurrent connections we have achieved 82,000 transactions per second, and a lot more is possible if we add even more concurrency. On a modern machine, more than 1 million operations per second are definitely possible. However, this is only achievable when the database and the source of those queries are really close together and network latency is not an issue.
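If you want to explore this on your own hardware, simply scale the client and thread counts up - the numbers below are only an illustrative starting point, not a recommendation:

# 32 concurrent client connections driven by 8 pgbench worker threads
cybertec$ pgbench -c 32 -j 8 -T 20 -S blog -h localhost

Keep an eye on the CPU usage of the machine running pgbench as well: at these rates, the benchmark client itself can easily become the bottleneck.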
Can faster CPUs help?
Often people ask: What if we add more CPUs? Well … let us take a look at some ratios here:
- index lookup: 0.016 ms
- network latency: 0.420 ms
If you are using a faster CPU (single thread performance) you have just optimized those 0.016 milliseconds, which make up roughly 4% of the total. The remaining 96% of the time is completely untouched. Also note: this is usually not about throughput but about latency. Throughput is often really high; the issue is not how much data you can send, but “how often” you can do it.
Note that for queries which take a lot longer, latency matters far less; but for super fast, small queries, latency can add up to a real disaster (especially in the cloud, which tends to involve far more complex networking).
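One practical consequence worth spelling out: since every statement pays the full round trip, fetching many rows in one query is dramatically cheaper than fetching them one by one. A minimal SQL sketch (the ids are of course just an example):

-- 100 separate lookups = 100 network round trips
SELECT * FROM pgbench_accounts WHERE aid = 1;
SELECT * FROM pgbench_accounts WHERE aid = 2;
-- ... and so on ...

-- one lookup for many ids = a single round trip
SELECT *
FROM   pgbench_accounts
WHERE  aid = ANY (ARRAY[1, 2, 3, 4, 5]);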
Finally …
We encourage everyone to leave a comment in the section below and share their experiences and results.