If a system is required to handle short bursts of requests from the same users, the strategy from the previous post will not work. One solution in this case is the so-called Token Bucket: each user starts with n tokens available, and each request consumes one token. Upon reaching 0 tokens, requests are denied. Tokens are replenished at a steady rate.
This post shows an implementation of rate limiting that allows bursts of up to 10 requests per user, with tokens refilled at a rate of 1 per second (a sustained average of 1 request per second).
create unlogged table rate_limit(
    id int not null,
    tokens smallint not null
        default 9
        check (tokens >= 0 and tokens <= 10),
    primary key (id)
);

create index on rate_limit
    using btree (tokens)
    where tokens < 10;
1: Unlogged table for faster writes; please don't change all your tables to unlogged without understanding what you lose =);
2: A new row is inserted when a request is processed, with tokens defaulting to 9 (allowing 10 requests in total: the current one plus the 9 remaining); a quick check of this follows the list;
3: Partial index: only rows that may need refilling (tokens < 10) are kept in the index.
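As a quick, purely illustrative sanity check of the default and the check constraint (the id 42 is an arbitrary value of my choosing, not from the original post):
insert into rate_limit (id) values (42);          -- tokens defaults to 9
select tokens from rate_limit where id = 42;      -- returns 9
update rate_limit set tokens = -1 where id = 42;  -- rejected by the check constraint
delete from rate_limit where id = 42;             -- clean up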
Some SQL-fu, run for each request:
\set id random(0, 10000)
with ins_or_upt as (
    insert into rate_limit as x (id)
    values (:id)
    on conflict (id) do update
        set tokens = x.tokens - 1
        where x.id = excluded.id
          and x.tokens > 0
    returning tokens
)
select tokens
from ins_or_upt
union all
select 0
limit 1;
1: The snippet above is used with pgbench later, hence the use of the \set command;
2: Using the variable set at line 1;
3: If a row already exists for :id, update it instead;
4: Returns the number of remaining tokens, useful feedback for the user;
5: Always return exactly one row with the remaining tokens for :id (a manual run of the same statement follows below).
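To try this by hand in psql (where the pgbench variable :id is not available), the same statement works with a literal id; the id 7 below is an arbitrary choice of mine:
with ins_or_upt as (
    insert into rate_limit as x (id)
    values (7)
    on conflict (id) do update
        set tokens = x.tokens - 1
        where x.id = excluded.id
          and x.tokens > 0
    returning tokens
)
select tokens
from ins_or_upt
union all
select 0
limit 1;
-- the first call returns 9, then 8, 7, ... as tokens are consumed;
-- once the bucket is empty the update matches no row and the
-- fallback "select 0" branch is what comes back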
We can consume tokens, but they are never refilled (yet). The system's backend could, of course, periodically issue a statement to refill tokens, but that's not this blog's theme. Enter pg_cron, an extension that does just that: it periodically executes a statement.
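Installation is not covered here and varies by platform, but a typical setup (my assumption, not part of the original post; the database name is a placeholder) looks roughly like this:
-- in postgresql.conf (requires a server restart):
--   shared_preload_libraries = 'pg_cron'
--   cron.database_name = 'app_db'   -- defaults to 'postgres'
-- then, connected to that database:
create extension if not exists pg_cron;
Once the extension is in place, we just schedule the following: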
select cron.schedule(
    '1 second',
    $$ update rate_limit
       set tokens = tokens + 1
       where tokens < 10 $$
);
Done. This refills 1 token per second for rows with less than 10 tokens, capping at 10 (full bucket after 10 seconds of inactivity). All of this with no backend code.
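To check on, or later remove, the scheduled job, pg_cron keeps its bookkeeping in regular tables and functions; a rough sketch (the job id below is illustrative):
-- list scheduled jobs and their commands
select jobid, schedule, command from cron.job;
-- inspect recent runs (available in newer pg_cron versions)
select status, return_message, start_time
from cron.job_run_details
order by start_time desc
limit 10;
-- remove a job by its id (taken from cron.job)
select cron.unschedule(1);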
Just for fun, I ran the snippet above (saved as insert.sql) through pgbench with a single client (one connection) on a single thread, simulating 10k concurrent users.
docker exec -i pg pgbench \
--time=30 \
--client=1 \
--jobs=1 \
--protocol=prepared \
--no-vacuum \
--file=insert.sql
You’re right, that’s not a proper benchmark. Still, achieving ~20k transactions per second (tps) with my (okayish) setup and stock PostgreSQL configuration makes me wonder why some (NoSQL) database systems have such widespread adoption.
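After the run, a quick query (my own addition, not in the original post) shows how many buckets were touched and how depleted they are:
select count(*)    as users,
       min(tokens) as min_tokens,
       max(tokens) as max_tokens
from rate_limit;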
Figure 1: me changing a table to unlogged
If you are wondering how to run this yourself, download these files, place everything in the same directory, run docker compose up, and then execute the pgbench command above.