Bloom filters: the niche trick behind a 16× faster API
incident.io·6d·

This post is a deep dive into how we improved the P95 latency of an API endpoint from 5s to 0.3s using a niche little computer science trick called a bloom filter.

We’ll cover why the endpoint was slow, the options we considered to make it fast and how we decided between them, and how it all works under the hood.
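If you haven't met one before: a bloom filter is a small bit array that can tell you an item is definitely not in a set, or that it might be. Here's a minimal sketch in Python to make that concrete. It is not the implementation from this post; the class name, the bit-array size, the number of hashes, the salted SHA-256 scheme, and the example value are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Toy bloom filter: a fixed-size bit array plus k hash functions."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        # Both sizes are illustrative defaults, not tuned values.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as the bit array

    def _positions(self, item: str):
        # Derive k bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        # Set every bit this item hashes to.
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: str) -> bool:
        # False means definitely absent; True means possibly present.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


bf = BloomFilter()
bf.add("team:payments")  # hypothetical value
print(bf.might_contain("team:payments"))  # True: added items are always found
```

The useful property is the asymmetry: an item that was added is always reported as possibly present, while an item that was never added is usually, but not always, reported as absent.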

Intro

A core concept of our On-call product is Alerts. An alert is a message we receive from a customer’s monitoring systems (think Alertmanager, Datadog, etc.), telling us that something about their product might be misbehaving. Our job is to figure out who we should page to investigate the issue.

We store every alert we receive in a big ol’ database table. As a customer, having a complete history of every alert you’ve ever sent us is useful for spotting trends,…
