The first 30 seconds of a Postgres incident: why they take 30 minutes
2 a.m. PagerDuty goes off. "Production is slow." You open your laptop, fire up psql with bleary eyes, and connect to production. The prompt comes up. The cursor blinks after production=>. And your hands stop. Now, where was I supposed to look first? Do I get an overview of the whole DB, or do I start drilling into individual queries? It's been a while since the last incident, and the first move doesn't come to me. "What kind of incident is this even?" And you freeze. "Production is slow." If...