Investigation Reports: When Monitors Get Smarter
Authored by Marco Aquilanti

When a monitor fires, finding the root cause requires a familiar sequence of checks. The engineers who set up the monitor usually know these steps by heart: the dependencies, the error codes, what to check and where. For the on-call responder, though, these steps aren't always obvious. Historically, the solution was to force engineering teams to document the checks in a playbook and hope the responder would read it under pressure. Today, we can offlo...