SRE Weekly Issue #505
An incident write-up from the archives, and it’s a juicy one. A code update caused a crash only after some time had passed, so automated testing didn’t catch it before the update was deployed worldwide.