5 min read · 5 days ago
1. The 3 A.M. Page That Should Never Exist
It’s 3:07 a.m. A service is down. The alert tells you something is wrong, not why. You Slack the one engineer who “knows this system.” They remember a similar incident from six months ago. A manual restart. A config tweak. The page clears. Everyone goes back to sleep.
By 10 a.m., nobody has written anything down. The system is still fragile. The next page is inevitable.
Now contrast that with another team. The same failure condition occurs, and nobody gets paged. Traffic is throttled automatically. The failing dependency is drained. The system degrades gracefully and recovers without human intervention.
This isn’t about smarter engineers. It’s about architectural maturity. One team optimizes response. The other designs systems that don’t require it.
That gap is the real DevOps-to-SRE transition.
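To make the second team's behavior concrete, here is a minimal sketch of one common way to get it: a circuit breaker that stops calling a failing dependency, serves a degraded fallback, and probes for recovery on its own. This is an illustration, not any particular team's implementation. The class name `CircuitBreaker`, its parameters (`failure_threshold`, `reset_after_s`, `clock`), and the `primary`/`fallback` callables are all hypothetical names chosen for this example.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Hypothetical sketch: stop calling a failing dependency, serve a fallback."""

    def __init__(
        self,
        failure_threshold: int = 3,
        reset_after_s: float = 30.0,
        clock: Callable[[], float] = time.monotonic,  # injectable for tests
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    def call(self, primary: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                # Open: skip the dependency entirely and degrade gracefully.
                return fallback()
            # Cool-down elapsed: fall through and let one trial call probe it.
        try:
            result = primary()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                # Trip the breaker, or re-trip it after a failed trial call.
                self.opened_at = self.clock()
            return fallback()
        # A success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each request to the dependency, for example `breaker.call(fetch_live_prices, serve_cached_prices)`, where both function names are placeholders. The point of the design is that the 3 a.m. decision ("stop hammering the broken thing, serve something acceptable, try again later") is encoded once in the system instead of being remembered by one engineer.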
2. DevOps 1.0 vs SRE 2.0
DevOps as practiced in 2015 was a massive improvement over ticket-driven ops. Scripts replaced clicks. CI replaced FTP. Infra became code. That era mattered.
But most teams froze there.