Why Deleting a Server Made This System More Reliable
The Simple Version

When a system has too many moving parts that need to stay in sync, adding more parts often makes failures more likely, not less. Sometimes the most reliable architecture is a smaller one.

The Counterintuitive Math of Reliability

Reliability in distributed systems is multiplicative, not additive. If you have three servers that each run with 99% uptime, the chance that all three are simultaneously available isn’t 99%. It’s roughly 97%. Add a fourth server into a chain where a...
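
The three-server arithmetic above can be checked with a short sketch. It assumes the servers fail independently and that a request needs every one of them to be up; the function name `series_availability` is hypothetical, and the four-server line is an illustration that assumes one more 99% server is placed in the same chain.

```python
import math


def series_availability(availabilities):
    """Probability that every component is up at the same time.

    Assumes failures are independent and that all components are required.
    """
    return math.prod(availabilities)


# Three servers at 99% each, all required.
three = series_availability([0.99] * 3)

# Assumption for illustration: a fourth 99% server joins the same chain.
four = series_availability([0.99] * 4)

print(f"3 servers in a chain: {three:.4f}")  # 0.9703
print(f"4 servers in a chain: {four:.4f}")   # 0.9606
```

Under these assumptions, each required component can only multiply the total by a number at or below 1, so removing one from the chain can never lower the combined figure.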