Detecting API anomalies behind a 200 OK, with statistics, not AI
Most uptime monitors answer one question: is it up or down? But some of the worst incidents I've dealt with returned a perfectly happy 200 OK: an endpoint that started serving a cached error page; a JSON API returning {"error": ...} with status 200; a response that quietly got 10× slower; and a payload that dropped from 14 KB to 800 bytes because a backend started returning empty results. A plain up/down check sails straight past all of these. I wanted my monitor to notice "it's up, but it's wrong."...
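The excerpt stops before it shows which statistics the author actually uses, so the following is only a minimal sketch of one way to catch the failures listed above. It assumes a robust baseline (median and median absolute deviation, turned into a modified z-score) over recent healthy checks. The names `robust_z` and `check_response`, the `history` layout, and the 3.5 threshold are all hypothetical choices made for illustration, not taken from the article.

```python
import json
import math
import statistics


def robust_z(value, history):
    """Modified z-score of value against history, based on median and MAD."""
    med = statistics.median(history)
    mad = statistics.median([abs(x - med) for x in history])
    if mad == 0:
        # Perfectly flat history: no deviation scores 0, any deviation is infinite.
        return 0.0 if value == med else math.copysign(math.inf, value - med)
    # 0.6745 scales MAD so the score is comparable to a standard z-score.
    return 0.6745 * (value - med) / mad


def check_response(status, body, latency_ms, history, threshold=3.5):
    """Return reasons a response looks wrong; an empty list means it looks normal.

    history is an assumed dict of recent healthy samples with the keys
    "latency_ms" and "size_bytes". threshold is an assumed cutoff.
    """
    problems = []
    if status != 200:
        problems.append(f"status {status}")

    # A JSON object with a top-level "error" key, even though the status is 200.
    try:
        parsed = json.loads(body)
    except ValueError:
        parsed = None
    if isinstance(parsed, dict) and "error" in parsed:
        problems.append("error field in body")

    # Only slower-than-usual responses and smaller-than-usual payloads are flagged.
    size = len(body.encode("utf-8"))
    if robust_z(latency_ms, history["latency_ms"]) > threshold:
        problems.append("latency above baseline")
    if robust_z(size, history["size_bytes"]) < -threshold:
        problems.append("payload below baseline")
    return problems
```

Median and MAD are used here rather than mean and standard deviation so that a single slow request in the history does not widen the baseline enough to hide the next one. In practice `history` would need to be a rolling window per endpoint, and the threshold would need tuning against real traffic.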