HTTP 200 Is a Lie: A 30-Line Schema Canary for Source Drift
A scraper that returns HTTP 200 is not a scraper that returns good data. Those are two different claims, and almost every monitoring setup I've seen conflates them.

Here's the failure mode nobody writes code for. The source you scrape quietly changes. A field gets renamed, a number comes back as a string, one column goes blank. Your request still gets a 200. Your parser doesn't throw. Your job exits green. And from that day forward, every scheduled run feeds slightly wrong records into your c...
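To make the three drift symptoms concrete, here is a minimal sketch of a schema check in that spirit. It is not the author's canary. It assumes each scraped batch arrives as a list of dicts, and the names `EXPECTED_SCHEMA`, `MAX_BLANK_RATE`, `is_blank`, and `check_schema` (plus the 20% blank threshold) are hypothetical, chosen only for illustration. It flags missing fields, unexpected new fields (a rename shows up as both), values of the wrong type, and fields whose blank rate crosses the threshold.

```python
from __future__ import annotations

from typing import Any

# Hypothetical schema for illustration: field name -> accepted type(s).
EXPECTED_SCHEMA: dict[str, type | tuple[type, ...]] = {
    "id": int,
    "name": str,
    "price": (int, float),  # accept ints too, so 10 and 10.5 both pass
}

# Hypothetical threshold: more than 20% blank values in a field counts as drift.
MAX_BLANK_RATE = 0.2


def is_blank(value: Any) -> bool:
    """Treat None and whitespace-only strings as blank."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def check_schema(
    records: list[dict[str, Any]],
    schema: dict[str, type | tuple[type, ...]] = EXPECTED_SCHEMA,
    max_blank_rate: float = MAX_BLANK_RATE,
) -> list[str]:
    """Return drift problems found in one batch; an empty list means it looks sane."""
    if not records:
        return ["batch is empty"]
    problems: list[str] = []
    total = len(records)
    for field, expected in schema.items():
        # A renamed or dropped field: the key is simply absent.
        missing = sum(1 for r in records if field not in r)
        if missing:
            problems.append(f"{field}: missing in {missing}/{total} records")
        values = [r[field] for r in records if field in r]
        # A column that silently went empty.
        blanks = sum(1 for v in values if is_blank(v))
        if blanks / total > max_blank_rate:
            problems.append(f"{field}: blank in {blanks}/{total} records")
        # A number that now comes back as a string (or any other type change).
        wrong = [v for v in values if not is_blank(v) and not isinstance(v, expected)]
        if wrong:
            problems.append(
                f"{field}: {len(wrong)} value(s) with wrong type, e.g. {wrong[0]!r}"
            )
    # A rename shows up as a missing known field plus an unexpected new one.
    unexpected = sorted({k for r in records for k in r} - set(schema))
    if unexpected:
        problems.append(f"unexpected field(s): {', '.join(unexpected)}")
    return problems
```

Wiring it in is one line of policy: if `check_schema` returns a non-empty list, log it and exit non-zero, so the job goes red on the day the source drifts instead of staying green.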