AI Anomaly Detection in Grafana: 3 Mistakes We Made

Discussed on DEV

Originally published on kuryzhev.cloud.

We replaced 200 static Prometheus threshold alerts with an AI anomaly detection model, and spent the first month making things measurably worse before we figured out why. The model fired constantly, woke people up at 3am for non-issues, then went completely silent during a real incident. This is the honest account of what went wrong and what the working architecture looks like now.

Context: Why We Tried AI Anomaly Detection in the First Place ...
