Querying and Ingest issues in EU
status.honeycomb.io

This incident started on December 5th and is one of the longest in Honeycomb's history: it was actively worked on until it was closed on December 17th. Because of its impact and duration, we wanted to offer a partial, preliminary report that explains, at a high level, what happened.

On December 5th, at 20:23 UTC, our Kafka cluster suffered a critical loss of redundancy. Our Kafka cluster contains multiple topics, including all telemetry events submitted by Honeycomb users, the rematerialization of state changes into the activity log, and multiple metadata topics that Kafka uses to manage its own workloads. The loss of redundancy left multiple partitions leaderless, and by 20:35 UTC we were getting alerts that roughly a quarter of our usual event topic partitions were unable to accept writes.
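In Kafka, every write to a partition goes through that partition's leader replica, so a partition with no leader cannot accept produce requests. The sketch below is a minimal, hypothetical illustration (not Honeycomb's actual alerting) of how an alert like the one described could compute the share of unwritable partitions in a topic. It assumes partition metadata has already been fetched from the cluster. `PartitionState`, `leaderless`, `unwritable_fraction`, `should_alert`, and the topic name `events` are all invented for illustration, and the `-1` leader value mirrors the convention Kafka metadata uses for a partition with no leader. It only models the leaderless case; writes can also fail on partitions that still have a leader, for example when the in-sync replica set falls below `min.insync.replicas` for `acks=all` producers.

```python
from dataclasses import dataclass

# Kafka metadata reports a leader of -1 for a partition that has no leader.
NO_LEADER = -1


@dataclass(frozen=True)
class PartitionState:
    """Hypothetical, pre-fetched snapshot of one partition's metadata."""
    topic: str
    partition: int
    leader: int  # broker id of the current leader, or NO_LEADER


def leaderless(partitions):
    """Return the partitions that have no leader and so cannot accept writes."""
    return [p for p in partitions if p.leader == NO_LEADER]


def unwritable_fraction(partitions, topic):
    """Fraction of the given topic's partitions that are leaderless (0.0 if none exist)."""
    in_topic = [p for p in partitions if p.topic == topic]
    if not in_topic:
        return 0.0
    return len(leaderless(in_topic)) / len(in_topic)


def should_alert(partitions, topic, threshold=0.0):
    """Fire when the leaderless fraction exceeds the (hypothetical) threshold."""
    return unwritable_fraction(partitions, topic) > threshold


if __name__ == "__main__":
    # Hypothetical topic with 8 partitions, 2 of which have lost their leader.
    snapshot = [
        PartitionState("events", i, NO_LEADER if i in (0, 5) else 1)
        for i in range(8)
    ]
    print(unwritable_fraction(snapshot, "events"))  # prints 0.25
    print(should_alert(snapshot, "events"))         # prints True
```

A real check would populate the snapshot from the cluster's metadata rather than constructing it by hand; keeping the calculation separate from the fetch makes the threshold logic easy to test in isolation.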

For most Honeyc…