System Reliability


Ops I did it again: The SRE Extension is out!

 📋MCP  Content type: Blog
medium.com·

Komodor Brings Autonomous AI to SRE With Reliability-First Cloud Optimization

 🌍Distributed Systems
cloudnativenow.com·

When failover isn’t safe: Building high-availability PostgreSQL on Kubernetes

 📡Replication  Content type: Blog
datadoghq.com·

Observability overload is drowning engineers

 🔧Developer tools
thenewstack.io·

Azure Availability Zone Mapping and VM Resilience Analysis Guidance using SRE.AZURE.COM Agent

 🔧Agent Tooling

The Death of the Four Golden Signals: Designing Telemetry for Non-Deterministic Infrastructure

 🤖AI
devops.com·

SQL Server Always On Availability Groups and Database Master Keys: A Hidden Failover Pitfall

 🔧Database Tuning  Content type: Blog
dbi-services.com·

Elastic brings AI-driven incident investigation to Kubernetes and observability tools

 🌍Distributed Systems
helpnetsecurity.com·

melancholictheory/wellcake: A Kubernetes operator for Valkey — Standalone / Replication / Sentinel / Cluster, operator-driven failover, proactive zero-downtime rolling restarts, Atomic Slot Migration, S3 backups, multi-region replication.

 👑Leader Election  Content type: Code
github.com·r/devops

Practice like you play: How Amazon scales resilience to new heights (ARC316)

 🔗Supply Chain Resilience  Content type: Blog
blog.domb.net·

Our DNS servers use GeoDNS to direct connections to the lowest latency servers and implement automatic failover via health checks and 5 minute expiry for the...

 🌐DNS
grapheneos.social·

How 24/7/365 SOC Improves Incident Response Times?

 🛡️DDoS Mitigation  Content type: Blog
medium.com·

SRE Weekly Issue #520

 📬Tech Newsletters
sreweekly.com·

Gauging the Spacetime Code

 ⚛️Physics  Content type: Academic
arxiv.org·

Improve your application resilience with Amazon Cognito multi-Region replication

 🛡️DDoS Mitigation  Content type: Blog
aws.amazon.com·

ninoxAI/nightwatch: Open-source, local-first, read-only AI SRE: clusters alert storms, investigates root cause over your live systems, proposes human-gated fixes.

 🤖AI  Content type: Code
github.com·Hacker News

re:Invent 2022 Building Confidence Through Chaos Engineering on AWS

 🌍Distributed Systems  Content type: Blog
blog.domb.net·

Designing for High Availability: The Operational Reference for Running a Geo-Replicated ACR

 🔄Eventual Consistency

docs: document runner failover helpers

 Durable Execution  Content type: Code
github.com·

Agentic Observability is Not a Chatbot Over Telemetry

 🤖Agent Protocols
devops.com·
