SRE

site reliability engineering, SLO, SLA, incident response


Ops I did it again: The SRE Extension is out!

 🚒Incident Management  Content type: Blog
medium.com

Explore OpenSearch 3.7

 📊Metrics  Content type: Blog
opensearch.org

The Death of the Four Golden Signals: Designing Telemetry for Non-Deterministic Infrastructure

 🔭Observability
devops.com

Elastic brings AI-driven incident investigation to Kubernetes and observability tools

 🐝eBPF
helpnetsecurity.com

Azure Availability Zone Mapping and VM Resilience Analysis Guidance using SRE.AZURE.COM Agent

 🏗️Software Architecture

shivamshashank/StackPulse: 🚀 Go-based DevOps/SRE CLI for deploying a full ☸️ Kubernetes observability stack with 📊 Prometheus, Grafana, Loki, Tempo, OpenTelemetry, Alertmanager, 🔔 Slack/PagerDuty alerts, and ⚡ k3s support.

 🔭Observability  Content type: Code
github.com · Hacker News

SRE Weekly Issue #520

 🏗️Software Architecture
sreweekly.com

AI Agent Observability: Tracking Decisions in Multi-Agent Workflows

 🔭Observability
faun.pub

New comment by tenaka in "Ask HN: Who wants to be hired? (June 2026)"

 🗄️Databases  Content type: Reference

Build an agentic incident triage assistant with Amazon Quick and New Relic

 🚒Incident Management  Content type: Blog
aws.amazon.com

Beyond Greedy Chunking: SLO-Aware Sliding-Window Scheduling for LLM Inference

 🔀Data Pipelines  Content type: Academic
arxiv.org

ninoxAI/nightwatch: Open-source, local-first, read-only AI SRE: clusters alert storms, investigates root cause over your live systems, proposes human-gated fixes.

 📊Metrics  Content type: Code
github.com · Hacker News

New comment by pigsinzen in "Ask HN: Who wants to be hired? (June 2026)"

 💥Chaos Engineering  Content type: Discussion

Scribe Agent updates: no more manual note-taking or lost context by Débora Cambé

 🚒Incident Management  Content type: Blog
pagerduty.com

The Question That Built Our Engineering Grading System: Would I Trust This Person On-Call?

 🚒Incident Management
fromdev.com

How Cisco IT cut observability costs by 86% and eliminated major network outages

 🔭Observability  Content type: News
networkworld.com

Trace n8n workflow and node executions with OpenTelemetry

 📡OpenTelemetry  Content type: Blog
blog.n8n.io

Nurse struck off after making £19,500 by adding shifts she did not work

 🚒Incident Management  Content type: News
independent.co.uk

How DevOps Engineers Can Use AI to Triage Production Incidents Faster

 🚒Incident Management  Content type: Blog
devopsaitoolkit.com · DEV

Faster root cause for slow traces with ClickStack Event Deltas

 📡OpenTelemetry  Content type: Blog
clickhouse.com
