AI in SRE: Where and how Google is deploying agentic AI to improve operations
Since its inception more than 20 years ago, Google has used Site Reliability Engineering (SRE) to keep services like Search, Gmail, Maps, YouTube and Google Cloud reliable and highly available, adhering to the principles and practices of a reliability-first mindset. Recently, though, the emergence of AI has driven multiple step-changes in system complexity. Interactions between components are now more complicated due to a variety of factors: with microservice architectures, systems are distribute...