You don’t need SRE. What you need is SRE.

Ever since Google published the Site Reliability Engineering (SRE) book in 2016, the SRE movement has changed how organizations look at reliability, incident response, and incident management. Much like DevOps, adopting SRE results in an organizational culture shift. That shift is changing how organizations are structured and how information flows within an organization that would allow for…

Understanding Observability

Last week my understanding of Observability went up astronomically. In fact, it was taken to an all-new dimension by a single tweet from Charity Majors (@mipsytipsy): Observability, short and sweet: – can you understand whatever internal state the system has gotten itself into? …just by inspecting and interrogating its output? …even if (especially if)…

Cloud Service Reliability (Part 2): Houston, we have an… outage!

Of all the movie phrases that have become part of pop culture, from ‘Luke, I am your Father’ to ‘Play it again, Sam’, none has entered daily usage more than ‘Houston, we have a problem!’ from the movie Apollo 13. Reminding my son ‘Saransh, I am their father’ does not…