Cloud Service Reliability (Part 2): Houston, we have an… outage!

Of all the movie phrases that have become part of pop culture, from ‘Luke, I am your Father’ to ‘Play it again, Sam’, none has made it into everyday use more than ‘Houston, we have a problem!’ from the movie Apollo 13. Reminding my son ‘Saransh, I am their father’ does not…

Cloud Service Reliability (Part I): Apollo 13 to Google SRE

Apollo 13, in my humble opinion, is the best movie ever made on engineering, and especially on reliability engineering in action. It is certainly one of my personal favorite movies of all time. But how I view the movie differs by a full 180 degrees from how the rest of my family views it. To them it is…

The ‘AWS Well Architected Framework’

In November, Amazon Web Services (AWS) published an updated version of its AWS Well Architected Framework paper. The paper has existed for a while; however, this version was my first read of it, and I summarize it here. Personally, I believe it is an extremely comprehensive paper capturing the Architectural Thinking that Amazon believes…