In part 2 of this blog series I introduced the concept of Antifragile systems (or services): systems that are neither fragile nor robust, but that thrive in chaos. They are architected, developed, deployed and run from the ground up to achieve the SLOs of availability and responsiveness expected from today’s…
Category: Cloud
Cloud Service Reliability (Part 2): Houston, we have an… outage!
Of all the movie phrases that have become a part of pop culture – from ‘Luke, I am your Father’ to ‘Play it again, Sam’ – none has made it into everyday use more than ‘Houston, we have a problem!’ from the movie Apollo 13. Reminding my son ‘Saransh, I am their father’ does not…
Cloud Service Reliability (Part I): Apollo 13 to Google SRE
Apollo 13, in my humble opinion, is the best movie ever made on engineering, and especially on reliability engineering in action. It is certainly one of my personal favorite movies of all time. But how I view the movie differs by a full 180 degrees from how the rest of my family views it. To them it is…
The ‘AWS Well Architected Framework’
Amazon Web Services (AWS) in November published an updated version of their AWS Well Architected Framework paper. This paper has existed for a while; however, this version was my first read of it, and I summarize the paper here. Personally, I believe it is an extremely comprehensive paper capturing the Architectural Thinking that Amazon believes…
Slides: Unicorns on an Aircraft Carrier – achieving Agility in Traditional Orgs
Slides from my keynote talks at the DevOps.com hosted CDSummit events in London and Stockholm earlier this week are now posted:  
