I recently had the opportunity to attend a DevOps talk by Jeff Horowitz of Monetate. He was followed that night by John Allspaw from Etsy; I will write about that talk in a later post. I found Jeff’s talk, titled ‘DevOps in a Cloud of Failure’, very interesting. The essence of it, to me, was the 12-step program of continuous deployment (a pure coincidence that the number of steps is 12, I’m sure) they use at Monetate.
But before I delve into what I learnt that night, let me introduce Monetate and their architecture. I had never heard of Monetate before. They are a Philadelphia-based company that provides personalization services to large e-tailers. Their clients include retailers like Best Buy and Macy’s. They serve up the pages for users on their clients’ websites, based on the personalized information they are able to gather. I heard them say that they will change what you see on a web page based on whether it is raining where you are browsing from (huh!?). Anyway, this is uninteresting for a DevOps person, except that it makes Monetate’s traffic highly variable and unpredictable (The price of that iPad on Best Buy is what?!). They have to be ready to scale on demand, and fast! That’s interesting!
They are fully virtualized on AWS. Their infrastructure is a black box to them. In their opinion, they could not scale as fast as they need to without using a provider like Amazon. It does, however, expose them to Amazon’s quirks and failures, which they painfully shared.
Here are some of the principles Jeff shared with us, which form the credo that led Monetate to their 12-step program.
- Fast fail – the goal is to fail fast and not spend time trying to analyze and fix a failure. When a failure happens, replace what failed with something that works. This is done at the image level.
- Deployment – the goal is to have as lightweight a deployment process as possible. This requires heavyweight processing upfront. The goal is to always iterate forward and not patch in place. They also have rollback mechanisms in place to roll back to an image that works. They also make sure that deployment is fully reproducible, so they can go back to any earlier point in time.
- Configuration management – is done via a custom in-house system, based on shell-scripts and sets of inputs. Everything is version controlled.
- Install phase – images are created as Amazon Machine Images (AMIs). The steps to create an image are: (i) Install a barebones OS. (ii) Install packages and 3rd party artifacts. (iii) Install the app software. They make sure there is no boot time data or dependencies. This allows them to get a 1 minute boot time and no failures caused by boot time dependencies.
- Configuration phase – they do per-instance configuration. Configuration can use boot time metadata. There are no dependencies except the EC2 backplane. An application can depend on other instances.
- Runtime changes – these are very limited as the goal is to always iterate forward. All failed servers are just replaced.
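To make the fast-fail principle concrete, here is a minimal sketch of "replace, don't repair". It is a pure in-memory simulation, not Monetate's tooling: the `Instance` class, the `launch` helper, the instance ids and the image name `app-v41` are all hypothetical, and nothing here calls AWS.

```python
from dataclasses import dataclass
from itertools import count

# Hypothetical id generator; real instance ids would come from the cloud provider.
_ids = count(1)


@dataclass
class Instance:
    instance_id: str
    image: str
    healthy: bool = True


def launch(image: str) -> Instance:
    """Stand-in for launching a new instance from a pre-built image."""
    return Instance(instance_id=f"i-{next(_ids):04d}", image=image)


def replace_failed(fleet: list[Instance], good_image: str) -> list[Instance]:
    """Return a fleet in which every unhealthy instance has been swapped for a
    fresh one launched from good_image. Failed instances are never repaired."""
    return [inst if inst.healthy else launch(good_image) for inst in fleet]


# Usage: three instances, one fails, and it is replaced rather than debugged.
fleet = [launch("app-v41") for _ in range(3)]
fleet[1].healthy = False
fleet = replace_failed(fleet, good_image="app-v41")
print([(inst.instance_id, inst.healthy) for inst in fleet])
```

The point of the sketch is that the failed instance is simply dropped from the fleet; any analysis of why it failed can happen later, off the critical path.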
The 12-step program:
This brings us finally to the 12 steps of Continuous Delivery:
- Freeze – once the developers are ready to deliver the code
- Tag – appropriate code set
- Build – tagged code
- Test – automated tests are run as a part of the build process
- Image – build the AMI. The application is included in the image that is built
- Configure – the built image is then configured for the appropriate component of the application stack it holds
- Launch – the image
- Test – once the image is spun up, run more automated tests, this time on the image and the application
- Start services – on AWS. A few instances of the new stack are spun up, to begin with
- Cutover – once ready, the load balancer config is changed to cut over to the new stack
- Test – test once again, now in production, before the new image is updated system-wide. If any test fails, roll back
- Shut down old stack – once tests pass, shut down the old stack
This process, from build to deployment, takes 2 to 3 hours, but may be spread over 18 hours.
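The twelve steps above can be sketched as an ordered pipeline that stops at the first failure. This is only an illustration under stated assumptions: the step names are my own labels for the steps in the list, `run_pipeline` and its `actions` and `rollback` parameters are hypothetical, and I am assuming that a rollback is only needed once the cutover has happened (before that, the old stack is still serving traffic).

```python
from typing import Callable

# My labels for the twelve steps listed above, in order.
STEPS = [
    "freeze", "tag", "build", "test_build", "image", "configure",
    "launch", "test_image", "start_services", "cutover",
    "test_production", "shutdown_old_stack",
]


def run_pipeline(
    actions: dict[str, Callable[[], bool]],
    rollback: Callable[[], None],
) -> tuple[bool, list[str]]:
    """Run the steps in order and stop at the first one that fails.

    A step with no entry in `actions` is treated as succeeding.
    Assumption: rollback is only invoked if the failure occurs after cutover.
    Returns (success, list of completed step names).
    """
    completed: list[str] = []
    for step in STEPS:
        action = actions.get(step, lambda: True)
        if not action():
            if "cutover" in completed:
                rollback()
            return False, completed
        completed.append(step)
    return True, completed


# Usage: the production test fails, so traffic goes back to the old stack.
events: list[str] = []
ok, done = run_pipeline(
    {"test_production": lambda: False},
    rollback=lambda: events.append("load balancer pointed back at old stack"),
)
print(ok, done[-1], events)
```

Because the old stack is only shut down in the final step, a failed production test still has a working stack to roll back to.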
Takeaways:
This system works very effectively for Monetate. What I found unique was their approach of having no boot time dependencies in their images. This ensures that an image never fails at boot time due to an external or internal dependency, such as another server it needs data from not responding, or an internal script dying and causing the boot to fail. It also makes their images extremely fast at boot time. On the other hand, it means they cannot use technologies like Chef and Puppet, which depend on external services. They could, Jeff pointed out, have used Chef Solo, but by that time they had already hand-crafted all they needed, and it did not make sense to translate their scripts to Chef Solo.
This no-dependencies boot then gives them clean working images that only need to be configured, after booting up, for the application or application component they are running. This makes testing and verifying each image, and what it is running, very elegant.
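As a rough illustration of configuring an already-booted image, here is a sketch in which the per-instance configuration is derived only from settings baked into the image plus boot time metadata. The role names, the `BAKED_ROLE_DEFAULTS` table, the metadata keys and the `configure` function are all hypothetical; the metadata is passed in as a plain dictionary rather than fetched from EC2.

```python
import json

# Hypothetical per-role settings, assumed to be baked into the image at build time.
BAKED_ROLE_DEFAULTS = {
    "web": {"port": 8080, "workers": 4},
    "worker": {"port": 9000, "workers": 8},
}


def configure(metadata: dict[str, str]) -> dict:
    """Build the instance configuration from baked-in defaults and boot time
    metadata only. There are no calls to other services, so no external
    dependency can make configuration fail."""
    role = metadata["role"]
    config = dict(BAKED_ROLE_DEFAULTS[role])  # copy so the defaults stay untouched
    config["role"] = role
    config["instance_id"] = metadata["instance-id"]
    config["zone"] = metadata.get("availability-zone", "unknown")
    return config


# Usage: the same image becomes a "web" node purely through its metadata.
cfg = configure({"role": "web", "instance-id": "i-0001"})
print(json.dumps(cfg, sort_keys=True))
```

Since the function is deterministic for a given metadata dictionary, the configuration of any component can be tested without booting anything.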
And yes – Version everything!