The ‘AWS Well-Architected Framework’

In November, Amazon Web Services (AWS) published an updated version of its AWS Well-Architected Framework paper. The paper has existed for a while; however, this version was my first read of it, and I summarize it here. Personally, I believe it is an extremely comprehensive paper that captures the architectural thinking Amazon believes should go into architecting a system on and for AWS. For a Cloud Architect, it is a solid read for understanding the thinking needed to architect solutions for any IaaS cloud, not just AWS.

The paper divides architecture into five distinct areas, or pillars:

  1. Security
  2. Reliability
  3. Performance Efficiency
  4. Cost Optimization
  5. Operational Excellence

These five pillars allow systems to be architected so that their non-functional requirements (NFRs) are addressed upfront, enabling the organization to focus on its functional requirements. The NFRs are delivered by the IaaS platform, and hence should not be the organization’s responsibility. This makes perfect sense: the goal of an organization using an IaaS cloud is not to have to worry about the NFRs, and instead to focus on functionality and hence on the business capabilities it delivers. However, in this paper AWS also highlights the need to architect the system properly, leveraging its Well-Architected Framework, to ensure that the NFRs are effectively delivered to the system hosted on AWS. In Amazon’s view, ensuring this proper architecture is the client’s responsibility, not AWS’s.

To step aside from the paper for a second: this is an essential point of education for any organization that is new to the cloud, or that comes to the cloud from a fully managed world where it relies on an infrastructure management partner or service provider to architect, build, run and maintain its systems. In an AWS-style cloud hosting model, you architect it, you build it, and eventually you run and maintain it. The cloud provider supplies the cloud services that deliver the requisite NFRs. However, as AWS highlights in the paper, these services will deliver the NFRs as desired and expected only if the system is architected to leverage them properly. That architectural guidance is what AWS provides in this paper.

Let’s take a closer look at the pillars:

  1. Security – It is listed first. One cannot have a cloud conversation with an organization planning its migration to the cloud without security (and compliance) coming up within the first 20 minutes (trust me, I have won money taking bets on the 20-minute time limit). In the paper, AWS describes a risk-assessment-driven approach to security. This is architectural thinking at its purest: looking at architecture through a risk-management lens. AWS sees the need to design for security, rather than bringing in security to secure a system after it has been delivered.
  2. Reliability – The paper here talks of three areas of reliability:
    • recovering from failures of infrastructure or services, rather than designing systems never to fail (antifragile, rather than robust). The paper guides organizations to architect solutions on the assumption that failures will happen. Applications need to be made highly available, fault-tolerant and/or self-healing. (This guidance must have been very useful last week, when AWS had a massive outage.)
    • dynamic resource allocation for applications and systems. Today’s systems are elastic, scaling up and down with demand, so they need to be architected to consume resources from the cloud dynamically, too.
    • disruptions due to configuration and network issues. Even if the cloud provider’s own services are up and available, configuration errors (by users or third parties) and network provider errors, latency and outages can cause disruptions that need to be accounted for.
  3. Performance Efficiency – Demand changes dynamically, and the available technology also changes over time. How can we design systems that adjust their performance by efficiently utilizing resources as needed? How can we design systems that can experiment with new technologies and services (such as serverless or containers) as they become available, without having to rebuild the whole system?
  4. Cost Optimization – Architecting systems to minimize cost throughout their entire lifecycle by making them cost-aware. Architecture needs to be designed with business goals and constraints in mind, to ensure maximum ROI.
  5. Operational Excellence – Guidance on how to run and maintain systems most effectively. How can organizations develop a capability for Continuous Improvement, driven by Continuous Monitoring? This requires not just setting up good monitoring, but also a continuous feedback mechanism that lets the business, developers and ops teams consume that monitoring data to continuously improve the system’s design and its operations.
Architectural decisions are driven by business needs. These require trade-offs to be made between these five pillars, to deliver on the business needs.
– AWS Well-Architected Framework
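At the level of an individual service call, the reliability pillar’s assumption that failures will happen is often put into practice with retries that use capped exponential backoff and random jitter, so that a transient fault is absorbed instead of cascading. The snippet below is a minimal Python sketch of that pattern, not something taken from the paper; the names `call_with_retries` and `TransientError`, and the default delay values, are hypothetical choices for illustration.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. a throttled or timed-out call)."""


def call_with_retries(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, retrying transient failures with capped exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            # Backoff doubles each attempt but never exceeds max_delay; "full jitter"
            # sleeps a random time in [0, cap] so many clients do not retry in lockstep.
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, cap))
    raise RuntimeError("max_attempts must be at least 1")


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky() -> str:
        # Fails twice, then succeeds: stands in for a briefly unavailable service.
        calls["n"] += 1
        if calls["n"] < 3:
            raise TransientError("service unavailable")
        return "ok"

    print(call_with_retries(flaky, sleep=lambda _: None))  # prints "ok"
```

Passing `sleep` in as a parameter keeps the sketch easy to test, and the random jitter spreads out retries from many clients so that a recovering service is not hit by a synchronized wave of requests.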

For each of the five pillars, the paper provides detailed content in the following areas:

  • Design Principles
  • Definitions
  • Best practices, including key questions that need to be addressed
  • Key AWS Services
  • Resources (from the AWS website)

Of these, only the Key AWS Services and Resources are AWS-specific. The rest of the content can be used as a guide for designing a ‘well-architected’ system for any IaaS cloud that delivers the required core services.

I encourage all Cloud Architects to read this paper in full, irrespective of which cloud provider they may be architecting for. If anything can be learnt from last week’s AWS outage, it is that relying on any one cloud provider to provide a robust, ‘always available’ system is a fool’s errand. Systems will go down; one needs to design for ‘anti-fragility’, leveraging portability and a ‘Hybrid Cloud’ that allows one to distribute risk not just across regions within the same cloud, but across multiple clouds. (More on this in future posts.) You can read more about antifragile systems in my book ‘The DevOps Adoption Playbook’.
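To make the idea of distributing risk across regions or clouds a little more concrete, here is a minimal Python sketch of client-side failover: try an ordered list of independently hosted endpoints and use the first one that answers. This is an illustration under my own assumptions rather than guidance from the paper; `first_available` and the endpoint names are hypothetical, and a real multi-region or multi-cloud setup would also need data replication, health checks and traffic routing, which this sketch ignores.

```python
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def first_available(
    endpoints: Sequence[Tuple[str, Callable[[], T]]],
) -> Tuple[str, T]:
    """Try each (name, call) pair in priority order and return the first success."""
    errors = []
    for name, call in endpoints:
        try:
            return name, call()
        except Exception as exc:  # broad on purpose: any failure triggers failover
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all endpoints failed: " + "; ".join(errors))


if __name__ == "__main__":
    def primary() -> str:
        # Hypothetical primary region that is currently down.
        raise ConnectionError("region outage")

    def standby() -> str:
        # Hypothetical standby hosted in another region or another cloud.
        return "response from standby"

    print(first_available([("cloud-a/region-1", primary), ("cloud-b/region-2", standby)]))
```

Catching every exception is deliberate here, since the point is to fail over on any error; in practice you would narrow it to errors that indicate the endpoint is unavailable, so that genuine bugs are not silently masked.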

To conclude, let me quote the full ‘conclusion’ section from the paper, which best summarizes it.

The AWS Well-Architected Framework provides architectural best practices across five pillars for designing and operating reliable, secure, efficient, and cost-effective systems in the cloud. The framework provides a set of questions that allows you to review an existing or proposed architecture, and also a set of AWS best practices for each pillar. Using the framework in your architecture will help you produce stable and efficient systems, which allows you to focus on your functional requirements. 

Do post your thoughts, comments and questions in the comments section below.

Have you heard of my new book: The DevOps Adoption Playbook? It is currently ‘#1 new Release’ in its category on Amazon. 


One Comment

  1. mkorejo says:

    Nice write-up and I wholly agree. The concepts and principles of designing highly-available, secure systems in the cloud are agnostic to any particular provider. Every provider has specific terms, service names, etc. but expertise in one easily translates to another. Also agree that the S3 outage reinforces the need for portability across providers. No one wants to feel locked in, especially in a world with so many great options. Peace of mind associated with being able to easily move workloads between clouds … what a day that will be!
