The Well-Architected Framework is a set of core pillars and design principles that help organizations build and operate cloud workloads effectively. As AWS explains, it is essentially a framework for designing and running workloads in the cloud, whether that workload is something simple like a to-do list app or something as complex as a large e-commerce platform.
Its purpose is to help companies apply cloud best practices across six key pillars: Operational Excellence, Security, Reliability, Performance Efficiency, Cost Optimization, and Sustainability. Each pillar offers guidance, recommended practices, and key questions that help assess how well an architecture is designed and how closely it aligns with proven cloud principles.
AWS Outage
An AWS outage revealed an important pattern: while many customers were affected, others continued operating without disruption.
Best Practice Discovery
AWS solution architects studied those differences and found that the unaffected
customers were following certain architectural practices that made their environments
more resilient.
Official Framework Launch
AWS turned those observations into the first official version of the Well-Architected
Framework.
Cost Optimization Added
At launch, the framework focused on four pillars: Operational Excellence, Security,
Reliability, and Performance Efficiency. It later expanded with the addition of Cost
Optimization as a fifth pillar.
Self-Service Tooling
AWS also introduced the Well-Architected Tool to make architecture reviews easier and
more structured.
Sustainability Pillar
AWS added Sustainability as the sixth pillar, bringing a new focus on environmental impact and efficient resource usage.
The first pillar, Operational Excellence, is centered on operating cloud workloads effectively and efficiently. As AWS explains, it is about being able to run, manage, and monitor systems in a way that delivers business value. One key design principle within this pillar is anticipating failures before they happen. In other words, no team should release code into production and simply hope for the best. Proactive monitoring and preparation are essential to achieving operational excellence.
The Security pillar focuses on protecting systems and data. One of its core principles, as AWS puts it, is to “keep people away from data.” While that may sound unusual at first, the idea is to reduce reliance on manual, error-prone processes. Instead, organizations should use automation, controlled access, and well-defined policies so that data is managed securely through systems rather than through direct human handling.
The reliability pillar is about designing applications that can consistently perform their intended functions and recover quickly when failures occur or conditions change. AWS highlights that reliability means not only meeting expectations under normal circumstances, but also adapting to shifting demands. A major best practice in this area is building systems that can recover automatically from failure, minimizing disruption to the business.
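One common building block for automatic recovery is retrying a failed call with exponential backoff. The sketch below is illustrative only and is not taken from the framework itself; the function name `call_with_retries` and its parameters are assumptions made for this example.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    operation: Callable[[], T],
    max_attempts: int = 4,
    base_delay_seconds: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, retrying with exponential backoff when it raises."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the caller
            # Delay doubles after each failed attempt: 0.5s, 1s, 2s, ...
            sleep(base_delay_seconds * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so the backoff schedule can be checked in tests without waiting. In a real workload you would typically catch only errors known to be transient rather than every exception.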
The Performance Efficiency pillar focuses on using cloud resources in a way that delivers strong performance. One of its main design principles is giving teams access to advanced technologies without requiring them to manage unnecessary infrastructure complexity. AWS emphasizes that organizations should take advantage of managed services whenever possible, allowing them to focus on higher-value work instead of spending time maintaining lower-level components.
Because cloud pricing is dynamic and usage-based, cost optimization requires ongoing attention. Organizations need to monitor spending continuously and ensure that resources are sized appropriately for actual business needs. The Well-Architected Framework also recognizes the importance of cloud financial management as a dedicated capability - one that combines technical understanding, business priorities, and financial oversight.
The newest pillar, sustainability, is concerned with reducing the environmental impact of cloud usage. It encourages companies to design solutions that minimize waste throughout the lifecycle of their systems. AWS gives the example of avoiding software updates that force customers to replace devices that are still fully functional. The broader goal is to build architectures that are efficient not only from a technical and financial standpoint, but also from an environmental one.
To conduct a Well-Architected review effectively, AWS emphasizes that it should not be treated like an audit. It is not about answering a checklist with simple yes-or-no responses. Instead, the goal is to have an open, honest, and constructive discussion about the architecture, its risks, and where it can be improved.
The review team should ideally include at least one technical lead along with someone who brings a business perspective. Well-Architected consultants, such as those from Levi9, can also help guide the conversation, frame the right questions, and recommend best practices. The objective is not to “pass” the review, but to identify gaps, risks, and improvement opportunities that can then be documented, prioritized, and addressed.
You can use the Well-Architected Framework tool both to review your cloud applications and as a “reality check” on how robust your cloud architecture is. Here is a brief description of what you can expect during a Well-Architected review.
By implementing the Well-Architected Framework early in the process, companies can feel confident that their cloud-based applications meet the highest standards for security, reliability and operational excellence.
Go to the AWS Well-Architected Tool and define a workload by giving it a name and description, for example, “Work From Office Application.”
Select attributes like owners, regions, accounts, etc.
Within the Well-Architected Framework tool, activate the AWS Trusted Advisor integration. This brings recommendations from Trusted Advisor into some of the Well-Architected questions.
The “Well-Architected Framework” lens is selected by default, providing the core Well-Architected questions.
For this demo, we select another lens, such as “Serverless,” so we can see how serverless-specific questions apply here.
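The same setup can also be scripted. The sketch below assumes the boto3 `wellarchitected` client and its `create_workload` call; the helper names, the owner address, the region, the account ID, and the choice of the `PREPRODUCTION` environment are placeholders for illustration, not values from the demo.

```python
from typing import Any


def build_workload_request(
    name: str,
    description: str,
    review_owner: str,
    regions: list[str],
    account_ids: list[str],
    extra_lenses: tuple[str, ...] = ("serverless",),
    enable_trusted_advisor: bool = True,
) -> dict[str, Any]:
    """Assemble keyword arguments for the Well-Architected create_workload call."""
    return {
        "WorkloadName": name,
        "Description": description,
        "Environment": "PREPRODUCTION",  # assumption: the demo is not production
        "ReviewOwner": review_owner,
        "AwsRegions": regions,
        "AccountIds": account_ids,
        # "wellarchitected" is the default lens; others are added on top of it.
        "Lenses": ["wellarchitected", *extra_lenses],
        "DiscoveryConfig": {
            "TrustedAdvisorIntegrationStatus": (
                "ENABLED" if enable_trusted_advisor else "DISABLED"
            )
        },
    }


def create_workload(request: dict[str, Any]) -> str:
    """Send the request to AWS and return the new workload ID."""
    import boto3  # imported lazily so the builder above runs without the AWS SDK

    client = boto3.client("wellarchitected")
    return client.create_workload(**request)["WorkloadId"]


request = build_workload_request(
    name="Work From Office Application",
    description="Demo workload for a Well-Architected review",
    review_owner="owner@example.com",  # placeholder
    regions=["eu-west-1"],  # placeholder
    account_ids=["123456789012"],  # placeholder
)
```

Calling `create_workload(request)` requires AWS credentials with permission to use the Well-Architected Tool; building the request does not.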
Each of the six pillars, such as Security and Reliability, has a set of questions to answer. For example, one question in the Cost Optimization pillar asks, “How do you decommission resources?” On questions like this, answers can depend on each other: if you select “Implement a decommissioning process,” you also need to select “Track resources over their lifetime.” You can also choose none of the options and skip most of the questions, in which case AWS warns that your scores will be very low.
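The dependency between answers can be pictured as a small prerequisite check. This is only a sketch of the idea, not how the tool works internally: the `REQUIRES` map and the `missing_prerequisites` function are hypothetical, and the single dependency listed is the one described above.

```python
# Hypothetical dependency map: each choice lists the choices it relies on.
REQUIRES: dict[str, set[str]] = {
    "Implement a decommissioning process": {"Track resources over their lifetime"},
}


def missing_prerequisites(
    selected: set[str], requires: dict[str, set[str]] = REQUIRES
) -> dict[str, set[str]]:
    """Map each selected choice to the prerequisites that were not selected."""
    missing: dict[str, set[str]] = {}
    for choice in selected:
        unmet = requires.get(choice, set()) - selected
        if unmet:
            missing[choice] = unmet
    return missing
```

An empty result means the selected answers are consistent with each other.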
Some questions integrate with Trusted Advisor. This integration checks your answers against actual configurations.
As an example, take the fifth question in the Security pillar: “How do you protect your network resources?” If you choose “Control traffic at all layers,” you can activate the Trusted Advisor integration and check the answer against reality. The tool might find vulnerabilities you were not aware of, for example security groups for which traffic is not controlled. The integration shows you where you are missing insight into your cloud workload.
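To make the idea of checking an answer against actual configuration concrete, here is a minimal sketch of one such check: flagging security group rules that allow inbound traffic from any address. It is not Trusted Advisor's implementation. The input is assumed to follow the shape of the `SecurityGroups` list returned by the EC2 `describe_security_groups` API, and the function name is made up for this example.

```python
from typing import Any

# CIDR blocks that match every IPv4 or IPv6 address.
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}


def find_open_ingress(security_groups: list[dict[str, Any]]) -> list[tuple[str, str]]:
    """Return (group ID, port range) pairs for inbound rules open to any address."""
    findings: list[tuple[str, str]] = []
    for group in security_groups:
        for rule in group.get("IpPermissions", []):
            cidrs = {r["CidrIp"] for r in rule.get("IpRanges", [])}
            cidrs |= {r["CidrIpv6"] for r in rule.get("Ipv6Ranges", [])}
            if not cidrs & OPEN_CIDRS:
                continue  # rule is restricted to specific address ranges
            if rule.get("IpProtocol") == "-1":
                ports = "all traffic"  # protocol "-1" means every protocol and port
            else:
                ports = f"{rule.get('FromPort')}-{rule.get('ToPort')}"
            findings.append((group["GroupId"], ports))
    return findings
```

With boto3, the input could come from `boto3.client("ec2").describe_security_groups()["SecurityGroups"]`. A rule open to the world is not always a mistake (a public HTTPS endpoint needs one), so each finding is a prompt for review, not a verdict.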
If we apply an additional lens, such as Serverless, we also get a series of questions focused on serverless apps. One such example would be “How do you build resiliency into your serverless application?” - a question that only makes sense in this particular case.
Upon completion, the tool provides a visualization of the medium and high risks for each pillar. In Caelum’s experience, the reliability pillar is the most riddled with risks. The tool also provides recommendations and insights, based on your answers, to improve the workload’s alignment with best practices.
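The per-pillar risk summary can also be read programmatically and sorted to decide where to start. The sketch below assumes input shaped like the `PillarReviewSummaries` list in the boto3 `get_lens_review` response, where each entry carries a `PillarName` and a `RiskCounts` map with `HIGH` and `MEDIUM` keys; the function name is hypothetical.

```python
from typing import Any


def rank_pillars_by_risk(
    pillar_summaries: list[dict[str, Any]],
) -> list[tuple[str, int, int]]:
    """Return (pillar name, high risks, medium risks), riskiest pillar first."""
    rows = []
    for pillar in pillar_summaries:
        counts = pillar.get("RiskCounts", {})
        rows.append((pillar["PillarName"], counts.get("HIGH", 0), counts.get("MEDIUM", 0)))
    # Sort by high-risk count first, then by medium-risk count, both descending.
    return sorted(rows, key=lambda row: (row[1], row[2]), reverse=True)
```

With boto3, the input could come from `client.get_lens_review(WorkloadId=..., LensAlias="wellarchitected")["LensReview"]["PillarReviewSummaries"]`, using the ID of your own workload.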