We’re rounding out our WAF blog series with the final pillar of the AWS Well-Architected Framework: Operational Excellence. Each pillar of the framework includes a set of design principles that define its focus. While the other pillars benefit both the business and technical sides, Operational Excellence focuses squarely on delivering business value.
Let’s check back with our favorite retail chain, Echidna Electronics. Echidna decided it was time to update its customer recommendation software, Project Kookaburra. Talented analytics techs created some great Machine Learning algorithms and statistical models they believe will increase sales 3-5x.
To test these new algorithms and models, they jump into the console, stand up a few EC2 instances, copy the code from their laptops onto the instances, and start testing. After a week or so, everything is performing better than expected. Their next step: copy the code to production, set it and forget it, and the team is done. Fantastic workflow and process!
Hopefully, the issues with the above process are obvious: all the manual steps it requires, from copying code by hand to deploying infrastructure by hand. To succeed in the cloud game, you must leverage automation. Defining your infrastructure, configuration, and application as code is key to success, and minimizing human interaction in deployments only improves consistency. Take advantage of tools like CodeBuild, CodeDeploy, and CodePipeline to automate code testing and deployment, and use CloudFormation to keep your infrastructure deployments consistent.
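To make that concrete, here’s a minimal sketch of deploying infrastructure as code with boto3 and CloudFormation. The stack name and template file are hypothetical stand-ins; in practice, a pipeline stage (CodePipeline, for example) would run this step rather than a human:

```python
import boto3

cfn = boto3.client("cloudformation")

# The template lives in source control, not on someone's laptop.
# File name is a hypothetical example.
with open("kookaburra-infra.yaml") as f:
    template_body = f.read()

cfn.create_stack(
    StackName="kookaburra-test",  # hypothetical stack name
    TemplateBody=template_body,
    OnFailure="ROLLBACK",         # never leave a half-built environment behind
)

# Block until the stack is fully created, so the next pipeline stage
# only ever runs against a complete environment.
waiter = cfn.get_waiter("stack_create_complete")
waiter.wait(StackName="kookaburra-test")
print("Environment deployed consistently from a source-controlled template.")
```

Every run of this step produces the same environment from the same template, which is exactly the consistency the manual process above can’t guarantee.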
The Echidna techs ran through the runbook for deploying their instances, but it took them an hour longer than usual because they overlooked a few steps and had to backtrack. This is exactly the pain the first design principle, performing operations as code, eliminates: once your deployments run as code, the resulting output gives you exact documentation of your environments, with an updated result available for review after every build.
While environment documentation is every administrator’s favorite job (or maybe a close second), this kind of process simplifies it dramatically. It won’t draw your Visio diagrams or write your Word documents, but it will provide a point of reference that’s accurate and (hopefully) versioned.
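If your environment lives in CloudFormation, that accurate, versioned point of reference is always one API call away. A small sketch, again using boto3 (the stack name is a hypothetical carry-over from the example above):

```python
import boto3

cfn = boto3.client("cloudformation")

# Pull the template that is actually deployed right now: living
# documentation straight from the environment itself.
response = cfn.get_template(StackName="kookaburra-test")  # hypothetical name
print(response["TemplateBody"])

# Optionally, check whether anyone has hand-edited the stack away from
# its template since the last deployment.
drift = cfn.detect_stack_drift(StackName="kookaburra-test")
print("Drift detection started:", drift["StackDriftDetectionId"])
```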
Echidna’s analytics techs learned a thing or two from other departments in the company. Last year, the company website had an outage caused by a bad update that wasn’t properly tested, resulting in a large loss of revenue.
From that point on, all updates would be small and incremental, allowing for quick reversal upon failure. Being able to pinpoint which deployment introduced an issue restores some sanity to the on-call personnel.
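Small, reversible changes are something you can encode directly into the deployment itself. As one hedged illustration, CodeDeploy can automatically roll back a failed deployment; the application name, deployment group, and S3 revision below are all hypothetical:

```python
import boto3

codedeploy = boto3.client("codedeploy")

# Ship one small, incremental revision, and let CodeDeploy reverse it
# automatically if the deployment fails. All names here are hypothetical.
codedeploy.create_deployment(
    applicationName="kookaburra",
    deploymentGroupName="kookaburra-prod",
    revision={
        "revisionType": "S3",
        "s3Location": {
            "bucket": "kookaburra-releases",
            "key": "kookaburra-1.4.2.zip",
            "bundleType": "zip",
        },
    },
    autoRollbackConfiguration={
        "enabled": True,
        "events": ["DEPLOYMENT_FAILURE"],  # quick reversal upon failure
    },
)
```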
Note that when it came to the deployment of their code, Echidna used the good-ol’ ‘set and forget’ method. We’ve all been there, done that. Not many people want to break something that’s already working “just fine.”
Finding the time to review current workloads to identify possible enhancements or even cost-cutting approaches often proves advantageous. This not only gives the team a refresher on the workload configuration but also helps validate current procedures.
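As a starting point for such a review, a quick look at where the money actually goes can be surprisingly revealing. Here’s a sketch using the Cost Explorer API via boto3 (this assumes Cost Explorer is enabled on the account):

```python
from datetime import date, timedelta

import boto3

ce = boto3.client("ce")

# Review the last full month's spend by service: a concrete starting
# point for spotting cost-cutting opportunities during a workload review.
end = date.today().replace(day=1)                 # first day of this month
start = (end - timedelta(days=1)).replace(day=1)  # first day of last month

report = ce.get_cost_and_usage(
    TimePeriod={"Start": start.isoformat(), "End": end.isoformat()},
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
)

for group in report["ResultsByTime"][0]["Groups"]:
    service = group["Keys"][0]
    cost = float(group["Metrics"]["UnblendedCost"]["Amount"])
    print(f"{service}: ${cost:,.2f}")
```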
During the website outage mentioned above, the CTO stormed into the IT Director’s office:
CTO: “The site is down. We’re losing business fast. How quickly can we get the site back up?”
IT Director: “We take backups every 5 minutes.”
CTO: “Good. Restore the latest.”
IT Director: “OK. But to set expectations: you should know this restore has never been properly tested.”
This conversation is like nails on a chalkboard. It’s one thing to have a procedure in place for failure; it’s another never to test it. This applies not only to Disaster Recovery scenarios but to current workloads as well. Are there any single points of failure? Do you know how each subsystem affects the others? Have you analyzed the risks and mitigated them to the best of your abilities? Failures are inevitable, so have a procedure in place and test it frequently. You’ll all sleep better at night as a result.
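One hedged sketch of what “test it frequently” might look like for those database backups: periodically restore the latest automated RDS snapshot into a throwaway instance, validate it, and tear it down. The database identifiers below are hypothetical:

```python
import boto3

rds = boto3.client("rds")

# Find the most recent automated snapshot of the production database.
# The DB identifier is a hypothetical example.
snapshots = rds.describe_db_snapshots(
    DBInstanceIdentifier="echidna-web-db",
    SnapshotType="automated",
)["DBSnapshots"]
latest = max(snapshots, key=lambda s: s["SnapshotCreateTime"])

# Restore it into a temporary instance so the procedure gets exercised
# for real, instead of being assumed to work.
rds.restore_db_instance_from_db_snapshot(
    DBInstanceIdentifier="echidna-web-db-restore-test",
    DBSnapshotIdentifier=latest["DBSnapshotIdentifier"],
)

rds.get_waiter("db_instance_available").wait(
    DBInstanceIdentifier="echidna-web-db-restore-test"
)
print("Restore-test instance is up; run validation queries, then delete it.")
```

Run on a schedule, a drill like this turns “we take backups every 5 minutes” into “we restore and verify them every week.”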
While this is somewhat self-explanatory, it can be a hard process to follow. You’re in a post-mortem, discussing the findings. Take this opportunity to put the proper policies, procedures, and contingencies in place to prevent the same mistake from happening twice. Then take a step back and make sure you’ve mitigated the remaining risks as effectively as possible.
This brings us to some additional areas of focus, which also appear in the other pillars.
As you review your workloads, take the time to understand why each system is set up the way it is. Ask questions to ensure business requirements are covered, and look for areas of improvement, not only in the technical details but also in procedures and operations.
To learn more about the Operational Excellence pillar of the AWS Well-Architected Framework, check out the official AWS white paper: Operational Excellence Pillar. And if you’d like to see how your application stacks up, feel free to schedule your FREE Well-Architected Review with a Certified AWS Solutions Architect from Anexinet.