Find out why it’s important for businesses to apply the key attributes of a highly available system architecture to mitigate the risk of downtime and disruption.
On June 8, 2021, I experienced intermittent connectivity to several websites, including Amazon, Twitter, and Hulu. Frustrated, I refreshed my browser repeatedly, but nothing improved. The outage was caused by a failure at Fastly, a content delivery network (CDN) provider that serves many of the web’s most popular sites (for more detail, see the report posted by Fastly). The incident highlighted how fragile modern digital infrastructure can be, regardless of its complexity or design. Specifically, it re-emphasized the importance of a highly available architecture for avoiding costly downtime.
How Can Your Business Mitigate the Risk of Downtime?
Your business should identify and implement proven principles for building a high-availability architecture. For example, if your business manages a private cloud, you’ll need to ensure that your resources have the following four attributes of a highly available system to mitigate downtime:
- Scalable: A highly available system must be able to scale vertically, by adding CPU, memory, or storage to a virtual machine, or horizontally, by adding more instances of a resource to your configuration.
- Agile: A highly available system must be able to deploy resources quickly as conditions and requirements change. The ability to scale “elastic” resources automatically is even better. Elastic resources are on-demand resources that can be scaled up or down dynamically to match changing workloads.
- Geo-Distributed: A highly available system must let you distribute computing clusters across regions and countries. This makes it possible to balance workloads and to keep redundant copies of your data in several locations, protecting your information if one site fails. AWS, Azure, and other modern cloud providers can deploy your apps to regional datacenters around the world.
- Power, Network & Computing Resource Redundancies: A highly available system must have a backup plan for unexpected events so that it can recover quickly and resume normal operations. This means it needs redundant power, network, and computing resources, along with redundant copies of its data, that it can fail over to when needed.
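To make the “elastic” idea from the Agile attribute more concrete, here is a minimal Python sketch of a target-tracking scaling rule, assuming CPU utilization is the metric being tracked. The proportional formula is the same shape as the one documented for Kubernetes’ Horizontal Pod Autoscaler; the function name, the default instance limits, and the utilization figures are hypothetical illustrations, not any cloud provider’s API.

```python
import math


def desired_instances(current_instances: int,
                      current_utilization: float,
                      target_utilization: float,
                      min_instances: int = 1,
                      max_instances: int = 20) -> int:
    """Proportional target-tracking rule (hypothetical helper).

    Scales the instance count by the ratio of observed to target utilization,
    rounds up so the fleet errs on the side of extra capacity, then clamps
    the result to the configured minimum and maximum.
    """
    if current_instances < 1:
        raise ValueError("current_instances must be at least 1")
    if target_utilization <= 0:
        raise ValueError("target_utilization must be positive")
    raw = math.ceil(current_instances * current_utilization / target_utilization)
    return max(min_instances, min(max_instances, raw))


# Utilization is given as whole percentages to keep the arithmetic exact.
print(desired_instances(4, 90, 60))    # scale out under load      -> 6
print(desired_instances(4, 30, 60))    # scale in when idle        -> 2
print(desired_instances(15, 100, 50))  # capped at max_instances   -> 20
```

The minimum and maximum bounds matter as much as the formula: the floor keeps some capacity running even when traffic drops to zero, and the ceiling keeps a runaway metric from scaling costs without limit.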
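The value of the redundancy described in the list above can be estimated with basic probability. The Python sketch below assumes component failures are independent: components in series must all be up, so their availabilities multiply, while a set of redundant replicas fails only if every replica fails at once. The example architecture and its availability figures are hypothetical. Independence is the critical assumption here; a shared dependency, such as one CDN provider sitting in front of every replica, breaks it.

```python
from math import prod


def serial_availability(*availabilities: float) -> float:
    """Every component in the chain must be up, so availabilities multiply."""
    return prod(availabilities)


def redundant_availability(availability: float, replicas: int) -> float:
    """Any one of `replicas` independent copies suffices: 1 - P(all are down)."""
    return 1 - (1 - availability) ** replicas


# Hypothetical example: two 99.9% app servers behind one 99.99% load balancer.
app_tier = redundant_availability(0.999, 2)
system = serial_availability(0.9999, app_tier)
print(f"app tier:     {app_tier:.6%}")
print(f"whole system: {system:.6%}")
```

In this example, adding a second app server lifts that tier from three nines to roughly six, but the single load balancer still caps the whole system just below 99.99%. Redundancy helps only where the non-redundant pieces are not the weakest link.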
Amazon’s estimated loss in sales from one hour of downtime: $30 million to $45 million (Sources: Reboot, an ecommerce SEO agency; David Jinks, Head of Consumer Research at ParcelHero)
In many cases, your business might already use a popular platform like Salesforce or Azure for cloud computing services. It’s up to you to decide the service level you need and to codify it in a service-level agreement (SLA). An SLA is a performance commitment that usually focuses on uptime, the percentage of time that a service is operational. As the table below shows, an SLA of 99.999% uptime allows only about 5.26 minutes of downtime per year (the table assumes a 30-day month and a 365-day year). Ultimately, each business must determine what downtime costs it and then decide which SLA is most cost-effective.
SLA uptime (%) | Downtime per week | Downtime per month | Downtime per year |
99 | 1.68 hours | 7.2 hours | 3.65 days |
99.9 | 10.1 minutes | 43.2 minutes | 8.76 hours |
99.95 | 5 minutes | 21.6 minutes | 4.38 hours |
99.99 | 1.01 minutes | 4.32 minutes | 52.56 minutes |
99.999 | 6 seconds | 25.9 seconds | 5.26 minutes |
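Every figure in the table follows directly from the uptime percentage: the permitted downtime is the remaining fraction of the period. This Python sketch reproduces the table, assuming (as the table does) a 30-day month and a 365-day year; the constant and function names are illustrative.

```python
# Minutes in each reporting period; the table assumes a 30-day month.
MINUTES_PER_PERIOD = {
    "week": 7 * 24 * 60,
    "month": 30 * 24 * 60,
    "year": 365 * 24 * 60,
}


def allowed_downtime_minutes(uptime_percent: float, period: str) -> float:
    """Maximum downtime, in minutes, permitted by an uptime SLA over a period."""
    return (100 - uptime_percent) / 100 * MINUTES_PER_PERIOD[period]


if __name__ == "__main__":
    for sla in (99, 99.9, 99.95, 99.99, 99.999):
        row = [f"{allowed_downtime_minutes(sla, p):.2f} min"
               for p in ("week", "month", "year")]
        print(f"{sla}%", *row)
```

Each additional “nine” cuts the permitted downtime by a factor of ten, which is why the step from 99.99% to 99.999% is so much harder to achieve than the step from 99% to 99.9%.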
Keep in mind that SLAs can quickly become expensive, because each additional “nine” of uptime generally requires more redundancy. Your business should therefore first decide how much downtime it can afford, and then determine which SLA and supporting architecture it needs.
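One way to frame this trade-off is to add each tier’s price to the expected cost of the downtime it still allows, then pick the cheapest total, optionally excluding tiers that exceed a hard downtime budget. The Python sketch below is a simplification under stated assumptions: it treats the maximum downtime an SLA permits as the downtime you will actually experience, and the tier names, prices, and hourly downtime costs in the example are hypothetical placeholders, not real quotes.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 365 * 24


@dataclass(frozen=True)
class SlaTier:
    name: str
    uptime_percent: float
    annual_cost: float  # hypothetical price of buying or building this tier


def annual_downtime_hours(tier: SlaTier) -> float:
    """Maximum yearly downtime the tier permits (assumed to be the downtime incurred)."""
    return (100 - tier.uptime_percent) / 100 * HOURS_PER_YEAR


def total_annual_cost(tier: SlaTier, downtime_cost_per_hour: float) -> float:
    """Tier price plus the cost of the downtime it still allows."""
    return tier.annual_cost + annual_downtime_hours(tier) * downtime_cost_per_hour


def most_cost_effective(tiers, downtime_cost_per_hour, max_downtime_hours=None):
    """Cheapest tier by total cost, ignoring tiers over an optional downtime budget."""
    candidates = [
        t for t in tiers
        if max_downtime_hours is None or annual_downtime_hours(t) <= max_downtime_hours
    ]
    if not candidates:
        raise ValueError("no tier meets the downtime budget")
    return min(candidates, key=lambda t: total_annual_cost(t, downtime_cost_per_hour))


# Hypothetical tiers and prices, for illustration only.
TIERS = [
    SlaTier("three nines", 99.9, 50_000),
    SlaTier("four nines", 99.99, 200_000),
    SlaTier("five nines", 99.999, 1_000_000),
]

print(most_cost_effective(TIERS, downtime_cost_per_hour=10_000).name)
print(most_cost_effective(TIERS, downtime_cost_per_hour=100_000).name)
```

With these made-up numbers, a business that loses $10,000 per hour of downtime comes out ahead on the cheapest tier, while one that loses $100,000 per hour can justify paying for four nines. The optional `max_downtime_hours` budget captures cases where some outages are unacceptable regardless of cost.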
Summary
These proven attributes of a highly available system architecture can help mitigate the risk of disruption to your business. The Fastly incident showed that even a short disruption to mission-critical systems can have meaningful financial and operational consequences. Take the time to evaluate the most cost-effective way to reduce downtime for your business, whether that means renegotiating your SLA or developing your own strategy using these principles to maximize the availability of your systems. It will be well worth your time.
Chandler Terrell is a Consultant in Opportune LLP’s Process & Technology practice. His career at Opportune began when he helped build and maintain a custom inventory management model, which later became the technical design for a custom-built system. A certified Azure Data Scientist Associate and Salesforce Certified Administrator, Chandler graduated from Baylor University with a B.B.A. in Management Information Systems and earned a Graduate Certificate in Business Analytics.