When You Can’t Access the Cloud

Jane-Michele Clark
Director of Business Strategy

April 15, 2021

July 14, 2025

We all welcome a bright sunny day when there’s not a cloud in the sky. We love those days, but when the equivalent happens with cyber clouds, it’s a different story.

Users of Microsoft’s Windows 10 operating system learned this in mid-March. Two weeks later, on April 1st, 2021, Microsoft Corp. experienced a massive cloud outage that took most of its Internet services offline. During the outage, no one could access Microsoft’s Azure cloud services, including Bing, Office 365, OneDrive, Skype Live, Teams and Xbox. This was no April Fool’s joke, and the impact was felt around the globe.

Last year, the almost overnight shift to “work from home” resulted in a surge in remote traffic, and stressed all the mega cloud computing platforms.

Not surprisingly, the major cloud providers all experienced major cloud outages last year, which had a cascading effect on web applications and services, impacting businesses in myriad ways. We’ll get to that in a moment, but first…

What is a Cloud Outage?

It’s simply the term given to the period of time when the cloud infrastructure (including computing and networking capabilities, data storage and the interface for users to access virtualized resources) cannot be accessed or is not available for use. In some instances, it can also refer to performance that falls short of the agreed-upon SLA metrics, creating downtime for the user.
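
To make the idea of SLA metrics concrete, here is a minimal Python sketch of the arithmetic behind an availability SLA. The function names, the 30-day billing period and the sample figures are assumptions chosen for illustration; they are not taken from any provider’s actual agreement.

```python
def allowed_downtime_minutes(sla_percent: float, period_days: float = 30) -> float:
    """Minutes of downtime an availability SLA permits over a period."""
    if not 0 <= sla_percent <= 100:
        raise ValueError("sla_percent must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


def measured_availability(downtime_minutes: float, period_days: float = 30) -> float:
    """Availability (in percent) actually delivered, given observed downtime."""
    total_minutes = period_days * 24 * 60
    return 100 * (1 - downtime_minutes / total_minutes)


if __name__ == "__main__":
    # Hypothetical example: a 99.9% monthly SLA.
    print(f"{allowed_downtime_minutes(99.9):.1f} minutes allowed per 30 days")
    # Hypothetical example: a single 11-hour outage measured over a full year.
    print(f"{measured_availability(11 * 60, period_days=365):.2f}% availability")
```

Under these assumptions, a 99.9% monthly SLA leaves room for roughly 43 minutes of downtime, while one 11-hour outage in a year works out to about 99.87% availability for that year.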

In last week’s blog post, we discussed how to protect yourself against power surges and grid failures. Coincidentally, power failures are the biggest cause of cloud outages. Indeed, some years back, a Gartner report suggested that power outages represented a larger threat to cloud usage than potential security breaches.

Leading Causes of Cloud Outages

  • Power Failure: This one needs no further explanation, though you may wish to ask your cloud providers what failsafes they have in place in terms of UPS, generators and geographic redundancies.
  • Cybersecurity Breaches: Despite every organisation’s best efforts, bad actors can sometimes worm their way into systems. Attacks such as Distributed Denial of Service (DDoS) can prevent users from accessing cloud services. Other types of malware and ransomware can cripple the cloud altogether.
  • Hardware and Software Problems: Like any other network, cloud infrastructure comprises multiple hardware and software technologies. As such, cloud services are prone to the same issues that can cause problems with your network. The difference is the number of redundant systems, checks and balances, and personnel dedicated to keeping the cloud up and running.
  • Networking and Collaboration Challenges: Cloud providers rely on telecommunications providers to deliver their services. They also have to contend with government policies in different parts of the world. When communication falls down, there can be problems. It’s good to know, however, that there has been far more collaboration in the past two years, so that load-balancing can be resolved between multiple players in countries, and other issues can be addressed as well.
  • Human Error: Despite the strong checks and balances mentioned above, it could, potentially, take one person to make one mistake and… Luckily, protocols are becoming tighter and tighter to ensure this does not happen. Indeed, none of the recent cloud outages were the result of someone’s inattentiveness.

Major Cloud Outages of 2020

  • March 3, 2020 – Microsoft Azure: In this instance, a mechanical cooling system failure led to the outage that affected customers served by Microsoft’s East US data centre.
  • June 10, 2020 – IBM Cloud: As a result of a third-party network provider overloading the IBM cloud network with incorrect routing, customers in Washington, DC, Dallas, London, Frankfurt and Sydney could not access their regular cloud services, including Kubernetes, App Connect and Watson AI, for nearly 4 hours.
  • November 25, 2020 – AWS: Customers served by the North Virginia location were impacted by a global outage that began at 8:15 ET and lasted 11 hours. Users, including 1Password, Adobe Spark, Flickr, Glassdoor and The Washington Post, lost a full business day and more; services such as Lambda, Managed Blockchain, Marketplace, MediaLive, Workspaces and several others were also affected. The cause: problems related to Amazon Kinesis, which enables real-time processing of streaming data.
  • December 14, 2020 – Google Cloud: For nearly an hour, services such as YouTube, Google Workspace and Gmail experienced interruptions as a result of an outage related to problems with the automated storage quota management system. The system’s authentication capacity was reduced and users around the world could not access services.

Each of these four leading companies experienced what users today consider significant downtime, each for a different reason. Each of these companies has also experienced cloud downtime in 2021.

The point? Given that cloud usage is expected to grow 50% from 2020 to 2024, there are actually two:

  1. IT professionals must include ways to protect their organisations in the case of cloud outages; this should become part of your Standard Operating Procedure Playbook.
  2. Corporations should consider private cloud for mission-critical applications.

Cloud Choice Considerations

  • It’s important to know what your SLA requirements are for various types of workloads. For mission-critical IT workloads, public cloud may not be the right option.
  • You should also consider any regulatory compliance standards by which your industry is governed before making final decisions.
  • Explore a multi-cloud approach as part of your IT infrastructure strategy. That way, if one of your cloud providers’ data centres goes down, you have failover redundancies to ensure business continuity. Also, if you go with multiple vendors, you won’t get locked in and can leverage reduced pricing and market initiatives to your advantage.
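
The failover idea behind a multi-cloud approach can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the provider names and the health-check functions are hypothetical stand-ins, and a real check would probe your own service endpoints on each provider rather than return a fixed value.

```python
from typing import Callable, Optional, Sequence, Tuple

# A provider is a (name, health_check) pair; the health check returns True when
# the provider is reachable. Both are hypothetical placeholders in this sketch.
Provider = Tuple[str, Callable[[], bool]]


def pick_available_provider(providers: Sequence[Provider]) -> Optional[str]:
    """Return the first provider, in priority order, whose health check passes."""
    for name, is_healthy in providers:
        try:
            if is_healthy():
                return name
        except Exception:
            # A health check that raises is treated as "provider unavailable".
            continue
    return None  # Every provider failed; fall back to the manual outage plan.


if __name__ == "__main__":
    # Stubbed checks simulating an outage at the primary provider.
    providers = [
        ("primary-cloud", lambda: False),
        ("secondary-cloud", lambda: True),
    ]
    print(pick_available_provider(providers))
```

Listing providers in priority order keeps the decision simple: traffic goes to the first healthy option, and a result of `None` signals that the outage procedures in your playbook should take over.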

Some Tips if Using Public Cloud

The use of “tips” is deliberate; we are still in the early days of how to manage cloud failure and no definitive protocols have yet been established.

  • Identify your SLA requirements for your different workloads and know how your public cloud provider stacks up. A hybrid version may better suit your needs, if the budget is not there for other options.
  • Look at how you are using the cloud today. Assess which databases are essential for your operation and consider having redundant on-premises storage for these databases, with an alternate way for key personnel only to access them.
  • Have an alternate way for employees to communicate. Zoom and Skype are two options that allow people to have free accounts. Bear in mind that no platform is immune: Zoom itself went down for four hours on August 24, 2020, affecting millions around the world.
    That being said, get employees to install these ahead of time. Concurrently, you can create a private landing page that does not require login credentials; use this to communicate updates to employees. Obviously, you will need to train your employees ahead of time on the procedures to follow, and will need to test their ability to communicate using another platform.
  • Ensure you have an alternate means of communicating not only with your employees, but with customers and other stakeholders so you can let them know the source of your business interruption, and what you are doing to serve them in the interim.
  • Microsoft has had multiple outage problems that have affected users’ ability to authenticate and log in, but if they are already in, service continues to work. So… get your employees to log in to Teams and Microsoft 365 when they start their workday, and stay logged in until quitting time.
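
The first tip above, matching SLA requirements to what a public cloud provider offers, can be sketched as a simple comparison. The workload names, the availability figures and the provider SLA below are all made-up values for illustration; substitute the numbers from your own requirements and your provider’s published agreement.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Workload:
    name: str
    required_availability: float  # percent, e.g. 99.99


def workloads_at_risk(workloads: Sequence[Workload], provider_sla_percent: float) -> List[str]:
    """Names of workloads that need more availability than the provider commits to."""
    return [w.name for w in workloads if w.required_availability > provider_sla_percent]


if __name__ == "__main__":
    # Hypothetical workloads and a hypothetical 99.9% provider SLA.
    workloads = [
        Workload("order-processing", 99.99),
        Workload("internal-wiki", 99.0),
        Workload("payroll", 99.95),
    ]
    print(workloads_at_risk(workloads, provider_sla_percent=99.9))
```

Any workload this flags is a candidate for the hybrid, private cloud or redundant on-premises options discussed in the tips.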

Although public cloud can be an excellent option in many instances, for large enterprise-level organisations, there may be better options.

As our name suggests, we are cloud experts. If you’d like more information on how to better protect your cloud, and how to ensure you minimize your risk while leveraging cloud benefits, please feel free to contact us at [email protected] or (416) 429-0796 or 1.877.238.9944 (Toll Free), even if you’re only looking for a knowledgeable shoulder on which to bounce some ideas.
