
A lot of the Internet runs through just a handful of pipes. There’s a huge AWS pipe, a pretty big Azure pipe, and a few more. Shut off any one of those and business worldwide suffers.
We got a big reminder of that reality recently. First, Amazon Web Services experienced a global flameout on Oct. 20 that took down thousands of websites and apps. Later in the month, Microsoft had its own meltdown, with major businesses losing critical systems and a lot of customers experiencing problems with Microsoft 365, which connects much of the business world. This week, Cloudflare took down its own chunk of the Internet.
For an individual IT professional or manager who relies on AWS or Azure directly or sees other downstream SaaS services go down, these kinds of incidents result in severely hectic days, but they don’t tend to be career limiting. Going with AWS, Azure, or Google Cloud is arguably the new version of the old saying, “No one gets fired for buying IBM.”
When customers and managers see outages in every direction, there’s comfort in knowing that everyone is in the same boat that day and most everyone worldwide is waiting for the top-tier techs at AWS or Microsoft to sort out the problem. Similarly, it’s hard to get too worked up at your own suppliers if you know there’s ultimately not too much they can do until Cloudflare, Amazon, or Microsoft get their systems back up.
You could demand that your suppliers have multi-cloud backups. But when companies rely on full-stack clouds, using multiple AWS services in combination to deliver key applications and services of their own, there’s not much redundancy in storing data in Azure. Sure, they can recover the data from an Azure-based backup, but their cloud-based applications that rely on myriad AWS services aren’t going to be able to do anything with the data until the other AWS services are back online.
Even if you can’t do much during an outage, it is still an opportunity for clarification. Here are some questions to consider:
This is your roadmap for the future. The next time you see a news alert that, say, AWS is having a problem, you’ll know which of your services to check for downstream effects.
Did they let you know their status via email, social media, or a system uptime page? Or did you have to reach out to them? This invites communication with them about their response, and potentially their plans to improve that communication.
Should an outage go longer than a day or involve critical data that you might need to look at in another way, what are the processes for getting your SaaS-specific data out in case you need it urgently?
Maybe you could have done more beforehand to be ready for a potential outage. Do your vendors have playbooks for their customers about what they can do to prepare for potential upstream cloud outages, or to help make critical data more accessible?
It’s useful to know how seriously your suppliers took their response to the outages, and how dedicated they are to responding better next time if possible. Ask for their post-mortem if you can’t find it. It may be as simple as a blog post or an email summary, but you want to know they’re thinking it through.
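
For teams that want to keep the dependency roadmap described above in a form they can query when the next news alert arrives, here is a minimal Python sketch. The vendor names and their upstream providers are hypothetical placeholders, not real data; you would fill them in from what your own suppliers tell you.

```python
# Hypothetical inventory: each vendor you rely on, mapped to the upstream
# providers it is known (or believed) to depend on. Replace with your own.
VENDOR_UPSTREAMS = {
    "crm-vendor": {"AWS"},
    "email-suite": {"Azure"},
    "helpdesk-tool": {"AWS", "Cloudflare"},
    "payroll-service": {"Google Cloud"},
}


def affected_vendors(provider, inventory=VENDOR_UPSTREAMS):
    """Return the sorted names of vendors that depend on the given provider."""
    wanted = provider.casefold()  # match provider names case-insensitively
    return sorted(
        vendor
        for vendor, upstreams in inventory.items()
        if wanted in {u.casefold() for u in upstreams}
    )


if __name__ == "__main__":
    # Which of our services should we check when AWS reports a problem?
    print(affected_vendors("AWS"))
```

With the placeholder inventory above, asking about "AWS" lists `crm-vendor` and `helpdesk-tool`. The list is only as good as the answers you get from your suppliers, so it is worth revisiting after each incident.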