Critics of cloud computing have been crowing about its failures for years, citing everything from security breaches to full-on outages as signs that the industry as a whole is headed for a downfall. That hasn’t happened, but a recent outage of Amazon’s servers has demonstrated that nothing is perfect, and that even some of the best providers in the business can go down, despite safeguards that are supposed to make that impossible.
An outage like this is a rarity for Amazon, which has a track record of high uptime and excellent overall reliability. Its servers are now back up and running (for the most part), and both customers and other providers are wondering just what happened. In spite of the multiple redundancies built into the system, it appears that a single point of failure caused the crash, one that was not anticipated and more than likely could not have been accounted for. It appears that no personal or sensitive data was lost, but there are lessons that companies can take from this experience.
First, a company must always understand that its provider is not perfect. Elasticity is part of the nature of the cloud, and this constant flux means that even a well-known provider can experience unexpected issues. Second, companies must take care to build their own redundancies into their systems to account for issues that they can neither predict nor control. While providers are doing their best to maintain a viable and reliable cloud environment, the Amazon outage shows that a loss of service can happen at any time, and companies must be prepared to deal with such an outage as soon as it occurs.
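
As a rough illustration of the second point, here is a minimal Python sketch of customer-side redundancy: try a primary provider first and fall back to a secondary one if it fails. Everything in it is hypothetical and for illustration only. The names `call_with_failover`, `primary_region` and `secondary_region` are invented, and the two stand-in functions simulate providers rather than calling any real Amazon API.

```python
from typing import Callable, List, Sequence, Tuple


class AllProvidersFailed(Exception):
    """Raised when every redundant provider has failed."""


def call_with_failover(
    providers: Sequence[Tuple[str, Callable[[], str]]]
) -> Tuple[str, str]:
    """Try each (name, fetch) pair in order; return the first success.

    Hypothetical helper: a real system would also need timeouts,
    health checks and data replication between providers.
    """
    errors: List[str] = []
    for name, fetch in providers:
        try:
            return name, fetch()
        except Exception as exc:  # any failure triggers the fallback
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Stand-ins for two independent deployments (assumed, not real endpoints).
def primary_region() -> str:
    raise ConnectionError("primary region unavailable")


def secondary_region() -> str:
    return "served from secondary"


if __name__ == "__main__":
    used, response = call_with_failover(
        [("primary", primary_region), ("secondary", secondary_region)]
    )
    print(used, "->", response)
```

The order of the list is the order of preference, so the secondary provider is only used when the primary raises an error. If every provider fails, the caller gets a single exception listing each failure, which is the moment for the company's own outage plan to take over.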