If you are at all interested in cloud computing, you have no doubt heard by now that today has been a dark day in the world of cloud computing. Something went terribly wrong with the networking in the Amazon US-East-1 region. That caused storage subsystems to go on the fritz, and the rest, as they say, is history. Some very prominent web properties have been down, and the volume of chatter on the subject has been deafening. I had to turn off the sound on my TweetDeck because I could no longer stand the constant beeping of arriving tweets.
What I am about to say will surely prompt our friends at Amazon Web Services to get their pitchforks out. I think this event is going to turn out to be a blessing for cloud computing. Here is why.
First of all, this failure highlighted just how many companies out there use the cloud. We always knew about the big players, but this outage affected even Microsoft, which uses Amazon EC2 servers to run some promotional activities on Facebook. So, while profanities are flying through cyberspace from the cloud diehards who were affected by this outage, people on the sidelines may get a much better appreciation of just how pervasive the cloud has become.
Second, while many of us who are affected may blame Amazon for having an outage, I think we need to look at our own practices as well. One thing is certain: no matter how well a data center is put together, it will suffer an outage at some point. That is why it is important to design for such an eventuality, and large-scale cloud operators let us do exactly that. IBM Cloud maintains data centers in various geographies. Amazon has multiple availability zones, implemented as independent compute resource pools (data centers?), in multiple regions around the world. For those who want to design for business continuity, the tools are there.
We run a site called DB2University.com, and it is located in the US-East region. Our DB2 database servers are configured for high availability (the feature is called High Availability Disaster Recovery, or HADR) with two servers in two different availability zones. If one zone has an outage, the standby server in the other zone takes on the workload and keeps humming along. Even with this configuration we may not have survived today's outage, as it affected multiple availability zones. So we will have to consider going cross-region. This is a viable option for us because DB2 HADR has an asynchronous mode that works quite well across great distances. We should be able to go between the US-East and US-West regions, or even across the Atlantic to EU-Ireland or across the Pacific to AP-Singapore.
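To make the zone and region reasoning a bit more concrete, here is a minimal Python sketch of how one might pick an HADR synchronization mode from where the two servers sit, and then generate the matching DB2 configuration commands. Treat it as an illustration under stated assumptions rather than our actual setup: the `Node` class, the function names, the hostnames, the port, the database name and the instance name are all hypothetical. Choosing NEARSYNC for two zones in the same region is also an assumption made for the example; the only mode discussed above is the asynchronous one for long distances. The script only builds command strings and does not run anything against a database.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Node:
    """One database server and where it runs (all values are illustrative)."""
    host: str    # hypothetical hostname
    region: str  # e.g. "us-east-1"
    zone: str    # e.g. "us-east-1a"


def choose_syncmode(primary: Node, standby: Node) -> str:
    """Pick an HADR_SYNCMODE value based on how far apart the two nodes are.

    Assumption for this sketch: NEARSYNC inside one region, ASYNC across regions.
    """
    if (primary.region, primary.zone) == (standby.region, standby.zone):
        # Both servers in one zone would go down together in a zone outage.
        raise ValueError("primary and standby must be in different availability zones")
    if primary.region == standby.region:
        return "NEARSYNC"
    return "ASYNC"


def hadr_commands(db: str, local: Node, remote: Node,
                  port: int, instance: str, syncmode: str) -> List[str]:
    """Build the 'db2 update db cfg' commands to run on the local node."""
    settings = {
        "HADR_LOCAL_HOST": local.host,
        "HADR_LOCAL_SVC": port,
        "HADR_REMOTE_HOST": remote.host,
        "HADR_REMOTE_SVC": port,
        "HADR_REMOTE_INST": instance,
        "HADR_SYNCMODE": syncmode,
    }
    return [f"db2 update db cfg for {db} using {name} {value}"
            for name, value in settings.items()]


if __name__ == "__main__":
    # Hypothetical cross-region pair: primary in US-East, standby in US-West.
    primary = Node("db2-primary.example.com", "us-east-1", "us-east-1a")
    standby = Node("db2-standby.example.com", "us-west-1", "us-west-1b")
    mode = choose_syncmode(primary, standby)
    for command in hadr_commands("SAMPLE", primary, standby, 55001, "db2inst1", mode):
        print(command)
```

The same commands, with local and remote swapped, would be generated for the standby. Rejecting a same-zone pair up front is the point of the exercise: the configuration only buys you anything if the two servers cannot fail together.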
We are not keeping any of this to ourselves. We have published an easy-to-use RightScale macro that lets anyone build a highly available cluster of DB2 database servers with the two nodes placed in different availability zones. With just a bit more manual configuration, you should be able to go across regions. Give it a try. Maybe next time there is an outage, and we all know there is going to be one, your application will stay up just like DB2University.com did.