When a database person mentions storage, images of cabinets full of disk drives and cables come to mind. But since my recent post promised to talk about cloud computing, I figured I’d talk about a different kind of storage. I am talking about storage that you can’t see or touch; I am talking about storage in the cloud. When Amazon (the people who sell us books and CDs) launched its web services platform, AWS, S3 was one of the first services to go online. S3 stands for “Simple Storage Service”. Amazon built out a very significant IT infrastructure to support its online business, and it offers that infrastructure for rent to anyone willing to pay to use it. Amazon S3 is basically off-site storage for rent.
Amazon S3 storage, however, is not the regular disk kind. In other words, you are not renting hard disk space in some data center. You are actually storing your content as a set of objects, each object being up to 5 GB in size, and you store these objects in buckets. Unlike with a regular hard disk, you don’t just write these objects with a series of operating system write requests. Instead, you use web services protocols like SOAP or REST to read and write your objects to and from their buckets. Oh, and your buckets can be located at the Amazon data centers in Europe or the United States. Just because you are putting your data into a data center that does not belong to you does not mean you are making your data public. Your data is your data and it is treated as such … you just happen to be renting storage space from Amazon.
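To make the bucket-and-object model concrete, here is a minimal sketch in Python. The client is anything speaking the S3 API in the style of the modern boto3 SDK (which handles the REST protocol for you); the bucket and key names are purely illustrative placeholders, not anything from a real account.

```python
# A minimal sketch of the S3 object model: every read or write is an
# API request against a named key inside a bucket, not a disk I/O call.
# `s3` is any client exposing put_object/get_object in the boto3 style.

def store_object(s3, bucket, key, body):
    """Write bytes as a single object (one PUT request)."""
    s3.put_object(Bucket=bucket, Key=key, Body=body)

def fetch_object(s3, bucket, key):
    """Read an object's bytes back (one GET request)."""
    return s3.get_object(Bucket=bucket, Key=key)["Body"].read()

# With real AWS credentials configured, a boto3 client plugs straight in:
#   import boto3
#   s3 = boto3.client("s3")
#   store_object(s3, "my-example-bucket", "docs/report.txt", b"hello")
```

Notice that there is no notion of seeking or partial writes here: an object is stored and retrieved as a whole, which is exactly why S3 behaves like a content store rather than a disk.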
So, how much does it cost to rent? Amazon charges 15 cents per gigabyte per month in the United States and 18 cents per gigabyte per month in Europe. I think this is very reasonable … actually, I think it is pretty darn cheap. But what is really nice is that you only pay for what you actually use. Remember, you are not renting disk drives in some data center; you are renting capacity and paying for it as a metered service. This is similar to the way we buy electricity and water, and that is why some people refer to this as “utility computing”. A couple of days ago Amazon announced a new pricing structure that comes into effect on November 1, 2008. The new pricing structure still starts at the same 15 cents per GB-month for the first 50 terabytes, but it gets cheaper as you actually use more storage. Basically, Amazon is giving volume discounts to heavy users. So you can expect to pay 14 cents per GB-month for the next 50 TB over the first 50 TB, 13 cents for the next 400 TB, and 12 cents per GB-month for anything over 500 TB. You don’t just pay for storage; you also have to pay for data transfer at 10 cents per GB, and there are per-request charges of 1 cent per 1,000 of what are effectively write or list requests (PUT, POST, LIST) and 1 cent per 1,000 read (GET) requests. If this is making you a bit dizzy, you can use a handy monthly calculator that Amazon provides to figure out S3 storage costs.
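The tiered storage prices above are easy to turn into a quick back-of-the-envelope calculator. This is just a sketch of the arithmetic for the storage portion of a US bill under the new pricing; it deliberately ignores the data transfer and per-request charges.

```python
# Tiered storage pricing sketch: (tier size in GB, price per GB-month).
# None marks the final, unlimited tier. Based on the November 2008
# US prices quoted above; 1 TB is treated as 1,000 GB for simplicity.
TIERS = [
    (50_000, 0.15),   # first 50 TB
    (50_000, 0.14),   # next 50 TB
    (400_000, 0.13),  # next 400 TB
    (None, 0.12),     # everything over 500 TB
]

def monthly_storage_cost(gb_stored):
    """Apply each pricing tier in turn to the stored volume."""
    cost, remaining = 0.0, gb_stored
    for size, price in TIERS:
        portion = remaining if size is None else min(remaining, size)
        cost += portion * price
        remaining -= portion
        if remaining <= 0:
            break
    return cost
```

So 10 GB costs $1.50 a month, while 100 TB works out to $14,500: the first 50 TB at 15 cents and the next 50 TB at the discounted 14 cents.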
OK, so by now you are probably thinking “what does this have to do with databases?” After all, S3 storage does not look like a disk to your database (there is another kind of storage, called EBS, that does, but that is a subject for another post), so it is not like you can use it as your database storage for DB2 Express-C or MySQL. Amazon actually expects you to have a web application that stores and retrieves data to and from S3 storage. Let me rephrase that … Amazon expects you to write code that uploads and downloads data using its S3 API. Fortunately, there are a number of applications already written that free you from slaving over Ruby on Rails, PHP, C# or Java code just to upload some data. For example, I use a Firefox add-on that provides a nice interface for transferring files between my machine and my S3 account.

Hopefully by now it is getting clearer that this may be an interesting way to manage off-site backups for your databases. Here is how this can work. You create a backup strategy for your DB2 Express-C database using the DB2 Control Center and have it back up to a local hard disk. Then you simply upload your backup files to your S3 account and you are done. Should disaster strike, you have access to your backups from either your primary location or from some other location where you can bring up your DB2 Express-C server. Simply download the backup file to your server and run a restore job. Nice, simple and very inexpensive. Compare this against the old-fashioned way of backing up to tape and then sending your tapes off-site for safekeeping. I think the benefits are obvious. You get a backup that is kept safe and secure off-site. The backup is available for recovery almost instantaneously and in any location. And the cost is probably less than what you would pay just for shipping the tape.
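The upload half of that backup routine can be scripted too. Here is a sketch assuming the boto3 SDK (or any client with the same `upload_file`/`download_file` methods); the bucket name, key layout and file names are all made up for illustration, and the DB2 backup and restore themselves still run through your normal DB2 tooling.

```python
# Sketch of the off-site backup step: push a local DB2 backup image
# to S3 under a date-stamped key, and pull it back for a restore.
# Bucket name and paths are illustrative; AWS credentials are assumed.
from datetime import datetime, timezone

def backup_key(db_name, when=None):
    """Build a date-stamped S3 key so successive backups don't collide."""
    when = when or datetime.now(timezone.utc)
    return f"backups/{db_name}/{when:%Y-%m-%d-%H%M%S}.001"

def upload_backup(s3, bucket, db_name, backup_file):
    """Send the local backup image to S3 and return the key it got."""
    key = backup_key(db_name)
    s3.upload_file(str(backup_file), bucket, key)
    return key

# Disaster recovery is the same trip in reverse:
#   s3.download_file("my-backup-bucket", key, "/restore/SAMPLE.001")
#   then run the DB2 restore against the downloaded image.
```

Stamping the key with a date keeps every generation of the backup addressable, so rolling back to last week’s image is just a different key in the same bucket.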