Amazon Web Services is arguably the most recognized name in the world of Cloud Computing. Amazon is mostly known for S3, its storage-as-a-service offering, and EC2, its compute-power-as-a-service offering. A lesser-known Amazon cloud offering is a cloud-based database called SimpleDB, which is currently in beta. SimpleDB is quite a bit different from traditional relational databases. It is a simple key-value pair database that targets web developers who don’t need or want a relational database.
Amazon has also partnered with major DBMS vendors, and today one can deploy major relational DBMSs on Amazon EC2. We offer IBM DB2 and IDS for deployment on Amazon EC2, and this week Amazon’s CTO Werner Vogels described DB2 as having the most innovative approach for cloud deployment.
Today, Amazon announced its second entry into the world of cloud databases. Called Amazon Elastic MapReduce, this appears to be a hosted implementation of the Hadoop framework. Hadoop, in a nutshell, provides a way to analyse very large amounts of data by employing a large number of processing nodes working independently. One does not use Hadoop or MapReduce as just another database, where an application opens a connection and submits a query or an update operation. An application that uses MapReduce is more like a batch job that one submits but, instead of running on a single server, the application and data are spread across many servers, with each one crunching its share of the data. This is an approach that is often used by companies like Google and Yahoo to analyse vast amounts of information. It is also very popular in many scientific communities.
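To make the batch-job idea a little more concrete, here is a minimal sketch of the MapReduce pattern as a word count in plain Python. It does not use Hadoop or the Elastic MapReduce API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration, and everything runs in one process. On a real cluster, each chunk would be mapped on a different node and the grouping step would move data between nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse all values for one key into a single total."""
    return key, sum(values)


def word_count(chunks):
    """Run map, shuffle, and reduce over a list of text chunks."""
    # Each chunk is mapped independently; this is the part a cluster parallelizes.
    mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical input: two chunks standing in for files spread across nodes.
    print(word_count(["the cloud is elastic", "the cloud is big"]))
```

The point of the sketch is that the map step never needs to see more than its own chunk, which is what lets the work be spread across many servers.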
Hadoop is the kind of application that, at least on the surface, is a natural fit for the elastic nature of cloud computing. Instead of procuring large computing clusters, one can just go to Amazon to run a job and pay only for the resources used by that job. However, running such a job will require transfers of very large volumes of data into and out of the cloud. And, while compute charges on EC2 and storage charges on S3 are quite low, data transfer charges can really add up. Amazon’s home page for MapReduce has a pretty good explanation of the charges for MapReduce itself and the EC2 charges that one can expect, but it is silent on the data transfer charges. I have to assume that these are standard Amazon S3 data transfer charges, as S3 is both the source of the data and the destination for the output. These charges are 10 cents per GB for transfer in to S3 (on sale for 3 cents till July 1, 2009) and 17 cents per GB for retrieving your data from S3. Since most Hadoop jobs would require very large data sets, this can get expensive. Yes, it will be much cheaper than doing something like this on your own equipment, but it is not exactly going to all of a sudden democratize the world of very complex data analysis and just make it available to everyone. One thing is certain: this is a terrific way for Amazon to generate significant revenue by letting more of us transfer and store more data in S3 and spin up hundreds if not thousands of EC2 machine images. I think I am going to buy some Amazon stock!
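To get a rough feel for how the transfer charges add up, here is a small back-of-the-envelope calculation in Python. It uses only the per-GB rates quoted above (10 cents in, 3 cents in at the promotional rate, 17 cents out) and assumes, as I do above, that these standard S3 rates apply to MapReduce jobs. The data volumes are hypothetical, and EC2, MapReduce, and S3 storage charges are not included.

```python
def transfer_cost_dollars(gb_in, gb_out, in_cents_per_gb=10, out_cents_per_gb=17):
    """Estimate S3 data transfer charges in dollars.

    Rates are kept as whole cents per GB so the arithmetic stays exact
    for whole-GB volumes. Defaults are the rates quoted in the post.
    """
    total_cents = gb_in * in_cents_per_gb + gb_out * out_cents_per_gb
    return total_cents / 100


if __name__ == "__main__":
    # Hypothetical job: upload 1,000 GB of input, download 100 GB of results.
    regular = transfer_cost_dollars(1000, 100)
    promo = transfer_cost_dollars(1000, 100, in_cents_per_gb=3)
    print(f"Regular inbound rate:     ${regular:.2f}")
    print(f"Promotional inbound rate: ${promo:.2f}")
```

For that hypothetical job the transfer bill alone comes to $117 at the regular rate and $47 at the promotional inbound rate, and it scales linearly with the size of the data set.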
I think the author of this blog and the readers may want to try out CloudBerry S3 Explorer Freeware to manage Amazon S3 online storage and configure CloudFront CDN http://cloudberrylab.com/
Can we use a cloud database in an iPhone application?
There are really two types of iPhone applications. The first type is applications written using the iPhone SDK to run on the iPhone OS itself, with no dependency on anything external to the iPhone. This is the overwhelming majority of the applications that you will find on iTunes, many of which are games (some really great ones). The other kind of iPhone application runs within the Safari web browser and as such requires connectivity to some sort of server back-end. This is the kind of application that is perfect for cloud deployments. Cloud Computing pretty much assumes that an application will run within the context of a web browser, and iPhone Safari is a very capable web browser. So, provided you do not use Adobe Flash or Microsoft Silverlight (not supported on iPhone), you can make very sophisticated enterprise-class applications available to iPhone users. Naturally, they will need to have data connectivity to use the application, as the actual back-end of the application runs in the cloud. It is also natural to assume that the application server in the cloud would be interacting with a database server to deliver data to the iPhone and to store data that originates on the iPhone. DB2 can be such a database, but any other DBMS or cloud-based database can also be used.
I have used EC2 a few times for rendering Blender animations. It’s a great service, but I discovered a hidden fee for the elastic IP, which I didn’t need. I didn’t see any price listed for it, so I set up an IP address while exploring some other possible uses of the service. I didn’t know about the cost until I got the monthly statement. It was only a couple of dollars, so no real harm done.