There is an interesting discussion unfolding in the LinkedIn Hadoop Users group. Kevin Haynes from Greenplum started it off with a question that generated a spirited discussion. I tried to post a comment, but since I am never short of words, I exceeded the maximum comment length. I decided to post my comment on this blog instead. I hope some of the discussion will spill over here, and maybe a few readers will find this blog interesting enough to subscribe and follow.
What I want to do is make the case for IBM taking a leadership position in the Big Data space. Before I do that, two important disclosures. First, I work for IBM and I am on the Big Data team. Second, denying that Cloudera and Yahoo (now Hortonworks) are the thought leaders in the Hadoop space is just disingenuous … just plain dumb … so don’t expect me to do so.
IBM’s Ability to Execute in Big Data
Now that I have got that off my chest … the case for IBM as the leader in Big Data. Gartner built a franchise around placing technology vendors into Magic Quadrants. Those with the ability to execute and completeness of vision go into the top-right quadrant. I have not seen a Gartner Magic Quadrant for Big Data, but I would expect IBM to end up somewhere high and to the right, in the leaders’ corner. On “ability to execute”, it is hard to argue against IBM. It has formidable resources, and once it commits to a direction it brings those resources to bear in full force. The IBM Big Data team has ramped up very quickly by bringing in top-notch engineers from all corners of IBM’s Information Management division (which would be a major IT vendor in its own right if it were an independent company). I personally don’t think counting committers is a great way to gauge completeness of vision or staying power, but, if you must ask, IBM does have several committers, with another one joining the team a few weeks back. There is also a very strong precedent at IBM. A while back, IBM jumped on the Apache web/app server bandwagon, branding its offering WebSphere. IBM WebSphere is now the undisputed category leader. Since history tends to repeat itself, there is little reason to believe IBM will not execute just as well in the Big Data space. I think that on the company evaluation scales, IBM will weigh in heavily on “ability to execute”.
Completeness of IBM’s Vision for Big Data
Some have equated Big Data with Hadoop. Hadoop is an important tool in the world of Big Data, but putting an equals sign between Big Data and Hadoop is a mistake. I think IBM has wisely expanded its vision of Big Data to a much broader scope. “When all you have is a hammer, every problem looks like a nail” is not an approach most enterprises favor, so IBM’s message of having multiple tools in the toolbox resonates well with enterprises and industry analysts. In IBM’s view, handling high-velocity data such as streams, complex entity extraction from huge volumes of text, entity analytics and much more all fall within the Big Data domain. IBM Watson is arguably as good an example as any of tackling a Big Data problem, yet it is not based on Hadoop.
But let’s get back to Hadoop. Completeness of vision cannot be measured in absolute terms; it is highly subjective. IBM has a big edge when positioning itself as a thought leader. The old adage that nobody ever got fired for buying IBM may not resonate as strongly as it once did, but it still holds, especially when it comes to Hadoop. It is just plain difficult to compete in the enterprise against the power of the IBM logo, no matter how much VC backing you have. Sure, Yahoo, Facebook and Google don’t much care whether they get their technology from SourceForge or GitHub or buy it from IBM. But enterprises do care … they care a whole lot, and given a choice they will gravitate toward major vendors like IBM with whom they have established relationships. There is something else too: the procurement process in the enterprise space is stacked in favor of IBM over a small, single-product vendor. Enterprises like to buy, and vendors like to sell, enterprise-wide deals that encompass a large number of products. Such deals give enterprise procurement leverage to negotiate better terms, and give IT the flexibility to swap one product for another as their needs evolve. You may have signed an enterprise license agreement (ELA) covering gobs of DB2 and WebSphere licenses and now have a Hadoop project. Under an ELA you may be able to simply trade some of those entitlements for BigInsights (IBM’s Hadoop distribution) without going through another budgeting, approval and justification cycle. This alone will sway adoption in IBM’s direction.
There is also a technology side, and here too it is not the number of committers that makes the difference. In an enterprise setting, core function alone is not as important as complete integration into existing IT systems. This is another area where IBM has an edge. IBM has put far more IT infrastructure in place than any of the leading Hadoop contenders. IBM knows how to integrate data warehouses, ERP systems and myriad other systems with Hadoop better than anyone else, and I bet this is going to be a very important trait for a leader. Hadoop’s security, as it stands, is not going to cut the mustard in the enterprise, and IBM knows how to do enterprise security … the kind that is not just secure but also passes compliance tests. IBM BigInsights Enterprise integrates out of the box with LDAP security, a mainstay in the enterprise. Then there is the overall operation of Hadoop. To the uninitiated, it is a jungle of ports and URLs just to reach the half-dozen or so web consoles needed to run a Hadoop cluster. That is why BigInsights Enterprise has a single management console for administering the entire cluster. None of these features is a deal maker on its own, but together I think they point to a clear strategic direction: BigInsights bringing enterprise-ready Hadoop to market. That is not to say that Cloudera and others will not execute the same strategy. However, I do believe IBM will have an easier time convincing corporate IT that it is the more capable provider of Big Data technologies.
What do you think?