August 26, 2012

The Hadoop Loop

By John Bantleman, CEO

A question currently rippling across the Big Data and Hadoop market is who is adopting specific Big Data technologies, what the use-cases are, and how fast that adoption is happening.

There is no question that Hadoop and related tech innovation is running at quite a pace, and arguably the market is taking shape even more rapidly. Having said that, if you peel back the layers and dig a little deeper, you may find that many Hadoop projects are still “sand-pits” (nothing wrong with that). Among mainstream enterprises, the projects are more investigative, examining how Hadoop can augment existing databases and data warehouses. In fact, some interesting use-cases are emerging beyond what Apache Hadoop originally set out to do, which was to store and analyze multi-structured data at low cost and at scale.

I recently read a report by Wayne Eckerson, sponsored by B-Eye-Network, called “Exploiting Big Data” (Strategies for Integrating with Hadoop to Deliver Business Insights). In it, 20% of those surveyed said they are experimenting, whereas a larger 38% said they have no current plans. The report also found that a new analytics ecosystem is emerging in which Hadoop solutions surround traditional databases and data warehouses: the more traditional BI user accesses data from the warehouse, while the data scientist leverages data from the Hadoop environment. I have seen that same trend repeated with many of our customers and partners at the world’s biggest banks and telcos. In fact, we recently blogged about this new IT ecosystem, which I often refer to as the Hadoop surround strategy or “loop” (see previous post here: Book-end Your Data Warehouse).

The same B-Eye report also reveals that Hadoop is predominantly used as a staging area (92%) or prototyping tool. That is similar to what a RainStor-sponsored survey revealed, where almost half of those using Hadoop are augmenting existing BI / analytics environments. What I found most interesting in the survey was the type of data being stored in Hadoop today: it is more transaction and semi-structured data and less email, documents and images. This will likely change over time, but for now the more traditional data types (transactions and logs) make up the majority of what runs on Hadoop (92%).

Coming back to my earlier point about Hadoop being used to augment existing environments, or the surround strategy: it makes a lot of sense, not only because enterprises have invested many millions in systems they are not about to replace, but more specifically because it is very hard for a mainstream organization to quickly deploy a production-grade environment built on new, innovative technology when stringent requirements such as availability, security and rapid response times are critical for end-users. Let’s face it: for an enterprise of any significant size, it can take six months to over a year to roll out any type of production-ready system.

There is no question that Hadoop/MapReduce and all its supporting technologies enable businesses to do so much more with their data. What was initially viewed as a new way to conduct analytics has since become a platform, or in fact an ecosystem of new technologies, that can address many more use-cases beyond MapReduce, Hive or Pig analytics. The previously mentioned ETL or staging step, before sending data downstream to a traditional warehouse, is a very attractive use-case. It is in fact more cost-effective on Hadoop than the more traditional methods, which require not only expensive licensed software packages but also many man-hours of professional services. Additionally, Hadoop as the back-end online archive is gaining more attention in the enterprise today and is a much better alternative to offline tape. If it enables IT to drive out some of the infrastructure costs, it’s a very good thing.

What is your Big Data Hadoop strategy? Is it the loop or surround approach that Wayne Eckerson refers to in his report (per the figure below), or is it an eventual replacement? It will be interesting to watch how deployments take shape over the coming year, but I do know that the market is moving very quickly, and sometimes it does feel a bit like a roller coaster or “loop-the-loop”.