Saturday, July 30, 2011

Hadoop: A Disruptive Technology?

This is going to be a rather non-technical post ... you have been warned :-)

Clayton Christensen's thesis (Innovator's Dilemma) more than a decade ago was that rational management practices of listening to your customers, and building and investing in what solves your customer's problems can lead you astray in the context of disruptive technologies. I've heard the phrase "Hadoop is a disruptive technology" in various contexts. I wondered ... does it really hold true according to Christensen's classic definition?

For a technology to be "disruptive", it should satisfy three requirements
  • It is a simplifying technology -- cheaper/easier than established technologies
  • Commercialized first in emerging or insignificant markets
  • The market leader's biggest customers generally don't want to and initially can't use it
Simplifying Technology

Assuming that "big warehouse" vendors are the established leaders in the market, I'd guess that this is true. Big warehouses are messy and complicated to install, configure, tune, and get running. Hadoop is certainly simpler to install. Warehouses are usually deployed with non-commodity storage/networking. Hadoop runs on  cheap commodity boxes. Is it simpler to use? That could go either way. If your problem fits in SQL, relational warehouses do an amazing job. Writing MapReduce programs is a good bit more work than writing a SQL query. Oh, and Hadoop is definitely cheaper :-) I'd give this a +0.75

Emerging Markets/ Value Networks

This is an easy +1. No one is clamoring to replace their big enterprise warehouse or even their big data marts with Hadoop. It is not clear how big the market is for Hadoop and related technologies. It isn't entirely clear what the market is. The early users have been big web companies that need to analyze web pages, search logs, and click logs. None of this data typically goes into a big relational warehouse. Monash has been arguing that the predominant driver for Hadoop has been the "big bit bucket" use case. Anant Jhingran alluded to many emerging use cases in enterprises -- most of these fall under the kinds of analytics that are/were not commonly done in traditional warehouses. The places where Hadoop is getting interest is not from the sort of enterprises/departments/groups who have typically bought warehouses to solve their data analysis needs.

Unattractive to Market Leaders and their Customers

Traditional enterprises don't and can't use Hadoop as their central data warehouse. Not today, and perhaps not in the near term either. The big central EDW is typically way more complicated than a typical Hadoop cluster - there is a complex ETL system feeding the beast, there are a bunch of tools producing and sending out reports, and there may be other ETL-off ramps feeding smaller data marts. Big EDW customers use complex SQL, many reporting tools that work with the warehouse, advanced features like views, triggers, and often even transactional updates. Big enterprises looking for an EDW solution simply cannot use Hadoop today. As a result, this is not the most attractive technology for big warehouse customers. Again, a +1.

That's 2.75/3. Neat. All this talk of  Hadoop being a "disruptive technology" is not hot air after all. It does fit Christensen's definition rather well!

No comments:

Post a Comment