Saturday, July 9, 2011

HBase, Hadoop, and Facebook Messages


If you were at the Hadoop summit and watched Karthik Ranganathan’s talk, you know how Facebook now uses HBase to store all the inbox messages. It was a fun talk and Karthik shared many of the hard-won lessons from deploying and maintaining a large application on HBase and Hadoop/MapReduce. Here are a few observations:

Karthik and the rest of his team at Facebook didn’t choose Cassandra even though they have the necessary skills and expertise at Facebook to do so. In fact, Cassandra was originally built by Facebook engineers to store and index their inbox messages. So why not use it here? One of the audience members asked exactly this question. As we argued in the Spinnaker paper, you don’t want to force your application developers to deal with eventual consistency unless your availability needs cannot be met any other way. Writing automatic conflict resolution handlers is messy work, and as your application gets more complicated than a shopping cart, it becomes increasingly difficult to reason about. HBase, with its simpler, strongly consistent model, likely made life much easier for the team building the messages service. Karthik's response was essentially the same.
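
To see why eventual consistency pushes this complexity onto the application, here's a hypothetical merge handler for the classic shopping-cart example (the names and structure are mine, purely to illustrate the point; this is not Cassandra's or Facebook's actual code):

```python
# Hypothetical conflict handler for two divergent replicas of a
# shopping cart under eventual consistency.
def merge_carts(replica_a: set, replica_b: set) -> set:
    # Set union is the "obvious" merge, but it cannot distinguish
    # "deleted on one replica" from "never added on the other" --
    # deleted items silently come back.
    return replica_a | replica_b

# Replica A saw the user delete the book; replica B never saw the delete.
a = {"camera"}
b = {"camera", "book"}
print(merge_carts(a, b))  # {'camera', 'book'} -- the deleted book reappears
```

Even for a cart, the "right" merge is debatable; for something richer, like message threads with read/unread flags and folders, reasoning about every possible merge gets much harder.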

If Spinnaker were available, would the Facebook guys have used it instead? I’d like to think the answer is yes :-) .  As I explained in a previous blog post, Spinnaker provides a consistency model similar to HBase's, along with better availability and performance.  In fact, we talked to Karthik and his team in the early days when they were planning the migration to HBase. These guys are smart engineers. They liked the ideas in Spinnaker, and they asked the right questions: (1) Is this open source? (2) Is it battle-tested? (3) What happens when disks fail completely or blocks get corrupted? There’s a bunch of code in HDFS that HBase gets to leverage for robustness to such failures. Spinnaker is/was just a research prototype and would require substantial engineering effort to get it ready for a production app like FB’s messages. But there's a simpler answer for why HBase is a good fit here and Spinnaker would not be needed even if it were available ... read on.

The availability guarantees provided by HBase are probably sufficient for the Facebook messages app.  Let's do some quick calculations: on a cluster with about 100 nodes, if you expect a regionserver to crash every 30 days (43,200 minutes), and, conservatively, recovery to take about 10 minutes, the availability of a row in HBase is approximately 1 - 10/43,200 ≈ 0.999768. (I'm just using 1 - MTTR/MTTF and making a whole bunch of simplifying assumptions.) If all the data for a user is stuffed into a single row, which seemed to be the case in FB’s design, this is perfectly acceptable availability: for a user that visited his message box twice every day, this would mean about 0.169 "Not Available" errors a year. Pretty reasonable.
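
For the curious, here's that back-of-the-envelope arithmetic as a few lines of Python. The 30-day MTTF, 10-minute MTTR, and twice-a-day visit rate are the assumptions stated above, nothing more:

```python
# Single-row availability under the 1 - MTTR/MTTF approximation.
MTTF_MINUTES = 30 * 24 * 60   # assume one regionserver crash every 30 days
MTTR_MINUTES = 10             # assume ~10 minutes to recover a region

row_availability = 1 - MTTR_MINUTES / MTTF_MINUTES
print(f"row availability: {row_availability:.7f}")  # 0.9997685

# A user checking messages twice a day, every day of the year:
visits_per_year = 2 * 365
print(f"errors/year: {visits_per_year * (1 - row_availability):.3f}")  # 0.169
```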

On the other hand, if you have a more complex application that needs access to *all the rows* to support each request, the availability of your application would be 0.999768^100 ≈ 0.977115. Although this looks close to the previous number, it is really bad: the same user that visited his inbox twice a day would see about 16 "Not Available" errors a year. That's more than an error every month, which could make for unhappy users. By designing the application appropriately, the Facebook guys were able to fit well within the availability guarantees that HBase can provide. Nice job!
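
Extending the same sketch to the all-100-rows case (still assuming node failures are independent):

```python
# All-or-nothing availability across 100 nodes: per-row availability
# compounds, one factor per node (same assumptions as above).
row_availability = 1 - 10 / (30 * 24 * 60)
app_availability = row_availability ** 100
print(f"100-row availability: {app_availability:.6f}")  # 0.977115

visits_per_year = 2 * 365
print(f"errors/year: {visits_per_year * (1 - app_availability):.1f}")  # 16.7
```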

Spinnaker still has other performance advantages ... but that's for another blog post :-)

