Tuesday, February 14, 2012

DynamoDB = Cassandra-as-a-Service?

Amazon recently introduced a new database service, DynamoDB, which joins its existing database services: SimpleDB and RDS (the hosted MySQL service). DynamoDB looks like a higher-scale, lower-function version of SimpleDB. It does away with the requirement that data be partitioned into SimpleDB domains of at most 10 GB each. It also provides much more predictable performance, and you can provision capacity for your expected read and write rates. In exchange, you give up SimpleDB's automatic indexing. SimpleDB's general-purpose querying goes away too: you fall back to get/put requests addressed by primary key. Conditional put/delete calls are still available as a simple form of concurrency control.
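To make the "get/put by primary key plus conditional writes" model concrete, here is a minimal in-memory sketch. It is not the DynamoDB API: the class `KeyValueTable`, the exception `ConditionalCheckFailed`, and the `expected` parameter are hypothetical names chosen for illustration. It assumes a single node, where checking the condition and applying the write trivially happen together. The usage at the bottom shows the typical concurrency-control pattern: a version attribute that a writer must match before it is allowed to overwrite.

```python
from typing import Any, Dict, Optional


class ConditionalCheckFailed(Exception):
    """Hypothetical error: a conditional write's expectations did not match."""


class KeyValueTable:
    """Toy single-node table: items are addressed only by primary key.

    No secondary indexes and no query language, just get/put/delete, plus an
    optional `expected` dict that turns a write into a conditional write.
    """

    def __init__(self) -> None:
        self._items: Dict[str, Dict[str, Any]] = {}

    def get(self, key: str) -> Optional[Dict[str, Any]]:
        item = self._items.get(key)
        return dict(item) if item is not None else None  # hand back a copy

    def put(self, key: str, item: Dict[str, Any],
            expected: Optional[Dict[str, Any]] = None) -> None:
        self._check(key, expected)
        self._items[key] = dict(item)

    def delete(self, key: str, expected: Optional[Dict[str, Any]] = None) -> None:
        self._check(key, expected)
        self._items.pop(key, None)

    def _check(self, key: str, expected: Optional[Dict[str, Any]]) -> None:
        # Every expected attribute must currently hold the given value;
        # an expected value of None means "attribute (or whole item) is absent".
        if not expected:
            return
        current = self._items.get(key, {})
        for attr, value in expected.items():
            if current.get(attr) != value:
                raise ConditionalCheckFailed(
                    f"{key}.{attr}: expected {value!r}, found {current.get(attr)!r}")


# Optimistic concurrency control with a version attribute.
table = KeyValueTable()
table.put("user#1", {"name": "Ann", "version": 1}, expected={"version": None})  # create only if absent
table.put("user#1", {"name": "Ann B.", "version": 2}, expected={"version": 1})  # succeeds: version was 1

stale_rejected = False
try:
    # A second writer that still holds version 1 tries to overwrite.
    table.put("user#1", {"name": "Stale", "version": 2}, expected={"version": 1})
except ConditionalCheckFailed:
    stale_rejected = True

print(table.get("user#1"), stale_rejected)
```

The stale writer is rejected instead of silently clobbering the newer item, which is the kind of simple concurrency control conditional put/delete provides.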

DynamoDB comes pretty close to being a pay-as-you-go version of Cassandra. That is not surprising, since at least one of Cassandra's original developers (Avinash Lakshman) worked on the Dynamo project at Amazon before joining Facebook.

The major advantage Cassandra has seems to be secondary indexes. I'm not sure how important local secondary indexes are for short-request apps, but they could be useful for predicate scans when you hook up MapReduce to data stored in Cassandra. The major feature DynamoDB has that Cassandra lacks is conditional put/delete. This is a rather tricky feature: Jonathan Ellis (DataStax), Jun Rao, and I tried to add it in the early days of Cassandra (https://issues.apache.org/jira/browse/CASSANDRA-48) and quickly realized that it was a good bit harder to provide clean semantics for this call on an eventually consistent datastore.
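One way to see why conditional put is hard on an eventually consistent store is a deliberately simplified two-replica model. This is an illustrative assumption, not Cassandra's actual code: each replica evaluates the condition against its own local view, and replicas later reconcile by keeping the entry with the highest write timestamp (last-write-wins). `Replica`, `conditional_put`, and `anti_entropy` are hypothetical names.

```python
from typing import Dict, Optional, Tuple

# (value, write timestamp); replicas reconcile by keeping the newer timestamp.
Entry = Tuple[str, int]


class Replica:
    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, Entry] = {}

    def read(self, key: str) -> Optional[str]:
        entry = self.data.get(key)
        return entry[0] if entry is not None else None

    def conditional_put(self, key: str, expected: Optional[str],
                        new_value: str, ts: int) -> bool:
        # The condition is checked against this replica's local view only.
        if self.read(key) != expected:
            return False
        self.data[key] = (new_value, ts)
        return True


def anti_entropy(a: Replica, b: Replica) -> None:
    # After a partition heals, both replicas adopt the entry with the highest timestamp.
    for key in set(a.data) | set(b.data):
        entries = [e for e in (a.data.get(key), b.data.get(key)) if e is not None]
        winner = max(entries, key=lambda e: e[1])
        a.data[key] = winner
        b.data[key] = winner


r1, r2 = Replica("r1"), Replica("r2")
r1.data["seat-12"] = ("free", 0)
r2.data["seat-12"] = ("free", 0)

# During a partition, two clients each reach a different replica.
ok_alice = r1.conditional_put("seat-12", expected="free", new_value="alice", ts=1)
ok_bob = r2.conditional_put("seat-12", expected="free", new_value="bob", ts=2)

anti_entropy(r1, r2)
print(ok_alice, ok_bob, r1.read("seat-12"), r2.read("seat-12"))
```

Both conditional writes report success, yet after reconciliation only Bob's survives, so Alice's acknowledged write is silently lost. Clean semantics would require the replicas to agree on the check before acknowledging the write, which is the coordination an eventually consistent write path is designed to avoid.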

Jonathan Ellis at DataStax posted a feature comparison of Cassandra and DynamoDB at http://www.datastax.com/dev/blog/amazon-dynamodb. The short summary: if you are running things in the cloud, DynamoDB is indeed a compelling choice. If you are a very large operation, want a multi-datacenter deployment, expect to need more functionality than DynamoDB offers (snapshots, integrated caching), and have an ops team that can manage it, then Cassandra is probably a better bet.
