Monday, October 8, 2012

MongoDB: Why Database folks have it wrong.

It is easy for relational database folks to dismiss MongoDB as  "silly". Research veterans have poked fun at it, there is a great video on youtube ("MongoDB is Web Scale") that cracked me up with its criticism of Mongo fan-boys, a recent VLDB paper from Wisconsin compares MongoDB and Sharded SQLServer on the YCSB benchmark showing that Mongo's performance falls way behind.

I think all of this misses the point. IMHO, MongoDB is not trying to be a more scalable, higher performance database for application developers who couldn't get this from sharded MySQL or any of the other commercial offerings. To understand why Mongo is so interesting, you have to look at the entire application programming stack, and how it is different from the previous generation. The standard 3-tier architecture of database, app/web-servers, browsers is getting rebuilt using very different technologies driven by the enormous amounts of effort pouring into building applications/sites for mobile devices:



A few years ago, building a Web 2.0 application would require: 1) a team that knew Javascript/AJAX stuff really well, 2) a few guys that knew how to write server-side scripting logic in PHP, or Ruby, or Python, 3) basic MySQL skills because you could use the ORM in the app language (Rails, Django etc.) to do most of the talking to the DB. As more and more applications are getting built by highly skilled web-developers, smaller shops may find that being able to build a larger part of their application in a single language, (say Javascript!) might help them get the job done with fewer people as opposed to this traditional choice.

Given the fact that Javascript is a less than ideal language for developing complex applications, there are many pieces of technologies enabling this push. Technologies like node.js provide server-side scripting in Javascript. Given how much effort has gone into engineering V8, it is no surprise that some people such as the mobile app folks at LinkedIn are beginning to notice that node.js may actually provide faster throughput than RubyOnRails. Now, the same team of web developers building browser code can also build a big chunk of the application logic in Javascript. Technologies like backbone.js provide models with bindings to key-value stores and rich collection APIs to query/process them. This is making it easier to build complex applications in Javascript.

Now, can we provide an persistence layer here that is extremely agile, flexible, and one that can be accessed from code in a Javscript VM? I think MongoDB is trying to be that persistence layer. Relational databases still offer too much friction for flex-schema applications written in Javascript. A database that does a really good job of storing JSON is likely to be more successful. There are certainly many ways to talk to a relational database from a Javascript VM: node.js has ORMs that work with several databases as described here. For high-performance web-sites built using Javascript for both client-side and server-side scripting, talking to relational DBs through ORMs in JS is probably the stable answer for now. But as MongoDB or a similar JSON-store matures, the friction that ORM+RDBM involves may start to outweigh the performance benefits of RDBMSes.

If all three tiers of the stack can now be programmed in Javascript, this opens up very interesting new questions: with the right compiler technologies, we can fluidly determine the boundary between client-side logic and server-side logic as the application and the client hardware (phones, tablets) evolve! Deploying the app on a CPU-poor device? Push more of the processing logic to the server side. Is the app being used on a desktop? Then push more of the processing to the client side. Having everything in Javascript is going to make it much easier to build applications like this!