Basic System Design
Bigtable, Dynamo, and PNUTS are the three big systems running real applications, and all three should be required reading. FAWN is an excellent read. The H-Store paper addresses traditional OLTP but is closely related to this space. Hyder is also in the traditional OLTP space and is probably less relevant to NoSQL than H-Store, but it is a very cool system that explores a major departure from traditional OLTP system design. The RAMCloud effort from Stanford has made some interesting choices that allow very good availability numbers. Finally, I'll include a shameless plug for my own paper, Spinnaker, which I think makes some interesting points about this space.
Consistency, Indexing, Transactions
- PNUTS Indexing and Views
- Google Megastore
- Helland's CIDR 2007 paper, and the one in 2009 with Campbell. I've had so much fun reading these.
- Dave Lomet's take on transactions in the cloud
- Abadi's musings on CAP and PACELC
- Hellerstein's group on quantifying eventual consistency
- PiQL, a project from Berkeley, makes some interesting arguments on guaranteeing bounded-time querying
Other Industrial/Open-Source Systems and Articles
- Curt Monash's valiant struggle to distinguish between traditional OLTP and "NoSQL" apps: HVSP
- Industrial MySQL scale-out approaches: Clustrix, Schooner, dbShards
- Oracle NoSQL Whitepaper
- HBase, Cassandra, MongoDB -- codebases/architecture