The most surprising part of the summit was the sheer number of new VC-funded startups in attendance. Last year, the major ones were Cloudera, Datameer, and Pentaho. This year a host of new companies appeared: MapR, Zettaset, Arista, PervasiveDataRush, SyncSort, and DataStax (previously Riptano).
I also noticed a flurry of activity around scalable PubSub systems. At least five were discussed or mentioned:
- Kafka from LinkedIn
- Scribe from Facebook
- Flume from Cloudera
- Hedwig from Yahoo
- Data Highway from Yahoo
Of these, I have a basic understanding of Kafka and Hedwig, but I haven't used either. They seem to have made different design choices. Kafka is aimed squarely at log collection, and its authors argue that some of their API choices were better than the ones offered by Scribe. Hedwig seems closer to a true scale-out queuing system with guaranteed in-order, at-least-once delivery. It'll be fun to do an in-depth comparison at some point.
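For anyone unfamiliar with the term, here is a rough sketch of what in-order, at-least-once delivery means in practice. This is a toy in-memory model, not Hedwig's (or anyone's) actual API: the `AtLeastOnceQueue` class, its methods, and the `consume` helper are all made up for illustration. The idea is simply that a message stays at the head of the queue until the consumer acknowledges it, so a failed delivery is retried before anything later is sent.

```python
from collections import deque


class AtLeastOnceQueue:
    """Hypothetical toy queue: the head message is redelivered until acked."""

    def __init__(self):
        self._pending = deque()  # (sequence number, payload), oldest first
        self._next_seq = 0

    def publish(self, payload):
        seq = self._next_seq
        self._pending.append((seq, payload))
        self._next_seq += 1
        return seq

    def poll(self):
        # Return the oldest unacknowledged message without removing it.
        return self._pending[0] if self._pending else None

    def ack(self, seq):
        # Only the head may be acknowledged, which preserves ordering.
        if self._pending and self._pending[0][0] == seq:
            self._pending.popleft()
            return True
        return False


def consume(queue, handler):
    """Deliver messages in order, retrying each one until the handler succeeds.

    Note: a handler that always fails would loop forever in this sketch;
    a real system would add backoff or a retry limit.
    """
    while (msg := queue.poll()) is not None:
        seq, payload = msg
        try:
            handler(payload)
        except Exception:
            continue  # no ack was sent, so the same message comes back
        queue.ack(seq)
```

The flip side of this guarantee is that a consumer can see the same message more than once (for example, if it processes a message but fails before acknowledging it), so handlers generally need to be idempotent.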