Tuesday, June 25, 2013
Big Fast Data Blog from Wisconsin
My advisor Jignesh Patel has a new blog on data management. This is going to be a great place to get the latest on data management from Wisconsin. His first post is on BitWeaving -- a SIGMOD 2013 paper that takes the column store idea to its logical extreme. Check it out here.
Thursday, June 13, 2013
Cassandra Summit 2013
I spent some time at the recent Cassandra Summit 2013 in San Francisco. It was great to see such a big, engaged community of Cassandra users form in such a short while. Some observations:
As anyone who's watching this space knows, Netflix was and continues to be one of the biggest users of Cassandra in production. Adrian Cockcroft from Netflix had an interesting presentation:
- Netflix has a cross-datacenter Cassandra cluster up and running. The "datacenter" here is really an Amazon availability zone. But they've spanned North America and Europe and run several benchmarks on it successfully.
- Netflix has a bunch of OSS code out on GitHub to make it easy to spin up, expand, shrink, and shut down their Cassandra clusters on AWS. They continue to release other code under the "Netflix OSS" brand.
- Astyanax, a Java client for Cassandra, seems to be Netflix's preferred way of interacting with Cassandra.
- Rather intriguingly, Adrian mentioned that it might be nice to have an Astyanax port to DynamoDB, which might be quicker for small clusters, and use a full-fledged Cassandra cluster only for larger sizes at which DynamoDB gets too expensive. I wonder, as Amazon continues to drop its prices for DynamoDB, if the point at which you'll need to spin up your own Cassandra cluster will continue to move towards larger and larger clusters.
- There was some discussion of the cost of running Cassandra on bare metal vs. on AWS, and Adrian seemed to indicate that for them, the additional cost of running on AWS was outweighed by the benefits of speed of execution. I wonder if, once you have a relatively stable system and the cost of running it becomes an important consideration, having it on Cassandra instead of DynamoDB will give you more control should you decide to move it to your own datacenter with the right hardware for Cassandra.
- Adrian also mentioned that for their analytics, they still have a Teradata system, but it is slowly being replaced by a combination of Amazon Redshift and Hadoop. He did say that when your data is spread across multiple Cassandra clusters, joining across them to run some analytics on Hadoop can be a lot of work.
- I found the transition off of Teradata really interesting. Netflix is one of the largest companies with a very large portion of their IT infrastructure in the cloud. The pace at which they move workloads off Teradata and into Redshift and Hadoop will likely be much higher than what typical enterprises can manage, but I'll be watching this as an indicator of what might come.
There were several other talks from startups and mid-sized enterprises describing how they were using Cassandra. There still seems to be some confusion on consistency and how much to think about it. But for the most part, it seemed to me that the community has decided to use quorum, and not worry too much about it. As one speaker put it, "if once in a year, we lose how far into the video you are when you resume playing later, the world doesn't end". The biggest class of data that continues to be managed by Cassandra seems to be user-specific data (profiles, bookmarks, activity history). For this, I think Cassandra makes perfect sense.
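For readers wondering what "use quorum" buys you, here is a minimal Python sketch of the arithmetic. It assumes the usual majority-quorum rule (a quorum is floor(RF / 2) + 1 replicas, where RF is the replication factor) and the standard overlap condition that a read is guaranteed to touch at least one replica holding the latest acknowledged write when R + W > RF. The function names are made up for illustration and are not part of any Cassandra client API.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that make up a majority quorum."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def read_overlaps_write(replication_factor: int,
                        read_replicas: int,
                        write_replicas: int) -> bool:
    """True when every read set must intersect every write set.

    If R + W > RF, any R replicas and any W replicas share at least
    one node, so a read contacts a replica that saw the latest write.
    """
    return read_replicas + write_replicas > replication_factor


if __name__ == "__main__":
    rf = 3
    q = quorum(rf)
    # Quorum reads combined with quorum writes overlap.
    print("RF:", rf, "quorum:", q,
          "overlap:", read_overlaps_write(rf, q, q))
    # Reading and writing a single replica each does not guarantee overlap.
    print("RF:", rf, "R=1, W=1",
          "overlap:", read_overlaps_write(rf, 1, 1))
```

With a replication factor of 3, a quorum is 2 replicas, so quorum reads and quorum writes always share a node. Dropping to one replica on both sides loses that guarantee, which is the kind of trade-off the "lost video position" quote is shrugging off.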
Thursday, May 30, 2013
IBM's BLU
I'm very excited to see that IBM has finally announced BLU, an architecture-aware compressed columnar engine in DB2. My old friends at Almaden Research worked super hard on this project, and were waiting until the product was released before they could brag about the stellar performance results that BLU achieved internally. I'm looking forward to seeing the BLU papers finally getting out to the research community. Here's Curt Monash's summary of the product announcement.
Edit:
Here's a great video describing Blink from Guy @ Almaden.
Sunday, May 26, 2013
JavaScript Applications
In a previous blog post, I talked about how JavaScript is already the language in which so many mobile and web applications are being built. It is not a huge stretch to expect at least some of the back-end code to move to JavaScript. Node.js and Backbone.js are already making this easier. What I've recently been amazed by is how much of the native desktop experience can be re-created in the browser.
Check out Epic/Mozilla's port of Unreal Engine 3 to HTML5 here. I remember a time growing up when my desktop was too slow to run Unreal, and today, we can run it in the browser! Given this kind of performance, applications like photo editing or even lightweight video editing could be delivered over the web with snappy interfaces that don't require round-trips to the back-end for everything. Pushing some of the processing to a cluster of GPU-based servers in the cloud will make possible certain kinds of editing that were too demanding for an average desktop processor. It will also likely make new kinds of workflows and actions possible.
As for the less sexy back-end logic, having that be in JavaScript and running efficiently will certainly open up new possibilities. This isn't a new idea -- Netscape tried this in the mid-nineties and it didn't really take off. However, Google's V8 engine has made running JavaScript applications so much more efficient that replacing a Python/Django or Ruby on Rails stack with JavaScript seems entirely reasonable. Consider the kinds of things you could do with this: you could build a full web application in a single language (JavaScript), and optimize it differently for a desktop browser, a tablet, or a phone. As your application evolves, you might find it easier to move some functionality back and forth between the server and the client. I expect we'll see application servers provide a good environment for JavaScript apps, much like we have for Java (Tomcat, WebLogic, WebSphere, etc.).
I expect we'll see ever more interesting and sophisticated apps delivered on the desktop browser, and watch them quickly flow down to browsers on tablets, and eventually phones.
Wednesday, May 22, 2013
Interesting Papers
Here's an assortment of interesting papers from some of my colleagues in the recommendation/machine learning space that I've recently read.
- BPR: Bayesian Personalized Ranking from Implicit Feedback, Rendle et al.
- Supercharging Recommender Systems using Taxonomies for Learning User Purchase Behavior, Kanagal et al. in PVLDB 2012
- Latent factor models with additive and hierarchically-smoothed user preferences, Ahmed et al. in WSDM 2013
- Matrix Approximation under Local Low-Rank Assumption, Joonseok Lee et al. in ICLR
- Sibyl: A system for large scale supervised machine learning
Wednesday, January 23, 2013
Productivity
I've recently discovered three interesting resources that have been very helpful in improving my general productivity. I'm sharing them here.
Any.do
Any.do is a really well-designed TODO app with a gesture-based UI that works on iOS, Android, and Chrome on the desktop. I've used several TODO apps before, but this one is my favorite because even after several weeks, I'm still using it.
The UI is extremely low-friction. It has enough features to be useful without being too complicated. You can assign times/locations to TODOs, get reminders, postpone, or attach notes to items. Everything is classified as Today/Tomorrow/Upcoming/Someday. It includes free syncing across all your devices. It even integrates cleverly with Gmail. And it makes cute noises when you finish items: the same kind of reward that made Angry Birds so addictive!
The Power of Habit by Charles Duhigg
This book by Charles Duhigg is on the neuro-psychology of habits. Like most business-type books, this one is much longer than it needs to be. But it gives you a really useful way to reason about habits by describing the cue-routine-reward loop and how the brain stores habits in a different part, one responsible for "automatic" functions. Techniques here can make it a little bit easier to build new (good) habits. I've been able to build at least one good habit. I'm still trying to get rid of some bad ones I picked up in grad school (late night snacking)!
HBR Ideacast
I started listening to Harvard Business Review's Ideacast a couple of years ago. HBR produces short (15-20 min) podcasts, mini interviews with accomplished people (with a focus on business) on different topics, and makes them available for free. Some podcasts will help you discover some great books/ideas/tools, while others are moderately entertaining. I've gotten ideas on time management, improved how I communicate with managers, discovered some great books, gained a more sympathetic understanding of why large organizations act the way they do, and gotten better at dealing with meetings. Pretty good mileage from a podcast!
If you have an interesting productivity resource you'd like to share, leave a comment!