I enjoyed all three keynotes. Mike Franklin's talk focused on AMPLab's research progress and the steady stream of artifacts they've assembled into the Berkeley Data Analytics Stack (BDAS -- pronounced "bad-ass" :)). In addition to Spark, the stack now includes SparkSQL, GraphX, MLlib, Spark Streaming, and Velox. All of these projects pose interesting systems questions and have made a lot of progress on the kinds of tooling that will come in handy for large-scale data analysis beyond SQL queries.
Lada Adamic's talk was a fun visual tour through lots of interesting analyses of Facebook data that her team has been running. In particular, she's been using the data to understand what makes a particular piece of content rapidly gain a lot of popularity (or go "viral"). "Sharing" behavior is unique to social networks -- traditional web properties have spent lots of energy understanding what makes people click, but the "share" action is a completely different beast to study. Lada reported that they have had some success predicting whether a piece of content, once it reaches K users, will go on to reach 2K, and that prediction accuracy improves for larger K. Unfortunately, the features that predict virality mostly have to do with the speed at which the content is spreading, so we don't yet have a handle on how best to craft viral content.
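To make the K-to-2K prediction task concrete, here is a minimal sketch of what a speed-based doubling predictor could look like. This is purely illustrative and not Adamic's actual model: the feature definition, function names, and threshold are all hypothetical, reflecting only the reported finding that spread speed is the dominant signal.

```python
# Hypothetical sketch, NOT the model from the talk: predict whether a
# cascade that has reached k shares will double to 2k, using only how
# quickly the first k shares arrived.

def speed_feature(share_times, k):
    """Elapsed time to accumulate the first k shares (smaller = faster spread)."""
    return share_times[k - 1] - share_times[0]

def predict_doubles(share_times, k, speed_threshold):
    """Predict True (will reach 2k shares) if the first k shares came in fast."""
    return speed_feature(share_times, k) < speed_threshold

# Toy example: timestamps (in hours) of each successive share.
fast_cascade = [0.0, 0.1, 0.2, 0.3, 0.5]   # 5 shares in half an hour
slow_cascade = [0.0, 2.0, 5.0, 9.0, 14.0]  # 5 shares over 14 hours

print(predict_doubles(fast_cascade, k=5, speed_threshold=1.0))  # True
print(predict_doubles(slow_cascade, k=5, speed_threshold=1.0))  # False
```

In a real system the threshold would be replaced by a trained classifier over many such temporal features, but the point stands: the signal is about *how* content spreads, not *what* it contains.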
Thorsten Joachims gave a technical keynote that followed the arc of his previous work on learning to rank in the context of search. He talked about the careful design of interventions in interactive processes to elicit information that can be suitably leveraged by a machine learning algorithm to improve ranking. He showed lots of examples, with evidence from experiments he ran on the arXiv search prototype that have since been reproduced at Yahoo, Baidu, and elsewhere.
Here's a subset of research talks that I thought were fun and interesting:
- Delayed-Dynamic-Selective (DDS) Prediction for Reducing Extreme Tail Latency in Web Search: a nice example of exploiting the nature of search servers to go beyond the standard techniques described in The Tail at Scale, which I talked about recently. The paper was the best-paper runner-up.
- Robust Tree-based Causal Inference for Complex Ad Effectiveness Analysis: applying causal inference techniques to ad-campaign data.
- FLAME: A Probabilistic Model Combining Aspect Based Opinion Mining and Collaborative Filtering: ideas on how to better model aspects of a user rating when there's some text in addition to a numeric score -- Amazon product reviews being an obvious example.
- Just in Time Recommendations – Modeling the Dynamics of Boredom in Activity Streams: modeling repeat consumption of music and figuring out when a user wants to listen to more music of the same kind (same artist, same album) vs. when the user has gotten bored, and wants to move on to something else.
- User Modeling for a Personal Assistant: Srikant's talk on the user-modeling that goes into Google Now. Fun insights into the practical problems that were solved to build Google Now cards.
- Inverting a Steady-State: the best paper award winner from our group at Google.