Tuesday, February 10, 2015

WSDM 2015

I recently got back from WSDM which ended up being a very interesting conference with some great talks.

Keynotes

I enjoyed all three keynotes. Mike Franklin's talk focused on AMPLab's research progress and the steady stream of artifacts they've assembled into the Berkeley Data Analytics Stack (BDAS -- pronounced "bad-ass" :)). In addition to Spark, the stack now includes SparkSQL, GraphX, MLLib, Spark Streaming, and Velox. All of these projects pose interesting systems questions and have made a lot of progress on the kinds of tooling that will come in handy for large scale data analysis beyond SQL queries.

Lada Adamic's talk was a fun visual tour through lots of interesting analyses of Facebook data that her team has been running. In particular, she's been using the data to understand what makes a particular piece of content rapidly gain a lot of popularity (or go "viral"). "Sharing" behavior is unique to social networks -- traditional web properties have spent lots of energy understanding what makes people click. The "share" action is a completely different beast to study. Lada reported that they have had some success predicting if a piece of content, once it reaches K users, if it will get to 2K. The prediction accuracy improves as we get to larger K. Unfortunately, she reported that the features that predict virality mostly have to do with the speed at which it is spreading, so we don't yet have a handle on how best to craft viral content

Thorsten Joachims had a technical keynote that followed the arc of his previous work on learning to rank in the context of search. He talked about careful design of interventions in interactive processes to elicit information that can be suitably leveraged by a machine learning algorithm to improve ranking. Lots of examples with evidence from experiments he ran with the arxiv search prototype that have since been reproduced by Yahoo, Baidu, etc.

Research Talks

Here's a subset of research talks that I thought were fun and interesting: