Computer Science Seminar

2012 Mar 14 at 10:30

DC 1304

Large Graph Models and Scalable Analytics

Spiros Papadimitriou, Research Scientist, Google

Relationships between various types of objects arise naturally in many applications, such as the web, social networks, business intelligence, information retrieval, and computer security, to mention a few.

Such data can be effectively modeled as graphs, with nodes of various types and, potentially, edge annotations. Furthermore, the large volume of data motivates the need for scalable analytics that can answer key questions such as: (i) which are the most important nodes? or (ii) what are the key communities of nodes?

In this talk, we motivate and discuss various graph models and then present scalable analytics for three important classes of problems. First, we present a method that, given a bipartite graph, can jointly discover communities of nodes, as well as the number of these communities. We demonstrate that our method produces meaningful patterns that agree with human intuition. We also show that it can scale up to very large graphs, using Hadoop/MapReduce. Second, we address complex, "multi-aspect" graphs, propose models based on tensors, and develop scalable methods for joint analysis of different aspects/modes. Finally, we present measures of centrality for determining the important nodes in a graph; these measures can also scale up to very large graphs using MapReduce.



------------------------------------------------------------------------

Spiros Papadimitriou has worked extensively on scalable models and analytics in several domains, including graphs, as well as time series, spatial, and streaming data. He has published more than forty-five papers on these topics in refereed conferences and journals. He has three invited publications in best-paper journal issues and several book chapters, and has filed multiple patents. He is also interested in mobile applications and has contributed to open source projects. He was a 2005 Siebel scholarship recipient and received the best paper award at SDM 2008. He has also been invited to give keynote talks on graph and social network analysis, and tutorials on time series stream mining and large-scale mining with Hadoop. He is currently a research scientist at Google. Prior to that, he was a research staff member at IBM T.J. Watson. He received his PhD in computer science from Carnegie Mellon University.