Computer Science Seminar

2012 Mar 05 at 10:30

DC 1304

Large-scale Graph Analytics: Patterns, Anomalies, and Tools

Leman Akoglu, Dept. Comp. Sci., Carnegie Mellon University

Given large collections of data, how can we extract useful knowledge from them? How can we find regularities, anomalies, and events efficiently? With the advance of science and technology, we are witnessing an explosion in both the amount and the complexity of the data constantly being generated. Today, interpreting this huge volume of data and extracting descriptive, predictive, and commercially or medically useful information from it is a major challenge.

In this talk, I will describe how we exploit graph mining techniques for large data analytics, with applications to diverse real graphs ranging from social and information graphs like Twitter and YouTube to communication graphs. Our contributions follow three main tracks: (1) graph pattern mining and generators, focusing on identifying regularities in the formation and evolution of real graphs, and building generative models that can mimic the properties that real graphs exhibit; (2) graph anomaly detection, exploiting the discovered patterns as well as various compression techniques to spot irregularities and discontinuities in complex graphs that grow over time; and (3) scalable graph mining algorithms, focusing on developing a breadth of fast algorithms and tools that provide the means for massive graph analytics, including anomaly detection, nearest-neighbor search, and community detection, as well as visualization and sensemaking.



Bio

Leman Akoglu is a Ph.D. candidate in the Computer Science Department at Carnegie Mellon University, advised by Christos Faloutsos. She received her B.S. from Bilkent University in 2007. She has won two best paper awards and published 14 refereed articles in major data mining venues. She is a co-inventor of three U.S. patents filed by IBM T. J. Watson Research Labs. Her research focuses on large-scale data analytics, with an emphasis on anomaly and event detection in large, time-varying graphs using scalable algorithms and tools.