Database Research Group PhD Thesis Defence

2010 Oct 19 at 13:30

DC 3323

Mining Time-Changing Data Streams

Yingying Tao, PhD candidate, David R. Cheriton School of Computer Science, University of Waterloo

Streaming data have gained considerable attention in the database and data mining communities because of the emergence of a class of applications that produce them, such as financial marketing, sensor networks, Internet IP monitoring, and telecommunications. Data streams have characteristics that traditional data do not exhibit: they are unbounded, fast-arriving, and time-changing. Traditional data mining techniques that make multiple passes over the data or that ignore distribution changes are therefore not applicable to dynamic data streams, and mining data streams has become an active research area that addresses the requirements of streaming applications. This thesis focuses on developing techniques for detecting distribution changes and for mining time-changing data streams. Two techniques are proposed for detecting special types of distribution changes in generic data streams: periodic changes, and changes in mean and standard deviation. Approaches to the most popular stream mining tasks, clustering and frequent itemset mining, are also presented. All of the proposed techniques are implemented and empirically studied on both synthetic and real-world data streams. Experimental results show that the proposed techniques achieve promising performance in detecting changes and in mining dynamic data streams.