Finding Frequent Items in Sliding Windows with Multinomially-Distributed Item Frequencies, Lukasz Golab, David DeHaan, A. López-Ortiz and Erik Demaine. In Proceedings of 16th International Conference on Scientific and Statistical Database Management (SSDBM), IEEE Computer Society, pp. 425-426, 2004. [PDF file]
Abstract.
In this paper, we present an algorithm for identifying frequently occurring items within a sliding window of the last N items seen over an infinite data stream, given the following constraints.· The relative frequencies of the item types can vary over the lifetime of the stream, provided that they vary sufficiently slowly that for any sliding window of N tuples, with high probability the window could have been generated by a multinomial distribution. We refer to this as the drifting distribution model in the full version of this paper.
· The entire sliding window does not fit in the available memory (otherwise, we could simply count all the distinct item types and return those whose frequencies exceed some threshold).
· The stream may arrive at a high rate, so only a constant number of operations (amortized) is allowed for the processing of each item