Finding Frequent Items in Sliding Windows with Multinomially-Distributed Item Frequencies, Lukasz Golab, David DeHaan, A. López-Ortiz and Erik Demaine. In Proceedings of 16th International Conference on Scientific and Statistical Database Management (SSDBM), IEEE Computer Society, pp. 425-426, 2004. [PDF file]

Abstract.
In this paper, we present an algorithm for identifying frequently occurring items within a sliding window of the last N items seen over an infinite data stream, given the following constraints.

·        The relative frequencies of the item types can vary over the lifetime of the stream, provided that they vary sufficiently slowly that for any sliding window of N tuples, with high probability the window could have been generated by a multinomial distribution. We refer to this as the drifting distribution model in the full version of this paper.

·        The entire sliding window does not fit in the available memory (otherwise, we could simply count all the distinct item types and return those whose frequencies exceed some threshold).

·        The stream may arrive at a high rate, so only a constant number of operations (amortized) is allowed for the processing of each item