Abstract: This paper proposes a compression scheme that quickly reduces the raw data from multiple streams to a compact synopsis. The synopsis allows the correlation coefficients between streams to be reconstructed incrementally without accessing the raw data. A modified k-means algorithm is developed to generate clustering results and to adjust the number of clusters dynamically in real time, so that evolving changes in the data streams can be detected. Finally, the framework is extended to support clustering on demand (COD), where a user can query for clustering results over an arbitrary time horizon. A theoretically sound time-segment partitioning scheme is developed so that any requested time horizon can be covered by a combination of these time segments. Experimental results on synthetic and real data sets show that the algorithm achieves higher clustering quality, speed, and stability than other methods and can detect evolving changes in the data streams in real time.
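
As a rough illustration of how a synopsis can support incremental reconstruction of correlation coefficients over a combination of time segments, the sketch below assumes that each segment stores only a count and five running sums for a pair of streams. This is an assumption made for illustration, not the synopsis structure defined in the paper; the names `SegmentSynopsis`, `summarize`, `merge`, and `correlation` are hypothetical.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class SegmentSynopsis:
    """Hypothetical per-segment summary for one pair of streams (x, y)."""
    n: int = 0        # number of observations in the segment
    sx: float = 0.0   # sum of x
    sy: float = 0.0   # sum of y
    sxx: float = 0.0  # sum of x^2
    syy: float = 0.0  # sum of y^2
    sxy: float = 0.0  # sum of x*y

    def add(self, x: float, y: float) -> None:
        """Fold one new pair of readings into the summary."""
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.syy += y * y
        self.sxy += x * y

    def merge(self, other: "SegmentSynopsis") -> "SegmentSynopsis":
        """Combine two segments; the sums are additive, so no raw data is needed."""
        return SegmentSynopsis(
            self.n + other.n,
            self.sx + other.sx,
            self.sy + other.sy,
            self.sxx + other.sxx,
            self.syy + other.syy,
            self.sxy + other.sxy,
        )

    def correlation(self) -> float:
        """Pearson correlation computed from the stored sums alone."""
        cov = self.n * self.sxy - self.sx * self.sy
        var_x = self.n * self.sxx - self.sx ** 2
        var_y = self.n * self.syy - self.sy ** 2
        if var_x <= 0 or var_y <= 0:
            return 0.0  # assumed convention for a constant or empty stream
        return cov / sqrt(var_x * var_y)


def summarize(xs, ys) -> SegmentSynopsis:
    """Build the summary of one time segment from its raw readings."""
    seg = SegmentSynopsis()
    for x, y in zip(xs, ys):
        seg.add(x, y)
    return seg


# Two adjacent time segments, summarized independently and then combined
# to answer a query whose horizon spans both.
first = summarize([1.0, 2.0], [2.0, 4.0])
second = summarize([3.0, 4.0], [6.0, 8.0])
horizon = first.merge(second)
r = horizon.correlation()
```

Because the stored sums are additive, a query over any horizon that is a union of stored segments can be answered by merging their summaries, which mirrors the idea of covering a requested time horizon with a combination of time segments. Accumulating raw sums of squares can lose precision on long streams with large values, so a production version would likely need a numerically more stable formulation.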