Abstract: This paper characterizes local and global hotspots in text streams and elaborates on their correlation. It then applies Kolmogorov complexity to mining hotspots in multiple text streams. First, Redundant Information is defined based on Kolmogorov complexity, and it is shown that Redundant Information exceeding a threshold is a necessary condition for a local hotspot. Second, a similarity metric, termed Stream Information Distance (SID), is proposed based on conditional Kolmogorov complexity to quantify the similarity between text streams. Borrowing ideas from phylogeny in computational biology, a heuristic algorithm based on hierarchical clustering is proposed to mine global hotspots from multiple text streams. Finally, the convergence, effectiveness, and scalability of the algorithm are validated by extensive experiments on synthetic and real data sets.
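
The abstract does not state the formula for SID or the details of the clustering heuristic, so the following is only an illustrative sketch of the general idea, not the paper's method. Kolmogorov complexity is uncomputable, so the sketch assumes the common practice of substituting compressed length for it, and it uses the normalized compression distance as a stand-in for a conditional-complexity-based distance. The function names (`compressed_len`, `ncd`, `cluster`), the choice of zlib, and the average-linkage rule are all assumptions made for illustration.

```python
import zlib
from itertools import combinations


def compressed_len(data: bytes) -> int:
    """Compressed size in bytes: a computable stand-in (assumption) for K(x)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance, used here in place of SID (assumption)."""
    cx, cy, cxy = compressed_len(x), compressed_len(y), compressed_len(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def cluster(streams: dict) -> list:
    """Average-linkage agglomerative clustering over named byte streams.

    Returns the list of merges, in order, as (cluster_a, cluster_b) name pairs.
    """
    # Pairwise distances between the original streams, computed once.
    dist = {
        frozenset(pair): ncd(streams[pair[0]], streams[pair[1]])
        for pair in combinations(streams, 2)
    }
    # Each cluster name maps to the original stream names it contains.
    clusters = {name: [name] for name in streams}
    merges = []

    def linkage(a: str, b: str) -> float:
        # Mean distance over all cross-cluster member pairs.
        pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        # Merge the two closest clusters under average linkage.
        a, b = min(combinations(clusters, 2), key=lambda p: linkage(*p))
        members = clusters.pop(a) + clusters.pop(b)
        clusters[a + "+" + b] = members
        merges.append((a, b))
    return merges


if __name__ == "__main__":
    # Toy streams (hypothetical data): two on one topic, one on another.
    demo = {
        "quake_a": b"earthquake relief effort continues in the region. " * 20,
        "quake_b": b"earthquake relief effort continues in the region today. " * 20,
        "stocks": b"stock markets rallied on strong quarterly earnings. " * 20,
    }
    for step in cluster(demo):
        print(step)
```

In this sketch, streams that share a topic compress well together and therefore merge early in the hierarchy, which loosely mirrors how a phylogeny-style tree would group related streams. A real compressor only approximates Kolmogorov complexity, so the distances depend on the compressor and on stream length.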