Mining the Frequent Patterns in an Arbitrary Sliding Window over Online Data Streams
Affiliation:

  • Article
  • | |
  • Metrics
  • |
  • Reference [14]
  • |
  • Related [20]
  • |
  • Cited by [15]
  • | |
  • Comments
    Abstract:

    Because of the fluidity and continuity of data stream, the knowledge embedded in stream data is most likely to be changed as time goes by. Thus, in most data stream applications, people are more interested in the information of the recent transactions than that of the old. This paper proposes a method for mining the frequent patterns in an arbitrary sliding window of data streams. As data stream flows, the contents of the data stream are captured with a compact prefix-tree by scanning the stream only once. And the obsolete and infrequent items are deleted by periodically pruning the tree. To differentiate the patterns of recently generated transactions from those of historic transactions, a time decaying model is also applied. Extensive simulations are conducted and the experimental results show that the proposed method is efficient and scalable, and also superior to other analogous algorithms.

    Reference
    [1] Gaber MM, Zaslavsky A, Krishnaswamy S. Mining data streams: A review. ACM SIGMOD Record, 2005,34(2):18-26.
    [2] Jiang N, Gruenwald L. Research issues in data stream association rule mining. ACM SIGMOD Record, 2006,35(1):14-19.
    [3] Garofalakis MN, Gehrke J. Querying and mining data streams: You only get one look a tutorial. In: Franklin MJ, Moon B, Ailamaki A, eds. Proc. of the 2002 ACM SIGMOD Int’l Conf. on Management of Data. Madison: ACM Press, 2002. 635-635.
    [4] Giannella C, Han J, Pei J, Yan X, Yu PS. Mining frequent patterns in data streams at multiple time granularities. In: Data Mining: Next Generation Challenges and Future Directions. 2004. 191-212.
    [5] Chang JH, Lee WS. Finding recent frequent itemsets adaptively over online data streams. In: Lise G, Ted ES, Pedro D, Christos F, eds. Proc. of the 9th ACM SIGKDD Int’l Conf. on Knowledge Discovery and Data Mining. Washington: ACM Press, 2003. 487-492.
    [6] Jiang N, Gruenwald L. CFI-Stream: Mining closed frequent itemsets in data streams. In: Roberto B, Kristin PB, Gautam D, Dimitrios G, Johannes G, eds. Proc. of the 12th ACM SIGKDD Int’l Conf. on Knowledge Discovery and Data Mining. Philadelphia: ACM Press, 2006. 592-597.
    [7] Yu JX, Chong Z, Lu H, Zhang Z, Zhou A. A false negative approach to mining frequent itemsets from high speed transactional data streams. Information Sciences, 2006,176(4):1986-2015.
    [8] Leung CKS, Khan QI. DStree: A tree structure for the mining of frequent sets from data streams. In: Clifton CW, Zhong N, Liu JM, Wah BW, Wu XD, eds. Proc. of the 6th Int’l Conf. on Data Mining. Hong Kong: IEEE Press, 2006. 928-932.
    [9] Wong RCW, Fu AWC. Mining top-k frequent itemsets from data streams. Data Mining and Knowledge Discovery, 2006,13(2): 193-217.
    [10] Papadimitriou A, Yu PS. Optimal multi-scale patterns in time series streams. In: Roberto B, Kristin PB, Gautam D, Dimitrios G, Johannes G, eds. Proc. of the 2006 ACM SIGMOD Int’l Conf. of Management of Data. Chicago: ACM Press, 2006. 647-658.
    [11] Arasu A, Manku GS. Approximate counts and quantiles over sliding windows. In: Weikum G, Koenig AC, Dessloch S, eds. Proc. of the 23rd ACM SIGMOD-SIGACT-SIGART Symp. on Principles of Database Systems. Paris: ACM Press, 2004. 286-296.
    [12] Ho CC, Li HF, Kuo FF, Lee SY. Incremental mining of sequential patterns over a stream sliding window. In: Tsumto S, Clifton CW, Zhong N, Wu XD, Liu JM, Wah BW, Cheung YM, eds. Proc. of the 6th Int’l Conf. on Data Mining Workshops. Hong Kong: IEEE Press, 2006. 677-681.
    [13] Lee L, Ting H. A simpler and more efficient deterministic scheme for finding frequent items over sliding windows. In: Roberto B, Kristin PB, Gautam D, Dimitrios G, Johannes G, eds. Proc. of the 25th ACM SIGMOD-SIGACT-SIGART Symp. on Principles of Database Systems. Chicago: ACM Press, 2006. 290-297.
    [14] Han J, Pei J, Yin Y. Mining frequent patterns without candidate generation. In: Chen W, Naughton JF, Bernstein PA, eds. Proc. of the 2000 ACM SIGMOD Int’l Conf. on Management of Data. Dallas: ACM Press, 2000. 1-12.
    Comments
    Comments
    分享到微博
    Submit
Get Citation

李国徽,陈 辉.挖掘数据流任意滑动时间窗口内频繁模式.软件学报,2008,19(10):2585-2596

Copy
Share
Article Metrics
  • Abstract:4686
  • PDF: 6321
  • HTML: 0
  • Cited by: 0
History
  • Received:November 08,2007
  • Revised:January 08,2008
You are the first2032799Visitors
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4
Address:4# South Fourth Street, Zhong Guan Cun, Beijing 100190,Postal Code:100190
Phone:010-62562563 Fax:010-62562533 Email:jos@iscas.ac.cn
Technical Support:Beijing Qinyun Technology Development Co., Ltd.

Beijing Public Network Security No. 11040202500063