Mining the Frequent Patterns in an Arbitrary Sliding Window over Online Data Streams

微信服务号

微信订阅号

2025-4-6- 15

Home > Archive>Volume 19, Issue 10, 2008 >2585-2596

Mining the Frequent Patterns in an Arbitrary Sliding Window over Online Data Streams
DOI:
                        
                    
Author:
                        LI Guo-HuiLI Guo-Hui

Find this author on CNKI
Find this author on BaiDu
Search for this author on this site
CHEN HuiCHEN Hui

Find this author on CNKI
Find this author on BaiDu
Search for this author on this site

                    
Affiliation:
Clc Number:
Fund Project:

Article

Figures

Metrics

Reference [14]

Related [20]

Cited by

Materials

Comments

Abstract:

Because of the fluidity and continuity of data stream, the knowledge embedded in stream data is most likely to be changed as time goes by. Thus, in most data stream applications, people are more interested in the information of the recent transactions than that of the old. This paper proposes a method for mining the frequent patterns in an arbitrary sliding window of data streams. As data stream flows, the contents of the data stream are captured with a compact prefix-tree by scanning the stream only once. And the obsolete and infrequent items are deleted by periodically pruning the tree. To differentiate the patterns of recently generated transactions from those of historic transactions, a time decaying model is also applied. Extensive simulations are conducted and the experimental results show that the proposed method is efficient and scalable, and also superior to other analogous algorithms.

Key words:data stream; frequent pattern mining; sliding window; time decaying model

Reference

[1] Gaber MM, Zaslavsky A, Krishnaswamy S. Mining data streams: A review. ACM SIGMOD Record, 2005,34(2):18-26.

[2] Jiang N, Gruenwald L. Research issues in data stream association rule mining. ACM SIGMOD Record, 2006,35(1):14-19.

[3] Garofalakis MN, Gehrke J. Querying and mining data streams: You only get one look a tutorial. In: Franklin MJ, Moon B, Ailamaki A, eds. Proc. of the 2002 ACM SIGMOD Int’l Conf. on Management of Data. Madison: ACM Press, 2002. 635-635.

[4] Giannella C, Han J, Pei J, Yan X, Yu PS. Mining frequent patterns in data streams at multiple time granularities. In: Data Mining: Next Generation Challenges and Future Directions. 2004. 191-212.

[5] Chang JH, Lee WS. Finding recent frequent itemsets adaptively over online data streams. In: Lise G, Ted ES, Pedro D, Christos F, eds. Proc. of the 9th ACM SIGKDD Int’l Conf. on Knowledge Discovery and Data Mining. Washington: ACM Press, 2003. 487-492.

[6] Jiang N, Gruenwald L. CFI-Stream: Mining closed frequent itemsets in data streams. In: Roberto B, Kristin PB, Gautam D, Dimitrios G, Johannes G, eds. Proc. of the 12th ACM SIGKDD Int’l Conf. on Knowledge Discovery and Data Mining. Philadelphia: ACM Press, 2006. 592-597.

[7] Yu JX, Chong Z, Lu H, Zhang Z, Zhou A. A false negative approach to mining frequent itemsets from high speed transactional data streams. Information Sciences, 2006,176(4):1986-2015.

[8] Leung CKS, Khan QI. DStree: A tree structure for the mining of frequent sets from data streams. In: Clifton CW, Zhong N, Liu JM, Wah BW, Wu XD, eds. Proc. of the 6th Int’l Conf. on Data Mining. Hong Kong: IEEE Press, 2006. 928-932.

[9] Wong RCW, Fu AWC. Mining top-k frequent itemsets from data streams. Data Mining and Knowledge Discovery, 2006,13(2): 193-217.

[10] Papadimitriou A, Yu PS. Optimal multi-scale patterns in time series streams. In: Roberto B, Kristin PB, Gautam D, Dimitrios G, Johannes G, eds. Proc. of the 2006 ACM SIGMOD Int’l Conf. of Management of Data. Chicago: ACM Press, 2006. 647-658.

[11] Arasu A, Manku GS. Approximate counts and quantiles over sliding windows. In: Weikum G, Koenig AC, Dessloch S, eds. Proc. of the 23rd ACM SIGMOD-SIGACT-SIGART Symp. on Principles of Database Systems. Paris: ACM Press, 2004. 286-296.

[12] Ho CC, Li HF, Kuo FF, Lee SY. Incremental mining of sequential patterns over a stream sliding window. In: Tsumto S, Clifton CW, Zhong N, Wu XD, Liu JM, Wah BW, Cheung YM, eds. Proc. of the 6th Int’l Conf. on Data Mining Workshops. Hong Kong: IEEE Press, 2006. 677-681.

[13] Lee L, Ting H. A simpler and more efficient deterministic scheme for finding frequent items over sliding windows. In: Roberto B, Kristin PB, Gautam D, Dimitrios G, Johannes G, eds. Proc. of the 25th ACM SIGMOD-SIGACT-SIGART Symp. on Principles of Database Systems. Chicago: ACM Press, 2006. 290-297.

[14] Han J, Pei J, Yin Y. Mining frequent patterns without candidate generation. In: Chen W, Naughton JF, Bernstein PA, eds. Proc. of the 2000 ACM SIGMOD Int’l Conf. on Management of Data. Dallas: ACM Press, 2000. 1-12.

Get Citation

李国徽,陈辉.挖掘数据流任意滑动时间窗口内频繁模式.软件学报,2008,19(10):2585-2596

Copy

Article Metrics

Abstract:
PDF:
HTML:
Cited by:

History

Received:November 08,2007
Revised:January 08,2008
Adopted:
Online:
Published:

You are the first2033357Visitors
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4
Address：4# South Fourth Street, Zhong Guan Cun, Beijing 100190,Postal Code：100190
Phone：010-62562563 Fax：010-62562533 Email：jos@iscas.ac.cn
Technical Support：Beijing Qinyun Technology Development Co., Ltd.

Beijing Public Network Security No. 11040202500063

微信服务号

微信订阅号

Get Citation

Share

微信扫一扫：分享

Article Metrics

History