挖掘数据流任意滑动时间窗口内频繁模式

微信服务号

微信订阅号

2025年4月13日 7:28 星期日

首页 > 过刊浏览>2008年第19卷第10期 >2585-2596

挖掘数据流任意滑动时间窗口内频繁模式
DOI:
                        
                    
CSTR:
                        
                    
作者:
                        李国徽李国徽
华中科技大学 计算机科学与技术学院,湖北 武汉 430074
在期刊界中查找
在百度中查找
在本站中查找
陈 辉陈 辉
华中科技大学 计算机科学与技术学院,湖北 武汉 430074
在期刊界中查找
在百度中查找
在本站中查找

                    
作者单位:
作者简介:
通讯作者:
中图分类号:
基金项目:Supported by the National Natural Science Foundation of China under Grant No.60873030 (国家自然科学基金); the National High-Tech Research and Development Plan of China under Grant No.2007AA01Z300 (国家高技术研究发展计划(863))

Mining the Frequent Patterns in an Arbitrary Sliding Window over Online Data Streams

Author:

LI Guo-Hui
LI Guo-Hui

在期刊界中查找
在百度中查找
在本站中查找
CHEN Hui
CHEN Hui

在期刊界中查找
在百度中查找
在本站中查找

Affiliation:

Fund Project:

摘要

图/表

访问统计

参考文献 [14]

相似文献 [20]

引证文献

资源附件

文章评论

摘要:

由于数据流的流动性与连续性,数据流所蕴含的知识会随着时间的推移而发生变化.因此,在绝大多数数据流的应用中,用户往往对新产生的流数据所包含的知识要比对历史流数据所包含的知识感兴趣得多.提出了一种挖掘数据流任意大小滑动时间窗口内频繁模式的方法MSW(mining sliding window).当数据流流过时,该方法使用滑动窗口树SW-tree在单遍扫描流数据的条件下及时捕获数据流上最新的模式信息.同时,该方法还周期性地删除滑动窗口树上过期的及不频繁的模式分支,从而降低滑动窗口树的空间复杂度与维护代价.此外,该方法还应用时间衰减模型逐步降低历史事务模式支持数的权重,并由此来区分最近产生事务与历史事务的模式.大量仿真实验的结果表明,算法MSW具有较高的效率与优良的可扩展性,同时也优于其他同类算法.

关键词:数据流;频繁模式挖掘;滑动时间窗口;时间衰减模型

Abstract:

Because of the fluidity and continuity of data stream, the knowledge embedded in stream data is most likely to be changed as time goes by. Thus, in most data stream applications, people are more interested in the information of the recent transactions than that of the old. This paper proposes a method for mining the frequent patterns in an arbitrary sliding window of data streams. As data stream flows, the contents of the data stream are captured with a compact prefix-tree by scanning the stream only once. And the obsolete and infrequent items are deleted by periodically pruning the tree. To differentiate the patterns of recently generated transactions from those of historic transactions, a time decaying model is also applied. Extensive simulations are conducted and the experimental results show that the proposed method is efficient and scalable, and also superior to other analogous algorithms.

Key words:data stream; frequent pattern mining; sliding window; time decaying model

参考文献

[1] Gaber MM, Zaslavsky A, Krishnaswamy S. Mining data streams: A review. ACM SIGMOD Record, 2005,34(2):18-26.

[2] Jiang N, Gruenwald L. Research issues in data stream association rule mining. ACM SIGMOD Record, 2006,35(1):14-19.

[3] Garofalakis MN, Gehrke J. Querying and mining data streams: You only get one look a tutorial. In: Franklin MJ, Moon B, Ailamaki A, eds. Proc. of the 2002 ACM SIGMOD Int’l Conf. on Management of Data. Madison: ACM Press, 2002. 635-635.

[4] Giannella C, Han J, Pei J, Yan X, Yu PS. Mining frequent patterns in data streams at multiple time granularities. In: Data Mining: Next Generation Challenges and Future Directions. 2004. 191-212.

[5] Chang JH, Lee WS. Finding recent frequent itemsets adaptively over online data streams. In: Lise G, Ted ES, Pedro D, Christos F, eds. Proc. of the 9th ACM SIGKDD Int’l Conf. on Knowledge Discovery and Data Mining. Washington: ACM Press, 2003. 487-492.

[6] Jiang N, Gruenwald L. CFI-Stream: Mining closed frequent itemsets in data streams. In: Roberto B, Kristin PB, Gautam D, Dimitrios G, Johannes G, eds. Proc. of the 12th ACM SIGKDD Int’l Conf. on Knowledge Discovery and Data Mining. Philadelphia: ACM Press, 2006. 592-597.

[7] Yu JX, Chong Z, Lu H, Zhang Z, Zhou A. A false negative approach to mining frequent itemsets from high speed transactional data streams. Information Sciences, 2006,176(4):1986-2015.

[8] Leung CKS, Khan QI. DStree: A tree structure for the mining of frequent sets from data streams. In: Clifton CW, Zhong N, Liu JM, Wah BW, Wu XD, eds. Proc. of the 6th Int’l Conf. on Data Mining. Hong Kong: IEEE Press, 2006. 928-932.

[9] Wong RCW, Fu AWC. Mining top-k frequent itemsets from data streams. Data Mining and Knowledge Discovery, 2006,13(2): 193-217.

[10] Papadimitriou A, Yu PS. Optimal multi-scale patterns in time series streams. In: Roberto B, Kristin PB, Gautam D, Dimitrios G, Johannes G, eds. Proc. of the 2006 ACM SIGMOD Int’l Conf. of Management of Data. Chicago: ACM Press, 2006. 647-658.

[11] Arasu A, Manku GS. Approximate counts and quantiles over sliding windows. In: Weikum G, Koenig AC, Dessloch S, eds. Proc. of the 23rd ACM SIGMOD-SIGACT-SIGART Symp. on Principles of Database Systems. Paris: ACM Press, 2004. 286-296.

[12] Ho CC, Li HF, Kuo FF, Lee SY. Incremental mining of sequential patterns over a stream sliding window. In: Tsumto S, Clifton CW, Zhong N, Wu XD, Liu JM, Wah BW, Cheung YM, eds. Proc. of the 6th Int’l Conf. on Data Mining Workshops. Hong Kong: IEEE Press, 2006. 677-681.

[13] Lee L, Ting H. A simpler and more efficient deterministic scheme for finding frequent items over sliding windows. In: Roberto B, Kristin PB, Gautam D, Dimitrios G, Johannes G, eds. Proc. of the 25th ACM SIGMOD-SIGACT-SIGART Symp. on Principles of Database Systems. Chicago: ACM Press, 2006. 290-297.

[14] Han J, Pei J, Yin Y. Mining frequent patterns without candidate generation. In: Chen W, Naughton JF, Bernstein PA, eds. Proc. of the 2000 ACM SIGMOD Int’l Conf. on Management of Data. Dallas: ACM Press, 2000. 1-12.

引用本文

李国徽,陈辉.挖掘数据流任意滑动时间窗口内频繁模式.软件学报,2008,19(10):2585-2596

复制

文章指标

点击次数:4693
下载次数: 6336
HTML阅读次数: 0
引用次数: 0

历史

收稿日期:2007-11-08
最后修改日期:2008-01-08
录用日期:
在线发布日期:
出版日期:

微信服务号

微信订阅号

引用本文

分享

文章指标

历史

文章二维码

微信服务号

微信订阅号

引用本文

分享

微信扫一扫：分享

文章指标

历史

文章二维码