An Efficient Algorithm for Mining Approximate Frequent Item over Data Streams

微信服务号

微信订阅号

2025-4-6- 17

Home > Archive>Volume 18, Issue 4, 2007 >884-892

An Efficient Algorithm for Mining Approximate Frequent Item over Data Streams
DOI:
                        
                    
Author:
                        WANG Wei-PingWANG Wei-Ping

Find this author on CNKI
Find this author on BaiDu
Search for this author on this site
LI Jian-ZhongLI Jian-Zhong

Find this author on CNKI
Find this author on BaiDu
Search for this author on this site
ZHANG Dong-DongZHANG Dong-Dong

Find this author on CNKI
Find this author on BaiDu
Search for this author on this site
GUO Long-JiangGUO Long-Jiang

Find this author on CNKI
Find this author on BaiDu
Search for this author on this site

                    
Affiliation:
Clc Number:
Fund Project:

Article

Figures

Metrics

Reference [13]

Related [20]

Cited by [17]

Materials

Comments

Abstract:

A frequent item of a data stream is a data point whose occurrence frequency is above a given threshold. Finding frequent item of data stream has wide applications in various fields, such as network traffic monitor, data stream OLAP and data stream mining, etc. In data stream model, the algorithm can only scan the data in one pass and the available memory space is very limited relative to the volume of a data stream, therefore it is usually unable to find all the accurate frequent items of a data stream. This paper proposes a novel algorithm to find ε-approximate frequent items of a data stream, its space complexity is O(ε^-1) and the processing time for each item is O(1) in average. Moreover, the frequency error bound of the results returned by the proposed algorithm is ε(1-s+ε)N. Among all the existed approaches, this method is the best.

Key words:data stream;data mining;frequent item;ε-approximate

Reference

[1]Babcock AK,Babu S,Datar M.Model and issues in data stream systems.In:Popa L,ed.Proc.of the 21st ACM SIGACT-SIGMOD-SIGART Symp.on Principles of Database Systems.Madison:ACM,2002.1-16.

[2]Fang M,Shivakumar N,Garcia-Molina H,Motwani R,Ullman J.Computing iceberg queries eefficiently.In:Gupta A,Shmueli O,Widom J,eds.Proc.of the 24th Int'l Conf.on Very Large Data Bases.New York:Morgan Kaufmann Publishers,1998.299-310.

[3]Agrawal R,Srikant R.Fast algorithms for mining association rules.In:Bocca JB,Jarke M,Zaniolo C,eds.Proc.of the 20th Int'l Conf.on Very Large Data Bases.Santiago:Morgan Kaufmann Publishers,1994.487-499.

[4]Estan C,Verghese G.New directions in traffic measurement and accounting:Focusing on the elephants,ignoring the mice.ACM Trans.on Computer Systems,2003,21(3):270-313.

[5]Charikar M,Chen K,Farach-Colton M.Finding frequent items in data streams.In:Widmayer P,Ruiz FT,Bueno RM,Hennessy M,Eidenbenz S,Conejo R,eds.Proc.of the Int'l Colloquium on Automata,Languages and Programming.Malaga:Springer-Verlag,2002.693-703.

[6]Cormode G,Muthukrishnan S.What's hot and what's not:Tracking most frequent items dynamically.In:Halevy AY,Ives ZG,Doan AH,eds.Proc.of the 22nd ACM SIGACT-SIGMOD-SIGART Symp.on Principles of Database Systems.San Diego:ACM Press,2003.296-306.

[7]Jin C,Qian W,Sha C,Yu JX,Zhou A.Dynamically maintaining frequent items over a data stream.In:Carbonell J,ed.Proc.of the 2003 ACM CIKM Int'l Conf.on Information and Knowledge Management.New Orleans:ACM Press,2003.287-294.

[8]Manku GS,Motwani R.Approximate frequency counts over data streams.In:Bernstein P,Ioannidis Y,Ramakrishnan R,eds.Proc.of the 28th Int'l Conf.on Very Large Data Bases.Hong Kong:Morgan Kaufmann Publishers,2002.346-357.

[9]Karp R,Papadimitriou C,Shenker S.A simple algorithm for finding frequent elements in sets and bags.Trans.on Database Systems,2003,28(1):51-55.

[10]Demaine E,López-Ortiz A,Munro JI.Frequency estimation of Internet packet streams with limited space.In:M(o)hring RH,Raman R,eds.Algorithms.ESA 2002,Proc.of the 10th Annual European Symp.Rome:Springer-Verlag,2002.348-360.

[11]Graham C,Flip K,Muthukrishnan S,Divesh S.Finding hierarchical heavy hitters over data sreams.In:Freytag JC,Lockemann PC,Abiteboul S,Carey MJ,Selinger PG,Heuer A,eds.Proc.of the 29th Int'l Conf.on Very Large Data Bases.Berlin:Morgan Kaufmann Publishers,2003.464-475.

[12]Metwally A,Agrawal D,Abbadi AE.Efficient computation of frequent and top-k elements in data streams.In:Eiter T,Libkin L,eds.Proc.of the Int'l Conf.on Data Theory.Edinburgh:Springer-Verlag,2005.398-412.

[13]http://ita.ee.lbl.gov/html/contrib/WorldCup.html.2004.

Get Citation

王伟平,李建中,张冬冬,郭龙江.一种有效的挖掘数据流近似频繁项算法.软件学报,2007,18(4):884-892

Copy

Article Metrics

Abstract:
PDF:
HTML:
Cited by:

History

Received:April 14,2006
Revised:May 16,2006
Adopted:
Online:
Published:

You are the first2033371Visitors
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4
Address：4# South Fourth Street, Zhong Guan Cun, Beijing 100190,Postal Code：100190
Phone：010-62562563 Fax：010-62562533 Email：jos@iscas.ac.cn
Technical Support：Beijing Qinyun Technology Development Co., Ltd.

Beijing Public Network Security No. 11040202500063

微信服务号

微信订阅号

Get Citation

Share

微信扫一扫：分享

Article Metrics

History