• Article
  • | |
  • Metrics
  • |
  • Reference [13]
  • |
  • Related [20]
  • |
  • Cited by [17]
  • | |
  • Comments
    Abstract:

    A frequent item of a data stream is a data point whose occurrence frequency is above a given threshold. Finding frequent item of data stream has wide applications in various fields, such as network traffic monitor, data stream OLAP and data stream mining, etc. In data stream model, the algorithm can only scan the data in one pass and the available memory space is very limited relative to the volume of a data stream, therefore it is usually unable to find all the accurate frequent items of a data stream. This paper proposes a novel algorithm to find ε-approximate frequent items of a data stream, its space complexity is O(ε-1) and the processing time for each item is O(1) in average. Moreover, the frequency error bound of the results returned by the proposed algorithm is ε(1-s+ε)N. Among all the existed approaches, this method is the best.

    Reference
    [1]Babcock AK,Babu S,Datar M.Model and issues in data stream systems.In:Popa L,ed.Proc.of the 21st ACM SIGACT-SIGMOD-SIGART Symp.on Principles of Database Systems.Madison:ACM,2002.1-16.
    [2]Fang M,Shivakumar N,Garcia-Molina H,Motwani R,Ullman J.Computing iceberg queries eefficiently.In:Gupta A,Shmueli O,Widom J,eds.Proc.of the 24th Int'l Conf.on Very Large Data Bases.New York:Morgan Kaufmann Publishers,1998.299-310.
    [3]Agrawal R,Srikant R.Fast algorithms for mining association rules.In:Bocca JB,Jarke M,Zaniolo C,eds.Proc.of the 20th Int'l Conf.on Very Large Data Bases.Santiago:Morgan Kaufmann Publishers,1994.487-499.
    [4]Estan C,Verghese G.New directions in traffic measurement and accounting:Focusing on the elephants,ignoring the mice.ACM Trans.on Computer Systems,2003,21(3):270-313.
    [5]Charikar M,Chen K,Farach-Colton M.Finding frequent items in data streams.In:Widmayer P,Ruiz FT,Bueno RM,Hennessy M,Eidenbenz S,Conejo R,eds.Proc.of the Int'l Colloquium on Automata,Languages and Programming.Malaga:Springer-Verlag,2002.693-703.
    [6]Cormode G,Muthukrishnan S.What's hot and what's not:Tracking most frequent items dynamically.In:Halevy AY,Ives ZG,Doan AH,eds.Proc.of the 22nd ACM SIGACT-SIGMOD-SIGART Symp.on Principles of Database Systems.San Diego:ACM Press,2003.296-306.
    [7]Jin C,Qian W,Sha C,Yu JX,Zhou A.Dynamically maintaining frequent items over a data stream.In:Carbonell J,ed.Proc.of the 2003 ACM CIKM Int'l Conf.on Information and Knowledge Management.New Orleans:ACM Press,2003.287-294.
    [8]Manku GS,Motwani R.Approximate frequency counts over data streams.In:Bernstein P,Ioannidis Y,Ramakrishnan R,eds.Proc.of the 28th Int'l Conf.on Very Large Data Bases.Hong Kong:Morgan Kaufmann Publishers,2002.346-357.
    [9]Karp R,Papadimitriou C,Shenker S.A simple algorithm for finding frequent elements in sets and bags.Trans.on Database Systems,2003,28(1):51-55.
    [10]Demaine E,López-Ortiz A,Munro JI.Frequency estimation of Internet packet streams with limited space.In:M(o)hring RH,Raman R,eds.Algorithms.ESA 2002,Proc.of the 10th Annual European Symp.Rome:Springer-Verlag,2002.348-360.
    [11]Graham C,Flip K,Muthukrishnan S,Divesh S.Finding hierarchical heavy hitters over data sreams.In:Freytag JC,Lockemann PC,Abiteboul S,Carey MJ,Selinger PG,Heuer A,eds.Proc.of the 29th Int'l Conf.on Very Large Data Bases.Berlin:Morgan Kaufmann Publishers,2003.464-475.
    [12]Metwally A,Agrawal D,Abbadi AE.Efficient computation of frequent and top-k elements in data streams.In:Eiter T,Libkin L,eds.Proc.of the Int'l Conf.on Data Theory.Edinburgh:Springer-Verlag,2005.398-412.
    [13]http://ita.ee.lbl.gov/html/contrib/WorldCup.html.2004.
    Comments
    Comments
    分享到微博
    Submit
Get Citation

王伟平,李建中,张冬冬,郭龙江.一种有效的挖掘数据流近似频繁项算法.软件学报,2007,18(4):884-892

Copy
Share
Article Metrics
  • Abstract:5191
  • PDF: 7112
  • HTML: 0
  • Cited by: 0
History
  • Received:April 14,2006
  • Revised:May 16,2006
You are the first2032699Visitors
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4
Address:4# South Fourth Street, Zhong Guan Cun, Beijing 100190,Postal Code:100190
Phone:010-62562563 Fax:010-62562533 Email:jos@iscas.ac.cn
Technical Support:Beijing Qinyun Technology Development Co., Ltd.

Beijing Public Network Security No. 11040202500063