• Article
  • | |
  • Metrics
  • |
  • Reference [20]
  • |
  • Related [20]
  • |
  • Cited by [1]
  • | |
  • Comments
    Abstract:

    In many real-life applications, such as stock markets, network monitoring, and sensor networks, data are modeled as dynamic evolving time series which is continuous and unbounded in nature, and many such data streams concur usually. Clustering is useful in analyzing such paralleled data streams. This paper is interested in grouping these evolving data streams. For this purpose, a synopsis is maintained dynamically for each data stream. The construction of the synopsis is based on Discrete Wavelet Transform and utilizes the amnesic feature of data stream. By using the synopsis, a fast computation of approximate distances between streams and the cluster center can be implemented, and an efficient online version of the classical K-means clustering algorithm is developed. Experiments have proved the effectiveness of the proposed method.

    Reference
    [1] Keogh E, Kasetty S. On the need for time series data mining benchmarks: A survey and empirical demonstration. Data Mining and Knowledge Discovery, 2003,7(4):349-371. [doi: 10.1023/A:1024988512476]
    [2] Guha S, Meyerson A, Mishra N, Motwani R, O’Callaghan L. Clustering data streams: Theory and practice. IEEE Trans. on Knowledge and Data Engineering, 2003,15(3):515-528. [doi: 10.1109/TKDE.2003.1198387]
    [3] Aggarwal CC, Han J, Wang J, Yu PS. A framework for clustering evolving data streams. In: Johann CF, Peter CL, Serge A, Michael JC, Patricia GS, Andreas H, eds. Proc. of the 29th Int’l Conf on Very Large Data Base. San Francisco: Morgan Kaufmann Publishers, 2003. 81-92.
    [4] Charikar M, O’Callaghan L, Panigrahy R. Better streaming algorithms for clustering problems. In: Proc. of 35th ACM Symp. on Theory of Computing. New York: ACM Press, 2003. 30-39. http://doi.acm.org/10.1145/780542.780548
    [5] Beringer J, Hullermeier E. Online clustering of parallel data streams. Data & Knowledge Engineering, 2006,58(2):180-204. [doi: 10.1016/j.datak.2005.05.009]
    [6] Matias Y, Vitter JS, Wang M. Wavelet-Based histograms for selectivity estimation. In: Tiwary A, Franklin M, eds. Proc. of the 1998 ACM SIGMOD Int’l Conf. on Management of Data. New York: ACM Press, 1998. 448-459.
    [7] Boggess A, Narcowich FJ, Wrote; Rui GS, et al., Trans. A First Course in Wavelets with Fourier Analysis. Beijing: Publishing House of Electronics Industry, 2004 (in Chinese).
    [8] Gilbert AC, Kotidis Y, Muthukrishnan S, Strauss M. One-Pass wavelet decompositions of data streams. IEEE Trans. on Knowledge and Data Engineering, 2003,15(3):541-554. [doi: 10.1109/TKDE.2003.1198389]
    [9] Guha S, Kim C, Shim K. XWAVE: Approximate extended wavelets for streaming data. In: Nascimento MA, ?zsu MT, Kossmann D, Miller RJ, Blakeley JA, Schiefer KB, eds. Proc. of the 30th Int’l Conf. on Very Large Data Bases. Toronto: Morgan Kaufmann Publishers, 2004. 288-299.
    [10] Guha S, Harb B. Wavelet synopsis for data streams: Minimizing non-euclidean error. In: Grossman RL, Bayardo R, Bennett K, Vaidya J, eds. Proc. of the 11th ACM SIGKDD Int’l Conf. on Knowledge Discovery in Data Mining. New York: ACM Press, 2005. 88-97.
    [11] Karras P, Mamoulis N. One-Pass wavelet synopses for maximum-error metrics. In: B?hm K, Jensen CS, eds. Proc. of the 31st Int’l Conf. on Very Large Data Bases. Trondheim: VLDB Endowment, 2005. 421-432.
    [12] Palpanas T, Vlachos M, Keogh E, Gunopulos D, Truppel W. Online amnesic approximation of streaming time series. In: Proc. of the 20th Int’l Conf. on Data Engineering. Los Alamitos: IEEE Computer Society, 2004. 339-349. http://doi.ieeecomputersociety. org/10.1109/ICDE.2004.1320009
    [13] Zhao YC, Zhang SC. Generalized dimension-reduction framework for recent-biased time series analysis. IEEE Trans. on Knowledge and Data Engineering, 2006,18(2):231-244. [doi: 10.1109/TKDE.2006.30]
    [14] Bulut A, Singh A K. SWAT: Hierarchical stream summarization in large networks. In: Dayal U, Ramamritham K, Vijayaraman TM, eds. Proc. of the 19th Int’l Conf. on Data Engineering. Los Alamitos: IEEE Computer Society, 2003. 303-314.
    [15] Potamias M, Patroumpas K, Sellis T. Amnesic online synopses for moving objects. In: Proc. of the 15th ACM Int’l Conf. on Information and Knowledge Management. New York: ACM Press, 2006. 784-785. http://doi.acm.org/10.1145/1183614.1183729
    [16] Aggarwal CC, Han J, Wang J, Yu PS. On demand classification of data streams. In: Kohavi R, Gehrke J, DuMouchel W, Ghosh J, eds. Proc. of the 10th ACM SIGKDD Int’l Conf. on Knowledge Discovery in Data Mining. New York: ACM Press, 2004. 503-508.
    [17] Gavrilov M, Anguelov D, Indyk P, Motwani R. Mining the stock market: Which measure is best? In: Proc. of the 6th ACM SIGKDD Int’l Conf. on Knowledge Discovery in Data Mining. New York: ACM Press, 2000. 487-496. http://doi.acm.org/10.1145/ 347090.347189
    [18] Keogh E, Xi X, Wei L, Ratanamahatana CA. The UCR time series classification/Clustering homepage. 2007. http://www.cs.ucr. edu/~eamonn/time_series_data/
    [19] SFR-USD tickwise stock data set. 2001. http://www-psych.stanford.edu/~andreas/Time-Series/Data/
    附中文参考文献: [7] Boggess A,Narcowich FJ,著;芮国胜,等,译.小波与傅里叶分析基础.北京:电子工业出版社,2004.
    Comments
    Comments
    分享到微博
    Submit
Get Citation

陈华辉,施伯乐,钱江波,陈叶芳.基于小波概要的并行数据流聚类.软件学报,2010,21(4):644-658

Copy
Share
Article Metrics
  • Abstract:5193
  • PDF: 6637
  • HTML: 0
  • Cited by: 0
History
  • Received:May 15,2008
  • Revised:November 28,2008
You are the first2044074Visitors
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4
Address:4# South Fourth Street, Zhong Guan Cun, Beijing 100190,Postal Code:100190
Phone:010-62562563 Fax:010-62562533 Email:jos@iscas.ac.cn
Technical Support:Beijing Qinyun Technology Development Co., Ltd.

Beijing Public Network Security No. 11040202500063