Wavelet Synopsis Based Clustering of Parallel Data Streams

微信服务号

微信订阅号

2025-5-13- 3

Home > Archive>Volume 21, Issue 4, 2010 >644-658

Wavelet Synopsis Based Clustering of Parallel Data Streams
DOI:
                        
                    
Author:
                        CHEN Hua-HuiCHEN Hua-Hui

Find this author on CNKI
Find this author on BaiDu
Search for this author on this site
SHI Bai-LeSHI Bai-Le

Find this author on CNKI
Find this author on BaiDu
Search for this author on this site
QIAN Jiang-BoQIAN Jiang-Bo

Find this author on CNKI
Find this author on BaiDu
Search for this author on this site
CHEN Ye-FangCHEN Ye-Fang

Find this author on CNKI
Find this author on BaiDu
Search for this author on this site

                    
Affiliation:
Clc Number:
Fund Project:

Article

Figures

Metrics

Reference [20]

Related [20]

Cited by [1]

Materials

Comments

Abstract:

In many real-life applications, such as stock markets, network monitoring, and sensor networks, data are modeled as dynamic evolving time series which is continuous and unbounded in nature, and many such data streams concur usually. Clustering is useful in analyzing such paralleled data streams. This paper is interested in grouping these evolving data streams. For this purpose, a synopsis is maintained dynamically for each data stream. The construction of the synopsis is based on Discrete Wavelet Transform and utilizes the amnesic feature of data stream. By using the synopsis, a fast computation of approximate distances between streams and the cluster center can be implemented, and an efficient online version of the classical K-means clustering algorithm is developed. Experiments have proved the effectiveness of the proposed method.

Key words:clustering; synopsis; amnesic feature; discrete wavelet transform; data stream

Reference

[1] Keogh E, Kasetty S. On the need for time series data mining benchmarks: A survey and empirical demonstration. Data Mining and Knowledge Discovery, 2003,7(4):349-371. [doi: 10.1023/A:1024988512476]

[2] Guha S, Meyerson A, Mishra N, Motwani R, O’Callaghan L. Clustering data streams: Theory and practice. IEEE Trans. on Knowledge and Data Engineering, 2003,15(3):515-528. [doi: 10.1109/TKDE.2003.1198387]

[3] Aggarwal CC, Han J, Wang J, Yu PS. A framework for clustering evolving data streams. In: Johann CF, Peter CL, Serge A, Michael JC, Patricia GS, Andreas H, eds. Proc. of the 29th Int’l Conf on Very Large Data Base. San Francisco: Morgan Kaufmann Publishers, 2003. 81-92.

[4] Charikar M, O’Callaghan L, Panigrahy R. Better streaming algorithms for clustering problems. In: Proc. of 35th ACM Symp. on Theory of Computing. New York: ACM Press, 2003. 30-39. http://doi.acm.org/10.1145/780542.780548

[5] Beringer J, Hullermeier E. Online clustering of parallel data streams. Data & Knowledge Engineering, 2006,58(2):180-204. [doi: 10.1016/j.datak.2005.05.009]

[6] Matias Y, Vitter JS, Wang M. Wavelet-Based histograms for selectivity estimation. In: Tiwary A, Franklin M, eds. Proc. of the 1998 ACM SIGMOD Int’l Conf. on Management of Data. New York: ACM Press, 1998. 448-459.

[7] Boggess A, Narcowich FJ, Wrote; Rui GS, et al., Trans. A First Course in Wavelets with Fourier Analysis. Beijing: Publishing House of Electronics Industry, 2004 (in Chinese).

[8] Gilbert AC, Kotidis Y, Muthukrishnan S, Strauss M. One-Pass wavelet decompositions of data streams. IEEE Trans. on Knowledge and Data Engineering, 2003,15(3):541-554. [doi: 10.1109/TKDE.2003.1198389]

[9] Guha S, Kim C, Shim K. XWAVE: Approximate extended wavelets for streaming data. In: Nascimento MA, ?zsu MT, Kossmann D, Miller RJ, Blakeley JA, Schiefer KB, eds. Proc. of the 30th Int’l Conf. on Very Large Data Bases. Toronto: Morgan Kaufmann Publishers, 2004. 288-299.

[10] Guha S, Harb B. Wavelet synopsis for data streams: Minimizing non-euclidean error. In: Grossman RL, Bayardo R, Bennett K, Vaidya J, eds. Proc. of the 11th ACM SIGKDD Int’l Conf. on Knowledge Discovery in Data Mining. New York: ACM Press, 2005. 88-97.

[11] Karras P, Mamoulis N. One-Pass wavelet synopses for maximum-error metrics. In: B?hm K, Jensen CS, eds. Proc. of the 31st Int’l Conf. on Very Large Data Bases. Trondheim: VLDB Endowment, 2005. 421-432.

[12] Palpanas T, Vlachos M, Keogh E, Gunopulos D, Truppel W. Online amnesic approximation of streaming time series. In: Proc. of the 20th Int’l Conf. on Data Engineering. Los Alamitos: IEEE Computer Society, 2004. 339-349. http://doi.ieeecomputersociety. org/10.1109/ICDE.2004.1320009

[13] Zhao YC, Zhang SC. Generalized dimension-reduction framework for recent-biased time series analysis. IEEE Trans. on Knowledge and Data Engineering, 2006,18(2):231-244. [doi: 10.1109/TKDE.2006.30]

[14] Bulut A, Singh A K. SWAT: Hierarchical stream summarization in large networks. In: Dayal U, Ramamritham K, Vijayaraman TM, eds. Proc. of the 19th Int’l Conf. on Data Engineering. Los Alamitos: IEEE Computer Society, 2003. 303-314.

[15] Potamias M, Patroumpas K, Sellis T. Amnesic online synopses for moving objects. In: Proc. of the 15th ACM Int’l Conf. on Information and Knowledge Management. New York: ACM Press, 2006. 784-785. http://doi.acm.org/10.1145/1183614.1183729

[16] Aggarwal CC, Han J, Wang J, Yu PS. On demand classification of data streams. In: Kohavi R, Gehrke J, DuMouchel W, Ghosh J, eds. Proc. of the 10th ACM SIGKDD Int’l Conf. on Knowledge Discovery in Data Mining. New York: ACM Press, 2004. 503-508.

[17] Gavrilov M, Anguelov D, Indyk P, Motwani R. Mining the stock market: Which measure is best? In: Proc. of the 6th ACM SIGKDD Int’l Conf. on Knowledge Discovery in Data Mining. New York: ACM Press, 2000. 487-496. http://doi.acm.org/10.1145/ 347090.347189

[18] Keogh E, Xi X, Wei L, Ratanamahatana CA. The UCR time series classification/Clustering homepage. 2007. http://www.cs.ucr. edu/～eamonn/time_series_data/

[19] SFR-USD tickwise stock data set. 2001. http://www-psych.stanford.edu/～andreas/Time-Series/Data/

附中文参考文献: [7] Boggess A,Narcowich FJ,著;芮国胜,等,译.小波与傅里叶分析基础.北京:电子工业出版社,2004.

Get Citation

陈华辉,施伯乐,钱江波,陈叶芳.基于小波概要的并行数据流聚类.软件学报,2010,21(4):644-658

Copy

Article Metrics

Abstract:5193
PDF: 6637
HTML: 0
Cited by: 0

History

Received:May 15,2008
Revised:November 28,2008
Adopted:
Online:
Published:

You are the first2044074Visitors
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4
Address：4# South Fourth Street, Zhong Guan Cun, Beijing 100190,Postal Code：100190
Phone：010-62562563 Fax：010-62562533 Email：jos@iscas.ac.cn
Technical Support：Beijing Qinyun Technology Development Co., Ltd.

Beijing Public Network Security No. 11040202500063

微信服务号

微信订阅号

Get Citation

Share

微信扫一扫：分享

Article Metrics

History