Data Stream Clustering Based on Grid Coupling
Author:
Affiliation:

Fund Project:

National Natural Science Foundation of China (61762090, 61262069, 61472346, 61662086); Natural Science Foundation of Yunnan Province (2016FA026, 2015FB114); Project of Innovative Research Team of Yunnan Province; Program for Innovation Research Team (in Science and Technology) in University of Yunnan Province (IRTSTYN)

    Abstract:

    As more and more applications generate data streams, data stream clustering has received extensive research attention. Grid-based clustering maps a data stream onto a grid structure to form data summaries and then clusters those summaries. This approach is usually efficient, but it processes each grid independently and ignores the interactions between grids, which limits clustering quality. This study takes the coupling relationships between grids into account during clustering instead of treating each grid in isolation, and proposes a data stream clustering algorithm based on grid coupling. Because grid coupling captures the correlation among the data more accurately, the proposed approach improves cluster quality. Experimental evaluations on synthetic and real data streams show that the proposed approach outperforms state-of-the-art approaches.
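
The grid-based pipeline described in the abstract (map points to grid cells, summarize each cell, then cluster the summaries) can be sketched as follows. This is only an illustrative baseline in which every cell is summarized independently; it is not the coupling-based algorithm proposed in the paper, whose coupling measure the abstract does not define. The function names `cell_of`, `summarize`, and `cluster_dense_cells`, and the parameters `width` and `min_density`, are hypothetical and chosen for this example.

```python
from collections import defaultdict, deque
from itertools import product


def cell_of(point, width):
    """Map a point to the integer coordinates of its grid cell."""
    return tuple(int(x // width) for x in point)


def summarize(stream, width):
    """Build a grid summary: the number of points seen in each non-empty cell."""
    density = defaultdict(float)
    for point in stream:
        density[cell_of(point, width)] += 1.0
    return dict(density)


def cluster_dense_cells(density, min_density):
    """Label dense cells so that touching dense cells share a cluster label.

    Assumption for this sketch: two cells "touch" when their coordinates
    differ by at most 1 in every dimension. Sparse cells get no label.
    """
    dense = {cell for cell, d in density.items() if d >= min_density}
    labels = {}
    next_label = 0
    for start in sorted(dense):
        if start in labels:
            continue
        # Breadth-first search over adjacent dense cells.
        labels[start] = next_label
        queue = deque([start])
        while queue:
            cell = queue.popleft()
            for offset in product((-1, 0, 1), repeat=len(cell)):
                neighbor = tuple(c + o for c, o in zip(cell, offset))
                if neighbor in dense and neighbor not in labels:
                    labels[neighbor] = next_label
                    queue.append(neighbor)
        next_label += 1
    return labels


points = [(0.1, 0.2), (0.4, 0.5), (1.2, 0.3), (1.7, 0.9),
          (5.1, 5.2), (5.5, 5.5), (9.0, 9.0)]
summary = summarize(points, width=1.0)
labels = cluster_dense_cells(summary, min_density=2)
```

Here each cell's density depends only on its own points, which is the independence the abstract identifies as the limitation; a coupling-aware method would let neighboring cells influence one another before or during the clustering step.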

Citation

Zhang Dongyue, Zhou Lihua, Wu Xiangyun, Zhao Lihong. Data stream clustering based on grid coupling. Journal of Software (软件学报), 2019, 30(3): 667-683

History
  • Received: July 20, 2018
  • Revised: September 20, 2018
  • Online: March 06, 2019