Density-Based Statistical Merging Algorithm for Large Data Sets

doi:10.13328/j.cnki.jos.004902


Volume 26, Issue 11, 2015, pp. 2820-2835. DOI: 10.13328/j.cnki.jos.004902


Author:
LIU Bei-Bei
College of Science, Nanjing University of Aeronautics and Astronautics, Nanjing 211100, China
MA Ru-Ning
College of Science, Nanjing University of Aeronautics and Astronautics, Nanjing 211100, China
DING Jun-Di
School of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing 210094, China

                    

Abstract:

To address the failure of traditional clustering algorithms on large-scale data, this paper proposes a density-based statistical merging algorithm for large data sets (DSML). The algorithm treats each feature of a data point as a set of independent random variables and derives a statistical merging criterion from the independent bounded difference inequality. First, DSML improves the Leaders algorithm with this criterion and uses the improved algorithm as a sampling step to obtain representative points. Second, using the density and neighborhood information of the representative points, it applies the statistical merging criterion again to complete the clustering of the whole data set. Theoretical analysis and experimental results show that DSML has nearly linear time complexity, can handle arbitrary data sets, and is insensitive to noise. These results demonstrate the validity of DSML for large data sets.
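The sampling step described above builds on the Leaders algorithm, so a minimal sketch of the classic one-pass Leaders procedure may help fix ideas. The abstract does not spell out the statistical merging criterion, so the sketch takes the merge test as a pluggable function and uses a plain Euclidean distance threshold as a stand-in. The names `leaders`, `within`, and `tau` are hypothetical and are not taken from the paper.

```python
from math import dist
from typing import Callable, List, Sequence, Tuple

Point = Sequence[float]


def leaders(
    points: Sequence[Point],
    should_merge: Callable[[Point, Point], bool],
) -> Tuple[List[Point], List[int]]:
    """One-pass Leaders clustering.

    Each point joins the first existing leader that accepts it;
    otherwise it becomes a new leader (a representative point).
    Returns the leaders and, for every input point, its leader index.
    """
    reps: List[Point] = []
    labels: List[int] = []
    for p in points:
        for idx, leader in enumerate(reps):
            if should_merge(leader, p):
                labels.append(idx)
                break
        else:
            # No leader accepted p, so p starts a new group.
            reps.append(p)
            labels.append(len(reps) - 1)
    return reps, labels


def within(tau: float) -> Callable[[Point, Point], bool]:
    """Stand-in merge test (an assumption): fixed distance threshold tau.

    DSML would use its statistical merging criterion here instead.
    """
    return lambda leader, p: dist(leader, p) <= tau


data = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 4.9), (0.05, 0.1)]
reps, labels = leaders(data, within(1.0))
print(reps)    # representative points
print(labels)  # leader index for each input point
```

Each point is compared against the current leaders only once, so the cost grows with the number of points times the number of leaders. That keeps the pass close to linear when the leaders are few relative to the data, which is in line with the near-linear complexity the abstract reports.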

Key words: clustering; sampling; leader; density; large data

Get Citation

LIU Bei-Bei, MA Ru-Ning, DING Jun-Di. Density-based statistical merging algorithm for large data sets. Journal of Software, 2015, 26(11): 2820-2835



History

Received: May 30, 2015
Revised: August 26, 2015
Online: November 04, 2015
