Density-Based Statistical Merging Algorithm for Large Data Sets (大数据的密度统计合并算法)

doi:10.13328/j.cnki.jos.004902

Journal of Software (软件学报), 2015, Vol. 26, No. 11, pp. 2820-2835

Authors: 刘贝贝 (Liu Beibei), 马儒宁 (Ma Runing), 丁军娣 (Ding Jundi)
Funding: National Natural Science Foundation of China (61103058, 61233011)

Density-Based Statistical Merging Algorithm for Large Data Sets

Abstract:

To address the failure or poor performance of traditional clustering algorithms on large-scale data, this paper proposes a density-based statistical merging algorithm for large data sets (DSML). The algorithm treats each feature of a data point as a set of independent random variables and derives a statistical merging criterion from the independent bounded difference inequality. First, DSML uses this criterion to improve the Leaders algorithm, and the improved algorithm serves as a sampling step that yields a set of representative points. Second, combining the density and neighborhood information of the representative points, DSML applies the statistical merging criterion again to cluster the whole data set. Theoretical analysis and experimental results show that DSML has nearly linear time complexity, can handle data sets of arbitrary shape, and is robust to noise, which makes it well suited to large-scale data sets.
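To illustrate the sampling step mentioned in the abstract, the following is a minimal sketch of the classic single-pass Leaders algorithm, which DSML is said to improve. The merge test is passed in as a function so that a different criterion could be substituted. The names `leaders`, `distance_rule`, and the threshold `tau` are assumptions made for this illustration: `distance_rule` is the standard distance-threshold rule of the classic algorithm, not the statistical merging criterion derived in the paper, which the abstract does not spell out.

```python
from math import dist
from typing import Callable, List, Sequence, Tuple

Point = Sequence[float]
MergeTest = Callable[[Point, Point], bool]


def distance_rule(tau: float) -> MergeTest:
    """Classic Leaders rule (assumed here): merge when the Euclidean
    distance between a point and a leader is at most tau."""
    def should_merge(point: Point, leader: Point) -> bool:
        return dist(point, leader) <= tau
    return should_merge


def leaders(points: Sequence[Point],
            should_merge: MergeTest) -> Tuple[List[Point], List[int]]:
    """Single pass over the data: each point joins the first leader that
    passes the merge test, otherwise it becomes a new leader."""
    found: List[Point] = []      # representative points, in discovery order
    assignment: List[int] = []   # index of the leader for each input point
    for p in points:
        for idx, leader in enumerate(found):
            if should_merge(p, leader):
                assignment.append(idx)
                break
        else:
            # No existing leader accepted the point, so it starts a new group.
            found.append(p)
            assignment.append(len(found) - 1)
    return found, assignment


# Hypothetical toy data: two well-separated groups in the plane.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 4.9), (0.05, 0.1)]
reps, labels = leaders(points, distance_rule(tau=1.0))
print(reps)    # representative points (the leaders)
print(labels)  # leader index assigned to each input point
```

Because every point is compared only against the current leaders, the pass costs roughly the number of points times the number of leaders, which is in line with the nearly linear running time claimed for DSML when the set of representatives stays small. In this sketch the result depends on the order in which points are visited, a known property of the classic Leaders algorithm.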

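Cite this article

Liu Beibei, Ma Runing, Ding Jundi. Density-based statistical merging algorithm for large data sets (大数据的密度统计合并算法). Journal of Software (软件学报), 2015, 26(11): 2820-2835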


History

Received: 2015-05-30
Last revised: 2015-08-26
Published online: 2015-11-04
