Density-Based Statistical Merging Algorithm for Large Data Sets (大数据的密度统计合并算法)

doi:10.13328/j.cnki.jos.004902


Journal of Software (软件学报), 2015, Vol. 26, No. 11, pp. 2820-2835

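Authors:
LIU Bei-Bei (刘贝贝), College of Science, Nanjing University of Aeronautics and Astronautics, Nanjing 211100, China
MA Ru-Ning (马儒宁), College of Science, Nanjing University of Aeronautics and Astronautics, Nanjing 211100, China
DING Jun-Di (丁军娣), School of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing 210094, China

Funding: National Natural Science Foundation of China (61103058, 61233011)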



Abstract:

Traditional clustering algorithms often fail or perform poorly on large-scale data. To address this, the paper proposes a density-based statistical merging algorithm for large data sets (DSML). The algorithm treats each feature of a data point as a set of independent random variables and derives a statistical merging criterion from the independent bounded difference inequality. First, DSML uses this criterion to improve the Leaders algorithm and applies the improved algorithm as a sampling step to obtain a set of representative points. Second, combining the density and neighborhood information of the representative points, it applies the criterion again to cluster the whole data set. Theoretical analysis and experimental results show that DSML has nearly linear time complexity, handles data sets of arbitrary shape, and is robust to noise, which makes it well suited to large-scale data sets.

Key words: clustering; sampling; representative point (leader); density; large data
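
For readers unfamiliar with the sampling step, the following is a minimal sketch of the classical single-pass Leaders algorithm that DSML builds on. It is not the paper's improved version: the abstract does not state the statistical merging criterion, so this sketch uses a plain Euclidean distance threshold instead. The names `leaders` and `tau` and the use of follower counts as a rough density proxy are assumptions made for illustration only.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def leaders(points: Sequence[Point], tau: float) -> Tuple[List[Point], List[int], List[int]]:
    """Classical Leaders clustering in one scan over the data.

    Each point joins the first existing leader within distance tau;
    otherwise it becomes a new leader (representative point).
    NOTE: the distance test is a stand-in. DSML replaces it with a
    statistical merging criterion that is not reproduced here.
    """
    reps: List[Point] = []    # representative points (leaders)
    counts: List[int] = []    # followers per leader, a simple density proxy (assumption)
    labels: List[int] = []    # index of the leader assigned to each input point
    for p in points:
        for idx, rep in enumerate(reps):
            if math.dist(p, rep) <= tau:
                counts[idx] += 1
                labels.append(idx)
                break
        else:
            # No leader is close enough: p starts a new group.
            reps.append(p)
            counts.append(1)
            labels.append(len(reps) - 1)
    return reps, counts, labels


sample = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.2), (0.0, 0.2)]
reps, counts, labels = leaders(sample, tau=1.0)
print(reps, counts, labels)
```

Because every point is compared only against the current leaders, the scan costs time proportional to the number of points times the number of leaders. This is one reason leader-based sampling suits large data sets when the number of leaders stays small.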

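Cite this article

LIU Bei-Bei, MA Ru-Ning, DING Jun-Di. Density-based statistical merging algorithm for large data sets. Journal of Software (软件学报), 2015, 26(11): 2820-2835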


History

Received: 2015-05-30
Revised: 2015-08-26
Published online: 2015-11-04
