Density-Based Distributed Clustering Method

doi:10.13328/j.cnki.jos.005343


2017, Vol. 28, No. 11, pp. 2836-2850. DOI:10.13328/j.cnki.jos.005343

Authors:

WANG Yan
College of Computer Science and Technology, Jilin University, Changchun 130012, China

PENG Tao
College of Computer Science and Technology, Jilin University, Changchun 130012, China; Key Laboratory of Symbol Computation and Knowledge Engineering (Jilin University), Ministry of Education, Changchun 130012, China

HAN Jia-Yu
College of Computer Science and Technology, Jilin University, Changchun 130012, China

LIU Lu
College of Computer Science and Technology, Jilin University, Changchun 130012, China

Fund Project:

National Natural Science Foundation of China (60903098); Industry Technology Research and Development Projects of Jilin Province Development and Reform Commission (2015Y055); Key Scientific Research Project of Jilin Province Department of Science (20150204040GX); Graduate Innovation Fund of Jilin University (2016183)


Abstract:

Clustering is an important data analysis method in the field of data mining. It partitions unlabeled data into several clusters according to the similarity between data points. CSDP is a density-based clustering algorithm whose efficiency is relatively low when the data set is large or high-dimensional. To improve clustering efficiency, this paper proposes MRCSDP, a density-based distributed clustering method that uses the MapReduce framework to cluster the experimental text data. The method defines the concepts of independent calculation unit and independent calculation block. First, the data are split into several data blocks, from which independent calculation units and independent calculation blocks are constructed, and the tasks for the independent calculation blocks are assigned across the cluster. Then, distributed computation yields the local density of each data block; the local densities are merged into the global density, the center value is calculated from the global density, and the candidate cluster centers of each data block are obtained from the global density and the center value. Finally, the final cluster centers are elected from the candidate cluster centers by calculating the density of all candidates. MRCSDP achieves good clustering results while substantially reducing time complexity. Experimental results show that, compared with CSDP, MRCSDP processes large-scale data more quickly and effectively while balancing the load across computing nodes.

Key words: clustering; distributed computing; MapReduce; independent calculation unit; independent calculation block
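The pipeline outlined in the abstract can be sketched in plain Python. This is a minimal single-process illustration under stated assumptions, not the authors' MRCSDP implementation: the abstract does not define the density, the center value, or the independent calculation unit, so the sketch assumes a cutoff-distance neighbour count as the density, the distance to the nearest higher-density point as the center value, and one pair of data blocks as one unit of independent work. The function names, the `cutoff` and `k` parameters, and the `density * center value` score used for the election are all hypothetical.

```python
from itertools import combinations_with_replacement
from math import dist


def split_blocks(points, n_blocks):
    """Split (index, point) pairs into roughly equal contiguous data blocks."""
    indexed = list(enumerate(points))
    size = -(-len(indexed) // n_blocks)  # ceiling division
    return [indexed[i:i + size] for i in range(0, len(indexed), size)]


def map_pair_density(block_a, block_b, cutoff):
    """Map step (assumed): partial neighbour counts for one pair of blocks."""
    partial = {}
    same = block_a is block_b
    for i, p in block_a:
        for j, q in block_b:
            if same and j <= i:
                continue  # inside one block, visit each unordered pair once
            if dist(p, q) < cutoff:
                partial[i] = partial.get(i, 0) + 1
                partial[j] = partial.get(j, 0) + 1
    return partial


def global_density(points, n_blocks, cutoff):
    """Reduce step (assumed): sum the partial counts into a global density."""
    blocks = split_blocks(points, n_blocks)
    density = [0] * len(points)
    for a, b in combinations_with_replacement(blocks, 2):
        for idx, count in map_pair_density(a, b, cutoff).items():
            density[idx] += count
    return density


def center_values(points, density):
    """Assumed center value: distance to the nearest higher-density point."""
    # Ties in density are broken by index so that the ordering is total.
    order = sorted(range(len(points)), key=lambda i: (-density[i], i))
    delta = [0.0] * len(points)
    for rank, i in enumerate(order):
        if rank > 0:
            delta[i] = min(dist(points[i], points[j]) for j in order[:rank])
    if order:
        delta[order[0]] = max(delta)  # the densest point gets the largest value
    return delta


def cluster_centers(points, n_blocks, cutoff, k):
    """Pick up to k candidates per block, then elect k final centers."""
    density = global_density(points, n_blocks, cutoff)
    delta = center_values(points, density)
    score = [r * d for r, d in zip(density, delta)]
    candidates = []
    for block in split_blocks(points, n_blocks):
        ids = [i for i, _ in block]
        candidates += sorted(ids, key=lambda i: -score[i])[:k]
    return sorted(sorted(candidates, key=lambda i: -score[i])[:k])
```

Calling `cluster_centers(points, n_blocks=3, cutoff=1.2, k=2)` on a list of coordinate tuples returns the indices of the elected centers. Because every pair of blocks is processed independently, the map step is the part that a MapReduce job would distribute; here it simply runs in a loop.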

Cite this article

WANG Yan, PENG Tao, HAN Jia-Yu, LIU Lu. Density-Based Distributed Clustering Method. Journal of Software (软件学报), 2017, 28(11): 2836-2850

History

Received: 2017-04-14
Last revised: 2016-06-16
Published online: 2017-11-03
