Related-Subspace-Based Local Outlier Detection Algorithm Using MapReduce

doi:10.13328/j.cnki.jos.004659

2015, Volume 26, Issue 5, pp. 1079-1095

Fund project: National Natural Science Foundation of China (61272263)



Abstract:

This paper proposes a local outlier detection algorithm based on relevant subspaces for high-dimensional, massive data sets under the MapReduce programming model. First, the relevant subspace is redefined using the local sparseness of each attribute dimension, so that it effectively characterizes the distribution of different local data sets. Second, a local outlier factor in the relevant subspace is defined from the probability density of the local data set; it reflects the degree to which a data object deviates from the distribution of its local data set in the relevant subspace, and the N data objects with the largest outlier factors are taken as local outliers. On this basis, a distributed strategy built on locality-sensitive hashing (LSH) is used to construct the local outlier detection algorithm under the MapReduce programming model. Finally, experiments on artificial data sets and stellar spectral data sets validate the effectiveness, extensibility, and scalability of the algorithm.
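The data flow described in the abstract (LSH partitioning, per-partition scoring, top-N selection) can be sketched roughly as follows. This is a minimal single-process illustration, not the authors' implementation: the abstract does not say which LSH family is used, so a Euclidean p-stable hash is assumed here, and the scoring function `centroid_distance_scores` is a hypothetical placeholder standing in for the paper's local outlier factor, whose formula is not given on this page. All function names are invented for illustration.

```python
import heapq
import math
import random
from collections import defaultdict


def make_lsh(dim, n_hashes, width, seed=0):
    """Build an assumed p-stable LSH: h(x) = floor((a.x + b) / width)."""
    rng = random.Random(seed)
    projections = [
        ([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, width))
        for _ in range(n_hashes)
    ]

    def lsh_key(point):
        # Concatenate the hash values into one bucket key.
        return tuple(
            math.floor((sum(a * x for a, x in zip(direction, point)) + offset) / width)
            for direction, offset in projections
        )

    return lsh_key


def map_phase(points, lsh_key):
    """Map: emit (bucket key, (index, point)) so nearby points tend to meet."""
    for index, point in enumerate(points):
        yield lsh_key(point), (index, point)


def centroid_distance_scores(bucket):
    """Placeholder score (NOT the paper's factor): distance to bucket centroid."""
    dim = len(bucket[0][1])
    centroid = [sum(p[d] for _, p in bucket) / len(bucket) for d in range(dim)]
    for index, point in bucket:
        yield index, math.dist(point, centroid)


def top_n_outliers(points, n, lsh_key, score_bucket=centroid_distance_scores):
    """Group by LSH key (shuffle), score each bucket (reduce), keep the top N."""
    buckets = defaultdict(list)
    for key, value in map_phase(points, lsh_key):
        buckets[key].append(value)
    scored = (pair for bucket in buckets.values() for pair in score_bucket(bucket))
    return heapq.nlargest(n, scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
    print(top_n_outliers(data, 2, make_lsh(dim=2, n_hashes=2, width=50.0)))
```

One limitation of this placeholder is that a point that lands alone in a bucket scores zero, so the bucket width has to be large enough for an outlier to share a bucket with its neighbours. In a real MapReduce job the grouping step would be performed by the framework's shuffle rather than by an in-memory dictionary.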

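Cite this article

Zhang Jifu, Li Yonghong, Qin Xiao, Xun Yaling. Related-subspace-based local outlier detection algorithm using MapReduce. Journal of Software, 2015, 26(5):1079-1095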

History

Received: 2013-12-19
Last revised: 2014-05-21
Published online: 2014-08-22
