Rare Category Detection Algorithm Based on k-Nearest Neighbor Graphs

doi:10.13328/j.cnki.jos.004872

Journal of Software (软件学报), 2016, Vol. 27, No. 9, pp. 2320-2331

Authors:
WANG Song (王淞), Computer School, Wuhan University, Wuhan 430072, China
HUANG Hao (黄浩), Computer School, Wuhan University, Wuhan 430072, China
YU Guo (余果), Zhongnan Hospital, Wuhan University, Wuhan 430072, China
LIANG Nan (梁楠), Computer School, Wuhan University, Wuhan 430072, China
WANG Li-Wei (王黎维), International School of Software, Wuhan University, Wuhan 430072, China
SUN Yue-Ming (孙月明), Computer School, Wuhan University, Wuhan 430072, China

Fund Project: National Natural Science Foundation of China (61502347, 61272275, 61202033, 61070013, U1135005); Fundamental Research Funds for the Central Universities (2042015kf0038); Research Funds for Introduced Talents of Wuhan University

Abstract:

Rare category detection aims to find at least one data example for each class in an unlabeled data set, especially for the rare classes (a.k.a. rare categories) that contain only a few data examples, in order to prove that these classes exist. It has many applications in fields such as financial fraud detection and network intrusion detection. However, existing approaches tend to suffer from one of two problems: (1) high time complexity, or (2) a need for prior knowledge about the data set, such as the proportion of data examples in each class. This paper proposes KRED, a prior-free and efficient rare category detection algorithm based on k-nearest neighbor graphs. KRED locates rare classes by exploiting the inconsistency in local data distribution that arises when the examples of a rare class are tightly packed in a small region. To this end, it transforms the data set into a k-nearest neighbor graph and computes the variations in in-degree and edge length across the nodes of the graph. Finally, the data examples corresponding to the nodes with the largest variations are selected as candidate examples of rare classes. Experimental results show that KRED improves the efficiency of discovering the classes in a data set and notably reduces the running time.

Key words: rare category detection; k-nearest neighbor graph; data distribution; variation coefficient; in-degree
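The pipeline described in the abstract (build a k-nearest neighbor graph, measure how in-degree and edge length change between nodes, rank nodes by that change) can be illustrated with a small sketch. The abstract does not give KRED's exact formulas, so the scoring rule below is an assumption made only for illustration: each node is compared with the nodes that point to it, and it scores highly when its mean edge length is much shorter and its in-degree much higher than theirs. The function names `knn_graph`, `variation_scores`, and `rank_candidates` are hypothetical, and the brute-force graph construction is not meant to reflect the running time reported in the paper.

```python
import math

def knn_graph(points, k):
    """Directed kNN graph: graph[i] lists (j, distance) for the k nearest neighbors of point i."""
    graph = []
    for i, p in enumerate(points):
        ranked = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        graph.append([(j, d) for d, j in ranked[:k]])
    return graph

def variation_scores(graph):
    """Hypothetical variation score (an assumption, not the paper's definition)."""
    n = len(graph)
    # in_nbrs[j] holds every node i that has j among its k nearest neighbors.
    in_nbrs = [[] for _ in range(n)]
    for i, nbrs in enumerate(graph):
        for j, _ in nbrs:
            in_nbrs[j].append(i)
    in_deg = [len(s) for s in in_nbrs]
    mean_len = [sum(d for _, d in nbrs) / len(nbrs) for nbrs in graph]

    scores = []
    for i in range(n):
        best = 0.0
        for j in in_nbrs[i]:
            # Relative drop in mean edge length when moving from j to i.
            len_drop = (mean_len[j] - mean_len[i]) / max(mean_len[j], 1e-12)
            # Relative rise in in-degree when moving from j to i.
            deg_rise = (in_deg[i] - in_deg[j]) / max(in_deg[i], 1)
            best = max(best, len_drop + deg_rise)
        scores.append(best)
    return scores

def rank_candidates(points, k):
    """Return point indices ordered from largest to smallest variation score."""
    scores = variation_scores(knn_graph(points, k))
    return sorted(range(len(points)), key=lambda i: scores[i], reverse=True)

# Toy data: ten evenly spaced background points plus a tight cluster of three.
points = [(float(x),) for x in range(0, 100, 10)] + [(45.0,), (45.1,), (45.2,)]
top3 = rank_candidates(points, k=2)[:3]
```

In this toy data set the three tightly packed points (indices 10 to 12) receive the highest scores, which matches the intuition in the abstract that a compact rare class creates an abrupt local change in the graph. Note that no class proportions or other prior information are passed in; only `k` is chosen by the caller.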

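Cite this article:

WANG Song, HUANG Hao, YU Guo, LIANG Nan, WANG Li-Wei, SUN Yue-Ming. Rare category detection algorithm based on k-nearest neighbor graphs. Journal of Software (软件学报), 2016, 27(9): 2320-2331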

History

Received: 2014-12-01
Last revised: 2015-03-10
Published online: 2016-09-02
