An Algorithm Based on Partition for Outlier Detection


Volume 17, Issue 5, 2006, pp. 1009-1016

Author: SUN Huan-Liang, BAO Yu-Bin, YU Ge, ZHAO Fa-Xin, WANG Da-Ling

                    

Abstract:

Outliers are objects that do not comply with the general behavior of the data. The partition method divides the data space into a set of non-overlapping rectangular cells by partitioning every dimension into intervals of equal length. Statistical information about the cells is then used to find knowledge in datasets. Real-life datasets are often heavily skewed, so partitioning produces many empty cells, which degrades the efficiency of the algorithms. An efficient index structure called CD-Tree (cell dimension tree) is designed for indexing cells. Moreover, to guide partitioning and to optimize the structure of the CD-Tree, the concept of SOD (skew of data) is proposed to measure the degree of data skew. Finally, a CD-Tree-based algorithm for outlier detection is built on CD-Tree and SOD. Experimental results on real-life datasets show that, compared with the cell-based algorithm, the CD-Tree-based algorithm is markedly more efficient and can process a larger maximum number of dimensions.

Key words: data mining; outlier detection; partition; CD-Tree (cell dimension tree); cell-based algorithm
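
The partitioning step described in the abstract can be illustrated with a minimal sketch. It is not the paper's CD-Tree or its SOD measure: it only assigns points to equal-width cells and stores counts for non-empty cells in a dictionary, to show how skewed data leaves most cells empty. The function names `partition_into_cells` and `empty_cell_fraction`, the parameter `intervals`, and the sample points are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

Cell = Tuple[int, ...]


def partition_into_cells(
    points: Sequence[Sequence[float]],
    lows: Sequence[float],
    highs: Sequence[float],
    intervals: int,
) -> Dict[Cell, int]:
    """Count points per non-empty cell of an equal-width grid.

    Each dimension [low, high] is split into `intervals` equal-length
    pieces (hypothetical parameter name). Only cells that contain at
    least one point appear in the result.
    """
    counts: Counter = Counter()
    for point in points:
        cell = []
        for x, lo, hi in zip(point, lows, highs):
            width = (hi - lo) / intervals
            idx = int((x - lo) / width)
            # Clamp so that x == hi falls into the last interval.
            idx = min(max(idx, 0), intervals - 1)
            cell.append(idx)
        counts[tuple(cell)] += 1
    return dict(counts)


def empty_cell_fraction(counts: Dict[Cell, int], dims: int, intervals: int) -> float:
    """Fraction of grid cells that hold no points."""
    total_cells = intervals ** dims
    return 1 - len(counts) / total_cells


# Illustrative, made-up data: three points clustered near the origin
# and one far away, on a 4 x 4 grid over the unit square.
sample = [(0.1, 0.1), (0.12, 0.11), (0.15, 0.18), (0.9, 0.95)]
cell_counts = partition_into_cells(sample, (0.0, 0.0), (1.0, 1.0), 4)
print(cell_counts)                             # {(0, 0): 3, (3, 3): 1}
print(empty_cell_fraction(cell_counts, 2, 4))  # 0.875
```

In this toy case only 2 of 16 cells are occupied. The total number of cells grows as `intervals ** dims`, which is the kind of sparsity that motivates indexing only non-empty cells.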

Get Citation

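SUN Huan-Liang, BAO Yu-Bin, YU Ge, ZHAO Fa-Xin, WANG Da-Ling. An Algorithm Based on Partition for Outlier Detection. Journal of Software, 2006, 17(5): 1009-1016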



History

Received: June 26, 2004
Revised: May 23, 2005
