• Article
  • | |
  • Metrics
  • |
  • Reference [8]
  • |
  • Related [20]
  • |
  • Cited by [7]
  • | |
  • Comments
    Abstract:

    Outliers are objects that do not comply with the general behavior of the data. The method of partition divides data space into a set of non-overlapping rectangular cells by partitioning every dimension into equal length. Statistical information of cells is used to find knowledge in datasets. There exists very large data skew in real-life datasets, so partition will produce many empty cells, which affects the efficiency of the algorithms. An efficient index structure called CD-Tree (cell dimension tree) is designed for indexing cells. Moreover, to guide partition and to optimize the structure of CD-Tree, the concept of SOD (skew of data) is proposed to measure the degree of data skew. Finally, the CD-Tree-based algorithm is designed for outlier detection based on CD-Tree and SOD. The experimental results show that the efficiency of CD-Tree-based algorithm and the maximum number of dimensions processed increase obviously comparing with the Cell-based algorithm on real-life datasets.

    Reference
    [1]Knorr E,Ng R.Algorithms for mining distance-based outliers in large data sets.In:Gupta A,Shmueli O,Widom J,eds.Proc.of the VLDB Conf.New York:Morgan Kaufmann Publishers,1998.392-403.
    [2]Knorr E,Ng R.Finding intensional knowledge of distance-based outliers.In:Atkinson MP,Orlowska ME,Valduriez P,Zdonik SB,Brodie ML,eds.Proc.of the VLDB Conf.Edinburgh:Morgan Kaufmann Publishers,1999.211-222.
    [3]Ramaswamy S,Rastogi R,Shim K.Efficient algorithms for mining outliers from large data sets.In:Chen WD,Naughton JF,Bernstein PA,eds.Proc.of the ACM SIGMOD Conf.Dallas:ACM Press,2000.427-438.
    [4]Breunig MM,Kriegel HP,Ng R,Sander J.LOF:Identifying density-based local outliers.In:Chen WD,Naughton JF,Bernstein PA,eds.Proc.of the ACM SIGMOD Conf.Dallas:ACM Press,2000.94-104.
    [5]Arning A,Agrawal R,Raghavan P.A linear method for deviation detection in large databases.In:Simoudis E,Han JW,Fayyad UM,eds.Proc.of the KDD Conf.Portland:AAAI Press,1996.164-169.
    [6]Beckmann N,Kriegel HP,Schneider R,Seeger B.The R*-tree:An efficient and robust access method for points and rectangles.In:Hector GM,Jagadish HV,eds.Proc.of the ACM SIGMOD Conf.Atlantic:ACM Press,1990.322-331.
    [7]Katayama N,Satoh S.The SR-tree:An index structure for high-dimensional nearest neighbor queries.In:Peckham J,ed.Proc.of the ACM SIGMOD Conf.Tucson:ACM Press,1997.369-380.
    [8]Berchtold S,Keim DA,Kriegel H.The X-tree:An index structure for high-dimensional data.In:Vijayaraman TM,Buchmann AP,Mohan C,Sarda NL,eds.Proc.of the 22nd VLDB Conf.Bombay:Morgan Kaufmann Publishers,1996.28-39.
    Comments
    Comments
    分享到微博
    Submit
Get Citation

孙焕良,鲍玉斌,于戈,赵法信,王大玲.一种基于划分的孤立点检测算法.软件学报,2006,17(5):1009-1016

Copy
Share
Article Metrics
  • Abstract:4368
  • PDF: 5556
  • HTML: 0
  • Cited by: 0
History
  • Received:June 26,2004
  • Revised:May 23,2005
You are the first2033357Visitors
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4
Address:4# South Fourth Street, Zhong Guan Cun, Beijing 100190,Postal Code:100190
Phone:010-62562563 Fax:010-62562533 Email:jos@iscas.ac.cn
Technical Support:Beijing Qinyun Technology Development Co., Ltd.

Beijing Public Network Security No. 11040202500063