A Visual Approach to Parameter Selection of Density-Based Noise Removal for Effective Data Clustering

    Abstract:

    Traditional visual data mining relies on visualization techniques to reveal implicit information and relationships among data by exploiting the human capability for pattern recognition. As an important step in data clustering, noise removal is challenging because domain-specific noise is not well defined and cannot be removed by a generic data cleaning process. This paper addresses two coupled, reciprocal issues in using visualization for noise removal: choosing appropriate visualization techniques based on the noise removal methods, and designing processing algorithms that suit visualization. The goal is a synthesis of visualization techniques and data mining methods that enhances overall performance while reducing the subjective factor in the visual mining procedure. A visual data cleaning approach called CLEAN is proposed to assist spatial data clustering in four important aspects: removal of domain-specific noise, visualization of data quality, selection of algorithm parameters, and measurement of the parameter sensitivity of noise removal methods. Experiments show that the visualization models in CLEAN assist the effective discovery of natural spatial clusters in a noisy environment.
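
    The abstract does not spell out CLEAN's algorithms, so the following is only an illustrative sketch of the general idea of density-based noise removal and of measuring how sensitive the result is to a parameter. It assumes a simple neighbor-count density (a point is kept if at least `min_pts` other points lie within radius `eps`); the function names, `eps`, and `min_pts` are hypothetical and are not taken from the paper.

```python
import math
import random


def neighbor_counts(points, eps):
    """For each 2D point, count how many OTHER points lie within distance eps."""
    counts = []
    for i, (xi, yi) in enumerate(points):
        c = 0
        for j, (xj, yj) in enumerate(points):
            if i != j and math.hypot(xi - xj, yi - yj) <= eps:
                c += 1
        counts.append(c)
    return counts


def remove_noise(points, eps, min_pts):
    """Keep only points with at least min_pts neighbors within eps.

    Assumption: this neighbor-count rule stands in for whatever density
    criterion CLEAN actually uses; it is not the paper's method.
    """
    counts = neighbor_counts(points, eps)
    return [p for p, c in zip(points, counts) if c >= min_pts]


def sensitivity_profile(points, eps_values, min_pts):
    """Return (eps, fraction of points flagged as noise) for each candidate eps."""
    n = len(points)
    return [
        (eps, 1 - len(remove_noise(points, eps, min_pts)) / n)
        for eps in eps_values
    ]


if __name__ == "__main__":
    # Synthetic data: two Gaussian clusters plus uniform background noise.
    rng = random.Random(0)
    data = [(rng.gauss(0, 0.3), rng.gauss(0, 0.3)) for _ in range(100)]
    data += [(rng.gauss(5, 0.3), rng.gauss(5, 0.3)) for _ in range(100)]
    data += [(rng.uniform(-3, 8), rng.uniform(-3, 8)) for _ in range(50)]
    for eps, frac in sensitivity_profile(data, [0.1, 0.2, 0.4, 0.8, 1.6], min_pts=5):
        print(f"eps={eps:.1f}  noise fraction={frac:.2f}")
```

    Plotting the noise fraction against `eps` gives a simple sensitivity curve: a flat stretch suggests a range where the result is stable, while a steep stretch suggests that small parameter changes alter which points are treated as noise.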

Citation:

Qian Y. A visual approach to parameter selection of density-based noise removal for effective data clustering. Journal of Software, 2008, 19(8): 1965-1979.

History
  • Received: April 18, 2008
  • Revised: January 16, 2008