A Data Classification Method Based on Concept Similarity
Authors: Peng Jing (彭京), Tang Changjie (唐常杰), Yuan Chang'an (元昌安), Li Chuan (李川), Hu Jianjun (胡建军)
Funding:

Supported by the National Natural Science Foundation of China under Grant No.60473071 (国家自然科学基金); the China Postdoctoral Science Foundation under Grant No.20060400002 (中国博士后科学基金); the Major Science and Technology Project of Sichuan Province of China under

    Abstract:

    This paper proposes a classification method based on similarity information among data attributes. The method vectorizes the attributes: each attribute is treated as a basis vector in an m-dimensional space, and each data record is the sum of its attribute vectors. Using prior concept-similarity information between attributes, the paper gives an algorithm for computing the similarity distance between any pair of attribute vectors. It then reduces the correlation between two records to a formula over the attribute vectors and their projections onto one another, which yields the correlation of any two records. A classification algorithm built on this correlation is presented, and extensive experiments demonstrate its effectiveness.
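The abstract does not reproduce the paper's formulas, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that the prior similarity between attributes i and j is given as the cosine of the angle between their unit attribute vectors (the matrix `GRAM` below, with made-up values), that a record is a list of attribute weights, and that classification picks the label of the most correlated training record. The names `bilinear`, `correlation`, `classify`, `GRAM`, and `TRAIN` are all hypothetical.

```python
import math

# Assumed prior similarities: GRAM[i][j] is the cosine between attribute
# vectors i and j. Attributes 0 and 1 are conceptually close (0.9);
# attribute 2 is unrelated to both. These values are illustrative only.
GRAM = [
    [1.0, 0.9, 0.0],
    [0.9, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]


def bilinear(x, y, gram):
    """Inner product of two records expressed in the non-orthogonal
    attribute basis: sum over i, j of x[i] * gram[i][j] * y[j]."""
    return sum(
        x[i] * gram[i][j] * y[j]
        for i in range(len(x))
        for j in range(len(y))
    )


def correlation(x, y, gram):
    """Cosine of the angle between two record vectors, where each record
    is the weighted sum of its attribute vectors."""
    norm = math.sqrt(bilinear(x, x, gram) * bilinear(y, y, gram))
    if norm == 0.0:
        return 0.0  # an empty record is treated as uncorrelated
    return bilinear(x, y, gram) / norm


def classify(record, labeled_records, gram):
    """Return the label of the training record most correlated with
    `record` (a nearest-neighbor rule, chosen here as an assumption)."""
    best_record, best_label = max(
        labeled_records,
        key=lambda pair: correlation(record, pair[0], gram),
    )
    return best_label


# Hypothetical training data: (attribute weights, class label).
TRAIN = [
    ([1, 0, 0], "A"),
    ([0, 0, 1], "B"),
]

if __name__ == "__main__":
    query = [0, 1, 0]  # shares no attribute with either training record
    print(classify(query, TRAIN, GRAM))
```

In this toy setup the query shares no attribute with either training record, so a plain cosine over orthogonal attributes would score both at zero. With the assumed similarity matrix, the query has correlation 0.9 with the first record and 0 with the second, so it is assigned label "A".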

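Cite this article

Peng J, Tang CJ, Yuan CA, Li C, Hu JJ. A data classification method based on concept similarity. Journal of Software (软件学报), 2007, 18(2): 311-322 (in Chinese).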

History
  • Received: 2004-09-08
  • Last revised: 2006-04-26