A Data Classification Method Based on Concept Similarity

Journal of Software, 2007, Vol. 18, No. 2, pp. 311-322

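Authors:
Peng Jing (彭京): School of Computer Science, Sichuan University, Chengdu, Sichuan 610065; Science and Technology Division, Chengdu Public Security Bureau, Chengdu, Sichuan 610017
Tang Changjie (唐常杰): School of Computer Science, Sichuan University, Chengdu, Sichuan 610065
Yuan Chang'an (元昌安): School of Computer Science, Sichuan University, Chengdu, Sichuan 610065
Li Chuan (李川): School of Computer Science, Sichuan University, Chengdu, Sichuan 610065
Hu Jianjun (胡建军): School of Computer Science, Sichuan University, Chengdu, Sichuan 610065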
Fund project: Supported by the National Natural Science Foundation of China under Grant No.60473071; the China Postdoctoral Science Foundation under Grant No.20060400002; the Major Science and Technology Project of Sichuan Province of China under



Abstract:

This paper proposes a classification method based on similarity information among the properties (attributes) of the data. The method vectorizes the properties: each property is treated as a basic vector in an m-dimensional space, and each data record is the sum of its property vectors. Using prior concept-similarity information among properties, the paper gives an algorithm for computing the similar distance between any pair of property vectors. It then converts the computation of correlation between data records into a formula over property vectors and their mutual projections, which yields the correlation degree of any two records. A classification algorithm is built on this correlation degree. Extensive experiments demonstrate the effectiveness of the algorithm.

Key words: data mining; concept similarity; similar distance; property vector; classification
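The abstract does not give the paper's formulas, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that the prior similarity between properties i and j can be used directly as the inner product S[i][j] of their property vectors (with S symmetric and positive semidefinite), that the correlation of two records is the cosine of the angle between their sum vectors, and that classification assigns the label of the most correlated labeled record. The names `bilinear`, `correlation`, `classify`, and the sample matrix are hypothetical.

```python
from math import sqrt

def bilinear(x, y, S):
    """Inner product of two records whose property vectors are not orthogonal.

    x and y hold per-property weights; S[i][j] is the assumed inner product
    (prior similarity) between property vectors i and j.
    """
    return sum(x[i] * S[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))

def correlation(x, y, S):
    """Cosine-style correlation of two records under the similarity matrix S."""
    denom = sqrt(bilinear(x, x, S) * bilinear(y, y, S))
    return bilinear(x, y, S) / denom if denom > 0 else 0.0

def classify(query, records, labels, S):
    """Return the label of the labeled record most correlated with the query.

    This 1-nearest-neighbour rule is an assumption made for illustration.
    """
    scores = [correlation(query, r, S) for r in records]
    best = max(range(len(records)), key=scores.__getitem__)
    return labels[best]

# Hypothetical prior similarities: properties 0 and 1 are related concepts,
# property 2 is unrelated to both.
S = [
    [1.0, 0.8, 0.0],
    [0.8, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
# With orthogonal property vectors, no similarity information is used.
IDENTITY = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

records = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
labels = ["B", "A"]
query = [0.0, 1.0, 0.0]  # carries only property 1

print(classify(query, records, labels, S))
print(classify(query, records, labels, IDENTITY))
```

With orthogonal property vectors the query shares nothing with either record, so the result falls back to an arbitrary tie-break. With the assumed similarity of 0.8 between properties 0 and 1, the query correlates with the record labeled "A". This is the kind of effect the prior concept-similarity information is meant to capture.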


Cite this article

Peng J, Tang CJ, Yuan CA, Li C, Hu JJ. A data classification method based on concept similarity. Journal of Software (软件学报), 2007, 18(2): 311-322


History

Received: 2004-09-08
Last revised: 2006-04-26
