Supported by the Foundation of the Innovation Research Institute of PKU-IBM; the National Grand Fundamental Research 973 Program of China under Grant No. G1999032705
Feature selection is widely used in pattern recognition, data mining, and related fields. When spatial data are involved, however, traditional feature selection methods do not adequately account for the spatial characteristics of the data, and the quality of the selected features degrades. Starting from the characteristics of spatial data, this paper proposes a feature selection method named MEFS (maximum entropy feature selection). Based on the maximum entropy principle, MEFS uses mutual information and the Z-test and selects spatial features in two steps: the first step selects spatial predicates, and the second step selects the set of relevant attributes corresponding to each spatial predicate. Finally, MEFS is compared experimentally with RELIEF as a feature selection method, and a classification method based on MEFS is compared with the decision tree algorithm ID3. The experimental results show that MEFS not only saves feature selection and classification time, but also improves the quality of classification.
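To make the role of mutual information in predicate selection more concrete, the following is a minimal sketch of ranking candidate features by their empirical mutual information with the class label. It is not the MEFS algorithm itself: the maximum entropy model and the Z-test step are not shown, and the function names `mutual_information` and `rank_features`, as well as the example predicates `close_to_river` and `noise`, are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y), in bits, of two discrete sequences."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # Term p(x,y) * log2(p(x,y) / (p(x) * p(y))), written with raw counts.
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


def rank_features(columns, labels):
    """Sort feature names by mutual information with the labels, highest first."""
    scores = {name: mutual_information(values, labels)
              for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: two binary predicates and a class label per object.
labels = ["a", "a", "b", "b"]
columns = {
    "close_to_river": [1, 1, 0, 0],  # fully determines the label
    "noise": [0, 1, 0, 1],           # independent of the label
}
ranked = rank_features(columns, labels)
```

In this toy data `close_to_river` determines the class completely and therefore ranks first, while `noise` carries no information about the class. A full method would additionally need a criterion for deciding which scores are significant, which is the role the abstract assigns to the Z-test.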