Research on Improvements of Feature Selection Using Forest Optimization Algorithm

Authors:

Chu Bei (1990-), female, from Huadian, Jilin, M.S. candidate; research interest: machine learning. Li Zhanshan (1966-), male, Ph.D., professor, doctoral supervisor, CCF professional member; research interests: constraint optimization and constraint solving, machine learning, model-based diagnosis, intelligent planning and scheduling. Zhang Menglin (1991-), male, M.S. candidate; research interest: machine learning. Yu Haihong (1975-), male, Ph.D., lecturer; research interests: constraint optimization and constraint solving, big data and data mining, intelligent planning and scheduling.

Corresponding author:

Yu Haihong, E-mail: yuhh@jlu.edu.cn

Fund projects:

National Natural Science Foundation of China (61170314, 61272208); Jilin Province Natural Science Foundation (20140101200JC)



    Abstract:

    In classification, feature selection has long been an important but difficult problem. Recent research has shown that feature selection using the forest optimization algorithm (FSFOA) achieves good classification performance and dimensionality reduction. However, the randomness of the initialization phase, the limitations of the updating mechanism, and the inferior quality of the new trees generated in the local seeding stage severely limit the algorithm's classification performance and dimensionality-reduction ability. This paper adopts a new initialization strategy and updating mechanism and adds a greedy strategy to the local seeding stage, forming a new feature selection algorithm, IFSFOA, which maximizes classification performance while minimizing the number of selected features. In the experiments, IFSFOA uses SVM, J48, and KNN classifiers to guide the learning process and is tested on low-, medium-, and high-dimensional datasets from the UCI machine learning repository. The results show that, compared with FSFOA, IFSFOA achieves significant improvements in both classification performance and dimensionality reduction. Compared with efficient feature selection methods proposed in recent years, IFSFOA remains highly competitive in both accuracy and dimensionality reduction.
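
    The greedy filter added to FSFOA's local seeding can be sketched as follows. This is a minimal illustrative assumption, not the paper's implementation: feature subsets are binary masks, and the `fitness` function here is a toy stand-in for the wrapper fitness (SVM/J48/KNN classification accuracy) used in the paper; the set of "relevant" features and the parameter name `lsc` (local seeding changes) are hypothetical.

    ```python
    import random

    def fitness(mask, relevant=frozenset({0, 3, 5})):
        # Toy stand-in for the paper's wrapper fitness (classifier accuracy):
        # reward selecting the "relevant" features, penalize extra ones.
        selected = {i for i, bit in enumerate(mask) if bit}
        return len(selected & relevant) - 0.1 * len(selected - relevant)

    def greedy_local_seeding(tree, lsc=2, seed=0):
        """Flip `lsc` randomly chosen bits one at a time, keeping a flip
        only if it does not worsen fitness -- the greedy filter IFSFOA
        adds on top of FSFOA's purely random local seeding."""
        rng = random.Random(seed)
        best = list(tree)
        for idx in rng.sample(range(len(tree)), lsc):
            candidate = list(best)
            candidate[idx] ^= 1          # flip one feature in/out
            if fitness(candidate) >= fitness(best):
                best = candidate         # accept only non-worsening flips
        return best

    tree = [0, 1, 0, 0, 1, 0, 1, 0]      # one "tree" = one feature subset
    improved = greedy_local_seeding(tree, lsc=3)
    assert fitness(improved) >= fitness(tree)
    ```

    Because each flip is accepted only when it does not reduce fitness, the new tree is never worse than its parent, which is the point of the greedy strategy: it removes the inferior offspring that plain random local seeding can produce.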

Cite this article:

Chu B, Li ZS, Zhang ML, Yu HH. Research on improvements of feature selection using forest optimization algorithm. Journal of Software, 2018,29(9):2547-2558 (in Chinese).

History:
  • Received: 2017-04-24
  • Revised: 2017-07-10
  • Accepted: 2017-09-26
  • Published online: 2017-11-13