Enhancement and Extension of Feature Selection Using Forest Optimization Algorithm
Author: LIU Zhao-Geng, LI Zhan-Shan, WANG Li, WANG Tao, YU Hai-Hong
Affiliation:

CLC Number: TP18

Fund Project: National Natural Science Foundation of China (61672261); Natural Science Foundation of Jilin Province (20180101043JC); Industrial Technology Research and Development Special Project of Jilin Province Development and Reform Commission (2019C053-9)

    Abstract:

    As an important data preprocessing method, feature selection not only mitigates the curse of dimensionality but also improves the generalization ability of learning algorithms. A variety of methods have been applied to feature selection, among which evolutionary computation techniques have recently gained much attention and shown some success. A recent study showed that feature selection using the forest optimization algorithm (FSFOA) achieves good classification performance and dimensionality reduction. However, the randomness of its initialization phase and the manually tuned parameter of its global seeding phase limit both the accuracy and the dimensionality reduction ability of the algorithm, and the algorithm itself is inherently weak at processing high-dimensional data. This study proposes EFSFOA (enhanced feature selection using forest optimization algorithm): an initialization strategy is designed from the perspective of the information gain ratio, the global seeding parameter is generated automatically by a simulated annealing temperature-control function, a fitness function is defined that incorporates the dimension reduction rate, and a greedy procedure selects the best tree from the resulting high-quality forest. In addition, to cope with high-dimensional data, an ensemble feature selection scheme is adopted to form an ensemble framework suitable for EFSFOA, so that it can effectively handle high-dimensional feature selection problems. Contrast experiments verify that EFSFOA significantly improves classification accuracy and dimensionality reduction rate over FSFOA and raises the manageable dimensionality to 100,000 features. Compared with other efficient evolutionary computation approaches to feature selection proposed in recent years, EFSFOA remains strongly competitive.
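The abstract states that EFSFOA replaces FSFOA's purely random initialization with a strategy based on the information gain ratio, but this page carries no code. The sketch below shows one plausible reading: features with a higher gain ratio are proportionally more likely to be switched on in the initial trees. Everything here, including the names gain_ratio and init_forest, the ten-bin discretization, and the choice of activating roughly half the features per tree, is an illustrative assumption rather than the authors' implementation.

```python
import numpy as np

def gain_ratio(x, y, bins=10):
    """Information gain ratio of one (discretized) feature with respect to y."""
    x = np.digitize(x, np.histogram_bin_edges(x, bins=bins)[1:-1])

    def entropy(v):
        _, counts = np.unique(v, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    cond = sum((x == v).mean() * entropy(y[x == v]) for v in np.unique(x))
    split_info = entropy(x)
    return (entropy(y) - cond) / split_info if split_info > 0 else 0.0

def init_forest(X, y, n_trees=50, seed=0):
    """Build initial 0/1 feature masks, biased toward high-gain-ratio features."""
    rng = np.random.default_rng(seed)
    ratios = np.array([gain_ratio(X[:, j], y) for j in range(X.shape[1])])
    probs = ratios + 1e-9              # smooth so every feature stays selectable
    probs /= probs.sum()
    k = max(1, X.shape[1] // 2)        # assumed: each tree starts with ~half the features
    forest = np.zeros((n_trees, X.shape[1]), dtype=int)
    for t in range(n_trees):
        forest[t, rng.choice(X.shape[1], size=k, replace=False, p=probs)] = 1
    return forest
```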
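For the global-seeding phase, the abstract says the parameter that FSFOA requires the user to set by hand (the number of features flipped per globally seeded tree) is generated automatically by a simulated-annealing temperature-control function (cf. Kirkpatrick et al. [24]). The exact cooling schedule is not given on this page; the fragment below is a minimal sketch assuming classical exponential cooling, with t0 and alpha as hypothetical tuning constants.

```python
def global_seeding_changes(iteration, n_features, t0=1.0, alpha=0.95):
    """Map an exponentially cooled temperature to the number of features
    flipped during global seeding: broad exploration early, fine-grained
    exploitation late, but always at least one change."""
    temperature = t0 * alpha ** iteration
    return max(1, round(temperature * n_features))
```

With these assumed defaults, a 100-feature problem would flip about 95 features at iteration 1 and only a handful after iteration 60, the usual explore-then-exploit pattern of annealing.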
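The abstract also mentions a fitness function that combines classification performance with the dimension reduction rate, and a greedy pass that picks the best tree from the final high-quality forest. A minimal sketch follows, assuming a weighted sum with weight w, 5-fold cross-validated k-NN as the evaluator, and scikit-learn as the toolkit; the weight value and the choice of classifier are assumptions, not taken from the paper.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def fitness(mask, X, y, w=0.9):
    """Assumed form: w * CV accuracy + (1 - w) * dimension reduction rate."""
    if mask.sum() == 0:
        return 0.0                       # an empty feature subset is useless
    acc = cross_val_score(KNeighborsClassifier(),
                          X[:, mask.astype(bool)], y, cv=5).mean()
    reduction = 1.0 - mask.sum() / mask.size
    return w * acc + (1.0 - w) * reduction

def best_tree(forest, X, y):
    """Greedily evaluate every tree in the final forest and keep the winner."""
    return forest[int(np.argmax([fitness(t, X, y) for t in forest]))]
```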
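Finally, for the extension to high-dimensional data, the abstract describes an ensemble feature selection framework built around EFSFOA. One common way to realize such a framework is sketched below, under the assumption that the feature space is partitioned into fixed-size blocks, each block is filtered by a local selector run, and the pooled survivors get one refining pass; the function ensemble_select and the block size are hypothetical.

```python
import numpy as np

def ensemble_select(X, y, selector, block=2000):
    """Blockwise ensemble wrapper: `selector(X_sub, y)` is assumed to return
    a 0/1 mask over the columns of X_sub (e.g., one EFSFOA run)."""
    kept = []
    for start in range(0, X.shape[1], block):
        idx = np.arange(start, min(start + block, X.shape[1]))
        local_mask = selector(X[:, idx], y)        # select within the block
        kept.extend(idx[local_mask.astype(bool)])
    kept = np.array(kept)
    final_mask = selector(X[:, kept], y)           # refine the pooled survivors
    return kept[final_mask.astype(bool)]
```

This keeps every individual run at a tractable width, which is presumably how experiments at 100,000 dimensions stay feasible; the paper's actual partitioning and aggregation rules may differ.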

    References
    [1] Liu H, Yu L. Toward integrating feature selection algorithms for classification and clustering. IEEE Trans. on Knowledge and Data Engineering, 2005,17(4):491-502.
    [2] Oh IS, Lee JS, Moon BR. Hybrid genetic algorithms for feature selection. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2004,26(11):1424-1437.
    [3] Maldonado S, Weber R. A wrapper method for feature selection using support vector machines. Information Sciences, 2009, 179(13):2208-2217.
    [4] Shah SC, Kusiak A. Data mining and genetic algorithm based gene/SNP selection. Artificial Intelligence in Medicine, 2004,31(3):183-196.
    [5] Guyon I, Elisseeff A. An introduction to variable and feature selection. Journal of Machine Learning Research, 2003,3(6):1157-1182.
    [6] Peng H, Long F, Ding C. Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2005,27(8):1226-1238.
    [7] Yu L, Liu H. Feature selection for high-dimensional data: A fast correlation-based filter solution. In: Proc. of the 20th Int’l Conf. on Machine Learning (ICML 2003). AAAI, 2003. 856-863.
    [8] Robnik-Šikonja M, Kononenko I. Theoretical and empirical analysis of ReliefF and RReliefF. Machine Learning, 2003,53(1-2):23-69.
    [9] Gu Q, Han J. Towards feature selection in network. In: Proc. of the 20th ACM Int’l Conf. on Information and Knowledge Management. ACM, 2011. 1175-1184.
    [10] Zhao Z, Liu H. Spectral feature selection for supervised and unsupervised learning. In: Proc. of the 24th Int’l Conf. on Machine Learning. ACM, 2007. 1151-1157.
    [11] Masaeli M, Yan Y, Cui Y, et al. Convex principal feature selection. In: Proc. of the 2010 SIAM Int’l Conf. on Data Mining. SIAM, 2010. 619-628.
    [12] Farahat AK, Ghodsi A, Kamel MS. An efficient greedy method for unsupervised feature selection. In: Proc. of the 2011 IEEE 11th Int’l Conf. on Data Mining (ICDM). IEEE, 2011. 161-170.
    [13] Efron B, Hastie T, Johnstone I, et al. Least angle regression. The Annals of Statistics, 2004,32(2):407-499.
    [14] Xue B, Zhang M, Browne WN, et al. A survey on evolutionary computation approaches to feature selection. IEEE Trans. on Evolutionary Computation, 2016,20(4):606-626.
    [15] Zhu W, Si G, Zhang Y, et al. Neighborhood effective information ratio for hybrid feature subset evaluation and selection. Neurocomputing, 2013,99:25-37.
    [16] Xue B, Zhang M, Browne WN. Particle swarm optimisation for feature selection in classification: Novel initialisation and updating mechanisms. Applied Soft Computing, 2014,18:261-276.
    [17] Tabakhi S, Moradi P, Akhlaghian F. An unsupervised feature selection algorithm based on ant colony optimization. Engineering Applications of Artificial Intelligence, 2014,32:112-123.
    [18] Zhang Y, Song X, Gong D. A return-cost-based binary firefly algorithm for feature selection. Information Sciences, 2017,418-419:561-574.
    [19] Ghaemi M, Feizi-Derakhshi MR. Feature selection using forest optimization algorithm. Pattern Recognition, 2016,60:121-129.
    [20] Chu B, Li ZS, Zhang ML, et al. Research on improvements of feature selection using forest optimization algorithm. Journal of Software, 2018,29(9):2547-2558 (in Chinese with English abstract). http://www.jos.org.cn/1000-9825/5395.htm [doi: 10.13328/j.cnki.jos.005395]
    [21] Jadhav S, He H, Jenkins K. Information gain directed genetic algorithm wrapper feature selection for credit rating. Applied Soft Computing, 2018,69:541-553.
    [22] Pereira RB, Plastino A, Zadrozny B, et al. Information gain feature selection for multi-label classification. Journal of Information and Data Management, 2015,6(1):48-58.
    [23] Yiğit F, Baykan ÖK. A new feature selection method for text categorization based on information gain and particle swarm optimization. In: Proc. of the 2014 IEEE 3rd Int’l Conf. on Cloud Computing and Intelligence Systems (CCIS). IEEE, 2014. 523-529.
    [24] Kirkpatrick S, Gelatt CD, Vecchi MP. Optimization by simulated annealing. Science, 1983,220(4598):671-680.
    [25] Dua D, Graff C. UCI machine learning repository. Irvine: School of Information and Computer Science, University of California, 2017. http://archive.ics.uci.edu/ml
    [26] Ghaemi M, Feizi-Derakhshi MR. Forest optimization algorithm. Expert Systems with Applications, 2014,41(15):6676-6687.
    [27] Cai J, Luo J, Wang S, et al. Feature selection in machine learning: A new perspective. Neurocomputing, 2018,300:70-79.
    [28] Moustakidis SP, Theocharis JB. SVM-FuzCoC: A novel SVM-based feature selection method using a fuzzy complementary criterion. Pattern Recognition, 2010,43(11):3712-3729.
    [29] Hu Q, Che X, Zhang L, et al. Feature evaluation and selection based on neighborhood soft margin. Neurocomputing, 2010, 73(10-12):2114-2124.
    [30] Huang J, Cai Y, Xu X. A hybrid genetic algorithm for feature selection wrapper based on mutual information. Pattern Recognition Letters, 2007,28(13):1825-1844.
Get Citation

Liu ZG, Li ZS, Wang L, Wang T, Yu HH. Enhancement and extension of feature selection using forest optimization algorithm. Journal of Software, 2020,31(5):1511-1524 (in Chinese with English abstract).

History
  • Received: July 12, 2018
  • Revised: August 05, 2018
  • Online: May 18, 2020
  • Published: May 06, 2020