Enhancement and Extension of Feature Selection Using Forest Optimization Algorithm
Author: LIU Zhao-Geng, LI Zhan-Shan, WANG Li, WANG Tao, YU Hai-Hong
Affiliation:

CLC Number: TP18

Fund Project: National Natural Science Foundation of China (61672261); Natural Science Foundation of Jilin Province (20180101043JC); Industrial Technology Research and Development Special Project of Jilin Province Development and Reform Commission (2019C053-9)

    Abstract:

    As an important data preprocessing method, feature selection not only mitigates the curse of dimensionality but also improves the generalization ability of learning algorithms. A variety of methods have been applied to feature selection, among which evolutionary computation techniques have recently gained much attention and shown some success. A recent study showed that feature selection using the forest optimization algorithm (FSFOA) achieves good classification performance and dimensionality reduction. However, the randomness of its initialization phase and the manually tuned parameter of its global seeding phase limit both the accuracy and the dimensionality reduction ability of the algorithm, and the algorithm itself is inherently weak at processing high-dimensional data. This study proposes EFSFOA (enhanced feature selection using forest optimization algorithm): an initialization strategy is designed from the perspective of the information gain ratio, the global seeding parameter is generated automatically by a simulated annealing temperature-control function, a fitness function is defined that incorporates the dimension reduction rate, and a greedy procedure selects the best tree from the resulting high-quality forest. In addition, to cope with high-dimensional data, an ensemble feature selection scheme is adopted to form an ensemble framework suitable for EFSFOA, so that it can effectively handle high-dimensional feature selection problems. Contrast experiments verify that EFSFOA significantly improves classification accuracy and dimensionality reduction rate over FSFOA and raises the manageable dimensionality to 100,000 features. Compared with other efficient evolutionary computation approaches to feature selection proposed in recent years, EFSFOA remains strongly competitive.
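The abstract states that EFSFOA replaces FSFOA's purely random initialization with a strategy based on the information gain ratio, but this page carries no code. The sketch below shows one plausible reading: features with a higher gain ratio are proportionally more likely to be switched on in the initial trees. Everything here, including the names gain_ratio and init_forest, the ten-bin discretization, and the choice of activating roughly half the features per tree, is an illustrative assumption rather than the authors' implementation.

```python
import numpy as np

def gain_ratio(x, y, bins=10):
    """Information gain ratio of one (discretized) feature with respect to y."""
    x = np.digitize(x, np.histogram_bin_edges(x, bins=bins)[1:-1])

    def entropy(v):
        _, counts = np.unique(v, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    cond = sum((x == v).mean() * entropy(y[x == v]) for v in np.unique(x))
    split_info = entropy(x)
    return (entropy(y) - cond) / split_info if split_info > 0 else 0.0

def init_forest(X, y, n_trees=50, seed=0):
    """Build initial 0/1 feature masks, biased toward high-gain-ratio features."""
    rng = np.random.default_rng(seed)
    ratios = np.array([gain_ratio(X[:, j], y) for j in range(X.shape[1])])
    probs = ratios + 1e-9              # smooth so every feature stays selectable
    probs /= probs.sum()
    k = max(1, X.shape[1] // 2)        # assumed: each tree starts with ~half the features
    forest = np.zeros((n_trees, X.shape[1]), dtype=int)
    for t in range(n_trees):
        forest[t, rng.choice(X.shape[1], size=k, replace=False, p=probs)] = 1
    return forest
```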
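For the global-seeding phase, the abstract says the parameter that FSFOA requires the user to set by hand (the number of features flipped per globally seeded tree) is generated automatically by a simulated-annealing temperature-control function (cf. Kirkpatrick et al. [24]). The exact cooling schedule is not given on this page; the fragment below is a minimal sketch assuming classical exponential cooling, with t0 and alpha as hypothetical tuning constants.

```python
def global_seeding_changes(iteration, n_features, t0=1.0, alpha=0.95):
    """Map an exponentially cooled temperature to the number of features
    flipped during global seeding: broad exploration early, fine-grained
    exploitation late, but always at least one change."""
    temperature = t0 * alpha ** iteration
    return max(1, round(temperature * n_features))
```

With these assumed defaults, a 100-feature problem would flip about 95 features at iteration 1 and only a handful after iteration 60, the usual explore-then-exploit pattern of annealing.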
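The abstract also mentions a fitness function that combines classification performance with the dimension reduction rate, and a greedy pass that picks the best tree from the final high-quality forest. A minimal sketch follows, assuming a weighted sum with weight w, 5-fold cross-validated k-NN as the evaluator, and scikit-learn as the toolkit; the weight value and the choice of classifier are assumptions, not taken from the paper.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def fitness(mask, X, y, w=0.9):
    """Assumed form: w * CV accuracy + (1 - w) * dimension reduction rate."""
    if mask.sum() == 0:
        return 0.0                       # an empty feature subset is useless
    acc = cross_val_score(KNeighborsClassifier(),
                          X[:, mask.astype(bool)], y, cv=5).mean()
    reduction = 1.0 - mask.sum() / mask.size
    return w * acc + (1.0 - w) * reduction

def best_tree(forest, X, y):
    """Greedily evaluate every tree in the final forest and keep the winner."""
    return forest[int(np.argmax([fitness(t, X, y) for t in forest]))]
```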
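Finally, for the extension to high-dimensional data, the abstract describes an ensemble feature selection framework built around EFSFOA. One common way to realize such a framework is sketched below, under the assumption that the feature space is partitioned into fixed-size blocks, each block is filtered by a local selector run, and the pooled survivors get one refining pass; the function ensemble_select and the block size are hypothetical.

```python
import numpy as np

def ensemble_select(X, y, selector, block=2000):
    """Blockwise ensemble wrapper: `selector(X_sub, y)` is assumed to return
    a 0/1 mask over the columns of X_sub (e.g., one EFSFOA run)."""
    kept = []
    for start in range(0, X.shape[1], block):
        idx = np.arange(start, min(start + block, X.shape[1]))
        local_mask = selector(X[:, idx], y)        # select within the block
        kept.extend(idx[local_mask.astype(bool)])
    kept = np.array(kept)
    final_mask = selector(X[:, kept], y)           # refine the pooled survivors
    return kept[final_mask.astype(bool)]
```

This keeps every individual run at a tractable width, which is presumably how experiments at 100,000 dimensions stay feasible; the paper's actual partitioning and aggregation rules may differ.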

    References
    [1] Liu H, Yu L. Toward integrating feature selection algorithms for classification and clustering. IEEE Trans. on Knowledge and Data Engineering, 2005,17(4):491-502.
    [2] Oh IS, Lee JS, Moon BR. Hybrid genetic algorithms for feature selection. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2004,26(11):1424-1437.
    [3] Maldonado S, Weber R. A wrapper method for feature selection using support vector machines. Information Sciences, 2009, 179(13):2208-2217.
    [4] Shah SC, Kusiak A. Data mining and genetic algorithm based gene/SNP selection. Artificial Intelligence in Medicine, 2004,31(3):183-196.
    [5] Guyon I, Elisseeff A. An introduction to variable and feature selection. Journal of Machine Learning Research, 2003,3(6):1157-1182.
    [6] Peng H, Long F, Ding C. Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2005,27(8):1226-1238.
    [7] Yu L, Liu H. Feature selection for high-dimensional data: A fast correlation-based filter solution. In: Proc. of the 20th Int’l Conf. on Machine Learning (ICML 2003). AAAI, 2003. 856-863.
    [8] Robnik-Šikonja M, Kononenko I. Theoretical and empirical analysis of ReliefF and RReliefF. Machine Learning, 2003,53(1-2):23-69.
    [9] Gu Q, Han J. Towards feature selection in network. In: Proc. of the 20th ACM Int’l Conf. on Information and Knowledge Management. ACM, 2011. 1175-1184.
    [10] Zhao Z, Liu H. Spectral feature selection for supervised and unsupervised learning. In: Proc. of the 24th Int’l Conf. on Machine Learning. ACM, 2007. 1151-1157.
    [11] Masaeli M, Yan Y, Cui Y, et al. Convex principal feature selection. In: Proc. of the 2010 SIAM Int’l Conf. on Data Mining. SIAM, 2010. 619-628.
    [12] Farahat AK, Ghodsi A, Kamel MS. An efficient greedy method for unsupervised feature selection. In: Proc. of the 2011 IEEE 11th Int’l Conf. on Data Mining (ICDM). IEEE, 2011. 161-170.
    [13] Efron B, Hastie T, Johnstone I, et al. Least angle regression. The Annals of Statistics, 2004,32(2):407-499.
    [14] Xue B, Zhang M, Browne WN, et al. A survey on evolutionary computation approaches to feature selection. IEEE Trans. on Evolutionary Computation, 2016,20(4):606-626.
    [15] Zhu W, Si G, Zhang Y, et al. Neighborhood effective information ratio for hybrid feature subset evaluation and selection. Neurocomputing, 2013,99:25-37.
    [16] Xue B, Zhang M, Browne WN. Particle swarm optimisation for feature selection in classification: Novel initialisation and updating mechanisms. Applied Soft Computing, 2014,18:261-276.
    [17] Tabakhi S, Moradi P, Akhlaghian F. An unsupervised feature selection algorithm based on ant colony optimization. Engineering Applications of Artificial Intelligence, 2014,32:112-123.
    [18] Zhang Y, Song X, Gong D. A return-cost-based binary firefly algorithm for feature selection. Information Sciences, 2017,418-419:561-574.
    [19] Ghaemi M, Feizi-Derakhshi MR. Feature selection using forest optimization algorithm. Pattern Recognition, 2016,60:121-129.
    [20] Chu B, Li ZS, Zhang ML, et al. Research on improvements of feature selection using forest optimization algorithm. Journal of Software, 2018,29(9):2547-2558 (in Chinese with English abstract). http://www.jos.org.cn/1000-9825/5395.htm [doi: 10.13328/j.cnki.jos.005395]
    [21] Jadhav S, He H, Jenkins K. Information gain directed genetic algorithm wrapper feature selection for credit rating. Applied Soft Computing, 2018,69:541-553.
    [22] Pereira RB, Plastino A, Zadrozny B, et al. Information gain feature selection for multi-label classification. Journal of Information and Data Management, 2015,6(1):48-58.
    [23] Yiğit F, Baykan ÖK. A new feature selection method for text categorization based on information gain and particle swarm optimization. In: Proc. of the 2014 IEEE 3rd Int’l Conf. on Cloud Computing and Intelligence Systems (CCIS). IEEE, 2014. 523-529.
    [24] Kirkpatrick S, Gelatt CD, Vecchi MP. Optimization by simulated annealing. Science, 1983,220(4598):671-680.
    [25] Dua D, Graff C. UCI machine learning repository. Irvine: School of Information and Computer Science, University of California, 2017. http://archive.ics.uci.edu/ml
    [26] Ghaemi M, Feizi-Derakhshi MR. Forest optimization algorithm. Expert Systems with Applications, 2014,41(15):6676-6687.
    [27] Cai J, Luo J, Wang S, et al. Feature selection in machine learning: A new perspective. Neurocomputing, 2018,300:70-79.
    [28] Moustakidis SP, Theocharis JB. SVM-FuzCoC: A novel SVM-based feature selection method using a fuzzy complementary criterion. Pattern Recognition, 2010,43(11):3712-3729.
    [29] Hu Q, Che X, Zhang L, et al. Feature evaluation and selection based on neighborhood soft margin. Neurocomputing, 2010, 73(10-12):2114-2124.
    [30] Huang J, Cai Y, Xu X. A hybrid genetic algorithm for feature selection wrapper based on mutual information. Pattern Recognition Letters, 2007,28(13):1825-1844.
Get Citation

Liu ZG, Li ZS, Wang L, Wang T, Yu HH. Enhancement and extension of feature selection using forest optimization algorithm. Journal of Software, 2020,31(5):1511-1524 (in Chinese with English abstract).

History
  • Received: July 12, 2018
  • Revised: August 05, 2018
  • Online: May 18, 2020
  • Published: May 06, 2020