Enhancement and Extension of Feature Selection Using Forest Optimization Algorithm
Authors: Liu Zhaogeng, Li Zhanshan, Wang Li, Wang Tao, Yu Haihong

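Author biographies:

Liu Zhaogeng (1993-), male, born in Yishui, Shandong, master's student; main research interest: machine learning. Li Zhanshan (1966-), male, PhD, professor, doctoral supervisor, CCF professional member; main research interests: constraint optimization and constraint solving, machine learning, model-based diagnosis, intelligent planning and scheduling. Wang Li (1994-), female, master's student; main research interest: machine learning. Wang Tao (1969-), female, associate professor; main research interests: constraint optimization and constraint solving, machine learning. Yu Haihong (1975-), male, PhD, lecturer; main research interests: constraint optimization and constraint solving, big data and data mining, intelligent planning and scheduling.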

Corresponding author:

Yu Haihong, E-mail: yuhh@jlu.edu.cn

CLC number:

TP18

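Fund project:

National Natural Science Foundation of China (61672261); Natural Science Foundation of Jilin Province (20180101043JC); Industrial Technology Research and Development Special Project of Jilin Province Development and Reform Commission (2019C053-9)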


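    Abstract:

    As an important data preprocessing method, feature selection not only mitigates the curse of dimensionality but also improves the generalization ability of learning algorithms. Many methods have been applied to the feature selection problem; among them, feature selection algorithms based on evolutionary computation have received increasing attention in recent years and have achieved some success. Recent results show that feature selection using the forest optimization algorithm (FSFOA) offers better classification performance and dimensionality reduction ability. However, the randomness of its initialization phase and the manually set parameter of its global seeding phase limit both its accuracy and its dimensionality reduction ability, and the algorithm is inherently weak at handling high-dimensional data. This paper proposes an initialization strategy based on the information gain ratio, generates the global seeding parameter automatically by borrowing the idea of the temperature control function in simulated annealing, defines a fitness function that incorporates the dimensionality reduction rate, and applies a greedy algorithm to the resulting high-quality forest to select the best tree. Together these form a feature selection algorithm, EFSFOA (enhanced feature selection using forest optimization algorithm). In addition, to handle high-dimensional data, an ensemble feature selection scheme is used to build an ensemble feature selection framework suited to EFSFOA, so that it can deal effectively with feature selection on high-dimensional data. Comparative experiments verify that EFSFOA clearly improves both classification accuracy and dimensionality reduction rate over FSFOA, and raises the data dimensionality it can process to 100 000. Compared with efficient evolutionary-computation-based feature selection methods proposed in recent years, EFSFOA remains highly competitive.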
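The abstract says only that EFSFOA initializes its forest from the perspective of the information gain ratio; it does not give the exact rule. As a minimal Python sketch under that caveat, the code below computes the standard information gain ratio (information gain divided by split information) for discrete features, then applies one hypothetical seeding rule that switches each feature on with probability equal to its gain ratio divided by the largest gain ratio. The names `entropy`, `gain_ratio`, and `init_tree`, and the seeding rule itself, are illustrative assumptions rather than the authors' implementation.

```python
import math
import random
from collections import Counter
from typing import Hashable, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Information gain ratio of one discrete feature with respect to the class labels."""
    n = len(labels)
    # Group the labels by feature value to compute the conditional entropy H(Y | X).
    groups: dict[Hashable, list[Hashable]] = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(ys) / n * entropy(ys) for ys in groups.values())
    info_gain = entropy(labels) - conditional
    # Split information (intrinsic value) H(X); a constant feature has none.
    split_info = entropy(feature)
    return info_gain / split_info if split_info > 0 else 0.0


def init_tree(ratios: Sequence[float], rng: random.Random) -> list[int]:
    """HYPOTHETICAL seeding rule: include feature j with probability ratios[j] / max(ratios)."""
    top = max(ratios)
    if top <= 0:
        # No informative feature: fall back to a uniformly random binary mask.
        return [rng.randint(0, 1) for _ in ratios]
    return [1 if rng.random() < r / top else 0 for r in ratios]


if __name__ == "__main__":
    # Toy data: one perfectly predictive, one constant, one irrelevant feature.
    columns = [["a", "a", "b", "b"], ["c", "c", "c", "c"], ["x", "y", "x", "y"]]
    labels = [0, 0, 1, 1]
    ratios = [gain_ratio(col, labels) for col in columns]
    print(ratios)  # [1.0, 0.0, 0.0]
    print(init_tree(ratios, random.Random(0)))  # [1, 0, 0]
```

Scaling by the largest ratio means the most informative feature is always selected while weaker features still enter some trees at random. This sketch assumes discrete feature values; continuous features would need to be discretized first, which it does not do.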

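Citation:

Liu Zhaogeng, Li Zhanshan, Wang Li, Wang Tao, Yu Haihong. Enhancement and extension of feature selection using forest optimization algorithm. Journal of Software, 2020, 31(5): 1511-1524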

History
  • Received: 2018-07-12
  • Last revised: 2018-08-05
  • Published online: 2020-05-18
  • Published: 2020-05-06
Copyright: Institute of Software, Chinese Academy of Sciences
Address: No. 4 South Fourth Street, Zhongguancun, Haidian District, Beijing 100190
Tel: 010-62562563  Fax: 010-62562533  Email: jos@iscas.ac.cn