Research on Feature Selection Algorithm Based on Natural Evolution Strategy
Authors: ZHANG Xin, LI Zhanshan

About the authors:

ZHANG Xin (born 1994), male, master's degree. Research interests: evolutionary computation, reinforcement learning.
LI Zhanshan (born 1966), male, Ph.D., professor, doctoral supervisor, CCF professional member. Research interests: machine learning, constraint reasoning.

Corresponding author:

LI Zhanshan, E-mail: zslizsli@163.com

Fund Project:

National Natural Science Foundation of China (61672261); Natural Science Foundation of Jilin Province (20180101043JC); Industrial Technology R&D Project of Jilin Province Development and Reform Commission (2019C053-9)

    Abstract:

    Feature selection is an NP-hard problem that aims to eliminate irrelevant and redundant features from a dataset, thereby reducing model training time and improving model accuracy. It is therefore an important data preprocessing technique in machine learning, data mining, and pattern recognition. This study proposes MCC-NES, a new feature selection algorithm based on the natural evolution strategy. First, the algorithm adopts a natural evolution strategy that models the search distribution with a diagonal covariance matrix and adaptively adjusts its parameters using gradient information. Second, so that the algorithm can handle feature selection effectively, a feature encoding scheme is introduced in the initialization phase. Third, a fitness function is defined that combines classification accuracy with dimensionality reduction. In addition, to cope with high-dimensional data, the idea of cooperative co-evolution is introduced: the original problem is decomposed into relatively small sub-problems, which reduces the combinatorial effect of the original problem's scale; each sub-problem is solved independently, and the sub-problems are then linked together to optimize the solution to the original problem. Furthermore, the concept of distributed population evolution is introduced, in which multiple populations evolve competitively to strengthen the algorithm's exploration ability, and a population restart strategy is designed to keep populations from becoming trapped in local optima. Finally, the proposed algorithm is compared with several traditional feature selection algorithms on a number of UCI public datasets. The experimental results show that the proposed algorithm handles the feature selection problem effectively and is competitive with classical feature selection algorithms, performing especially well on high-dimensional data.
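The core loop described in the abstract (a diagonal-covariance natural evolution strategy, a feature encoding, and a fitness that combines accuracy with dimension reduction) can be illustrated with a minimal sketch. This is not the authors' MCC-NES implementation: the abstract does not give the encoding, the fitness weights, or the learning rates, so the 0.5 threshold in `decode`, the weight `alpha`, the rank-based utilities, and the step sizes below are all assumptions borrowed from common separable-NES practice. The function `toy_accuracy` is a hypothetical stand-in for a real classifier's accuracy, and the cooperative co-evolution, multi-population competition, and restart components are left out.

```python
import math
import random


def decode(z, threshold=0.5):
    """Assumed encoding: feature i is selected when z[i] exceeds the threshold."""
    return [v > threshold for v in z]


def fitness(mask, accuracy_fn, alpha=0.9):
    """Weighted sum of accuracy and dimension reduction (alpha is an assumed weight)."""
    n_selected = sum(mask)
    if n_selected == 0:
        return 0.0  # an empty feature subset cannot be evaluated
    reduction = 1.0 - n_selected / len(mask)
    return alpha * accuracy_fn(mask) + (1.0 - alpha) * reduction


def toy_accuracy(mask, relevant=(0, 2)):
    """Hypothetical accuracy proxy: rewards selecting the 'relevant' features."""
    hits = sum(1 for i in relevant if mask[i])
    return 0.5 + 0.5 * hits / len(relevant)


def snes_select(accuracy_fn, dim, iters=50, pop=20, seed=0):
    """Separable (diagonal-covariance) NES over real-valued feature codes."""
    rng = random.Random(seed)
    mu = [0.5] * dim      # mean of the search distribution
    sigma = [0.5] * dim   # per-dimension standard deviation
    eta_mu = 1.0
    eta_sigma = (3 + math.log(dim)) / (5 * math.sqrt(dim))
    # Rank-based utilities: larger for better-ranked samples, summing to zero.
    raw = [max(0.0, math.log(pop / 2 + 1) - math.log(k + 1)) for k in range(pop)]
    total = sum(raw)
    utils = [r / total - 1.0 / pop for r in raw]

    best_mask = decode(mu)
    best_fit = fitness(best_mask, accuracy_fn)
    for _ in range(iters):
        scored = []
        for _ in range(pop):
            s = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            z = [m + sg * e for m, sg, e in zip(mu, sigma, s)]
            mask = decode(z)
            scored.append((fitness(mask, accuracy_fn), s, mask))
        scored.sort(key=lambda t: t[0], reverse=True)  # best sample first
        if scored[0][0] > best_fit:
            best_fit, best_mask = scored[0][0], scored[0][2]
        # Natural-gradient updates of mean and standard deviation per dimension.
        for i in range(dim):
            g_mu = sum(u * t[1][i] for u, t in zip(utils, scored))
            g_sigma = sum(u * (t[1][i] ** 2 - 1.0) for u, t in zip(utils, scored))
            mu[i] += eta_mu * sigma[i] * g_mu
            sigma[i] *= math.exp(0.5 * eta_sigma * g_sigma)
    return best_mask, best_fit
```

Calling `snes_select(toy_accuracy, dim=6)` returns a Boolean mask over the six features together with the best fitness seen. In a real experiment, `accuracy_fn` would be replaced by something like the cross-validated accuracy of a classifier trained on the selected columns.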

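Cite this article

ZHANG Xin, LI Zhanshan. Research on feature selection algorithm based on natural evolution strategy. Journal of Software (软件学报), 2020, 31(12): 3733-3752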

History
  • Received: 2018-12-07
  • Last revised: 2019-06-17
  • Published online: 2020-12-03
  • Publication date: 2020-12-06