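Totally Adaptive 2D Feature Selection Algorithm Based on Discernibility Matrix

About the authors:

Xie Juanying (1971-), female, Ph.D., professor, doctoral supervisor, CCF senior member. Main research areas: machine learning, data mining, and biomedical data analysis.
Wu Zhaozhong (1995-), male, master's student. Main research areas: machine learning and biomedical data analysis.

Corresponding author:

Xie Juanying, E-mail: xiejuany@snnu.edu.cn

Funding:

National Natural Science Foundation of China (62076159, 61673251, 12031010); National Key Research and Development Program of China (2016YFC0901900); Fundamental Research Funds for the Central Universities (GK202105003); Graduate Training Innovation Fund (2016CSY009, 2018TS078)
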
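    Abstract (translated from the Chinese version):

    The feature selection algorithm FSIP (feature selection based on information gain and Pearson correlation coefficient) requires human involvement when choosing the feature subset. To address this problem, a totally adaptive 2D feature selection algorithm based on the discernibility matrix, DFSIP (discernibility based FSIP), is proposed. DFSIP discovers the feature subset fully adaptively: at each step it selects the most important feature among the current features, uses that feature to reduce the discernibility matrix, and removes redundant features, finally obtaining the optimal feature subset adaptively. A K-ELM classifier is built on the optimal feature subset to evaluate its class discrimination ability. Experiments on gene datasets, together with comparisons against the FSIP, mRMR, LLE Score, DRJMIM, AVC, and AMID algorithms and statistical significance tests, show that DFSIP can automatically select a feature subset with stronger discrimination ability, and that the classifier built on this subset has good classification performance.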

    Abstract:

    The FSIP (feature selection based on information gain and Pearson correlation coefficient) algorithm requires a human to determine the borderline used to detect feature subsets. To overcome this limitation, this study proposes a totally adaptive 2D feature selection algorithm based on the discernibility matrix, referred to as DFSIP (discernibility based FSIP). DFSIP introduces the discernibility matrix into the feature selection process of FSIP. It first initializes the candidate feature set with all features and constructs the initial discernibility matrix. It then detects the most significant feature in the current candidate feature set, adds it to the feature subset, and uses it to reduce the discernibility matrix. Next, the candidate feature set is updated to the union of the cells of the reduced discernibility matrix. These steps repeat until no feature is left in the candidate feature set. DFSIP is tested on well-known gene expression datasets and compared with the popular feature selection algorithms FSIP, mRMR, LLE Score, DRJMIM, AVC, and AMID in terms of the performance of the K-ELM classifiers built on the feature subsets that each algorithm detects. In addition, significance tests are carried out to verify whether DFSIP differs significantly from FSIP and from the other compared algorithms. The experimental results demonstrate that DFSIP is superior to the compared algorithms and, in particular, that it differs significantly from the LLE Score, DRJMIM, and AMID feature selection algorithms. Although the difference between DFSIP and FSIP is not significant, DFSIP outperforms FSIP. It can be concluded that DFSIP totally adaptively detects a feature subset with sound classification capability.
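
The selection loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: features are assumed to be already discretized, a matrix cell is taken to be the set of features on which two samples of different classes differ, "reducing" the matrix is taken to mean dropping every cell that contains the selected feature, and plain information gain stands in for the paper's 2D significance measure (which combines information gain with the Pearson correlation coefficient in a way the abstract does not spell out). All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """Information gain of one discrete feature column with respect to the labels."""
    n = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())


def build_discernibility_matrix(X, y):
    """One cell per pair of samples from different classes:
    the set of feature indices on which the two samples differ."""
    cells = []
    for i, j in combinations(range(len(X)), 2):
        if y[i] != y[j]:
            cell = frozenset(k for k in range(len(X[i])) if X[i][k] != X[j][k])
            if cell:
                cells.append(cell)
    return cells


def adaptive_select(X, y, score):
    """Greedy loop from the abstract; `score` is a stand-in significance measure."""
    cells = build_discernibility_matrix(X, y)
    candidates = set(range(len(X[0])))  # start from all features
    selected = []
    while candidates:
        # Most significant candidate; ties go to the lowest feature index.
        best = max(sorted(candidates), key=score)
        selected.append(best)
        # Reduce the matrix: pairs already discerned by `best` are dropped.
        cells = [c for c in cells if best not in c]
        # New candidates are the union of the remaining cells.
        candidates = set().union(*cells)
    return selected


# Toy data: feature 0 determines the class, feature 1 is noise, feature 2 is constant.
X = [[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]]
y = [0, 0, 1, 1]
selected = adaptive_select(X, y, lambda f: information_gain([row[f] for row in X], y))
print(selected)
```

No subset size or threshold is passed in: the loop stops on its own once every pair of differently labeled samples has been discerned by some selected feature, which is the sense in which the procedure is adaptive.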

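Cite this article

Xie Juanying, Wu Zhaozhong. Totally adaptive 2D feature selection algorithm based on discernibility matrix. Journal of Software (Ruan Jian Xue Bao), 2022, 33(4): 1338-1353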

History
  • Received: 2021-03-10
  • Last revised: 2021-07-16
  • Published online: 2021-10-26
  • Publication date: 2022-04-06