Liu Yi, Cao Jianjun, Diao Xingchun, Zhou Xing. Survey on stability of feature selection. Journal of Software (软件学报), 2018, 29(9): 2559-2579 |
Survey on Stability of Feature Selection |
Received: 2017-04-24; Revised: 2017-07-10 |
DOI:10.13328/j.cnki.jos.005394 |
Keywords: high-dimensional data; feature selection; stability; stability measures; ensemble selection; evolutionary algorithms |
Funding: National Natural Science Foundation of China (61371196); China Postdoctoral Science Foundation (201003797) |
|
Abstract: |
With the development of big data and the widespread application of machine learning, data volumes across industries are growing rapidly. High dimensionality is a key characteristic of such data, and feature selection is a common preprocessing method for reducing its dimensionality. Stability of feature selection is an important research topic: it refers to the robustness of a feature selection method to small perturbations of the training samples. Improving the stability of feature selection helps to identify relevant features, increases experts' confidence in the results, and further reduces the complexity and cost of acquiring the original data. This paper reviews existing methods for improving stability, classifies them, and analyzes and compares the characteristics and scope of application of each category. It then summarizes work on evaluating the stability of feature selection, examines the performance of stability measures through experiments, and compares the effectiveness of four ensemble approaches. Finally, it discusses the limitations of current work and points out directions for future research. |
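To make the notion of stability concrete, the following is a minimal sketch of one common way to quantify it: run a selector on several random subsamples of the training data and average the pairwise Jaccard similarity of the selected feature subsets. The abstract does not say which stability measures the survey evaluates, so the Jaccard-based measure, the correlation-based selector `top_k_by_correlation`, and all function and parameter names here are illustrative assumptions rather than the authors' method.

```python
import random
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two feature-index collections (1.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def selection_stability(subsets):
    """Average pairwise Jaccard similarity over a list of selected feature subsets."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two subsets")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def top_k_by_correlation(X, y, k):
    """Hypothetical selector: the k features with the largest |Pearson r| to y."""
    n = len(y)
    mean_y = sum(y) / n
    var_y = sum((t - mean_y) ** 2 for t in y)
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mean_x = sum(col) / n
        cov = sum((c - mean_x) * (t - mean_y) for c, t in zip(col, y))
        var_x = sum((c - mean_x) ** 2 for c in col)
        denom = (var_x * var_y) ** 0.5
        # A constant feature or constant label carries no correlation signal.
        scores.append(abs(cov / denom) if denom > 0 else 0.0)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


def stability_under_subsampling(X, y, k, n_rounds=20, fraction=0.9, seed=0):
    """Perturb the training set by subsampling rows, then score selection stability."""
    rng = random.Random(seed)
    n = len(y)
    m = max(2, int(fraction * n))
    subsets = []
    for _ in range(n_rounds):
        idx = rng.sample(range(n), m)  # sample rows without replacement
        X_sub = [X[i] for i in idx]
        y_sub = [y[i] for i in idx]
        subsets.append(top_k_by_correlation(X_sub, y_sub, k))
    return selection_stability(subsets)
```

A score near 1 means the selector returns almost the same features regardless of which samples it sees; a score near 0 means the selected subset changes substantially under small perturbations. Any other selector with the same `(X, y, k)` interface could be substituted for the hypothetical one above.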