Spectral Clustering Based Unsupervised Feature Selection Algorithms

CLC Number:

TP181

Fund Project:

National Natural Science Foundation of China (61673251); Key Projects of Science and Technology Research in Shaanxi Province (2018ZDXMSF-079); National Key Research and Development Program of China (2016YFC0901900); Scientific and Technological Achievements Transformation and Cultivation Funds of Shaanxi Normal University (GK201806013); Fundamental Research Funds for the Central Universities (GK201701006); Innovation Funds of Graduate Programs at Shaanxi Normal University (2015CXS028, 2016CSY009, 2018TS078)

    Abstract:

    Gene expression data usually comprise a small number of samples with tens of thousands of genes, many of which are unrelated to disease. The primary task in analyzing such data is to detect the key genes. Common feature selection algorithms depend on data labels, but labels are very difficult to obtain. To overcome these challenges, especially for gene expression data, an unsupervised feature selection approach named FSSC (feature selection by spectral clustering) is proposed. FSSC groups all features into clusters with a spectral clustering algorithm, so that similar features fall into the same cluster. Feature discernibility and feature independence are defined, and the importance of a feature is defined as the product of its discernibility and its independence. A representative feature is selected from each cluster to construct the feature subset. Depending on the spectral clustering algorithm used in FSSC, three unsupervised feature selection algorithms are developed: FSSC-SD (FSSC based on standard deviation), FSSC-MD (FSSC based on mean distance), and FSSC-ST (FSSC based on self-tuning). SVM (support vector machine) and KNN (K-nearest neighbors) classifiers are used in the experiments to evaluate the selected feature subsets. Experimental results on 10 gene expression datasets show that FSSC-SD, FSSC-MD, and FSSC-ST can select features with strong power to classify samples.
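
    The selection step described in the abstract (score each feature by discernibility times independence, then keep the top-scoring feature of each cluster) might be sketched as follows. The abstract does not give the formulas for discernibility or independence, so the definitions below are stand-in assumptions, not the authors' definitions: discernibility is taken as the population standard deviation of a feature, and independence as one minus its mean absolute Pearson correlation with the other features. The function names are hypothetical, and the cluster labels are assumed to come from a separate spectral clustering step that is not shown.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def select_representatives(features, labels):
    """Pick one representative feature per cluster.

    features: list of feature columns, each a list of sample values.
    labels:   cluster id of each feature (assumed output of a clustering step).
    Returns the sorted indices of the selected features.
    """
    n = len(features)

    # ASSUMPTION: discernibility = population standard deviation of the feature.
    discernibility = [pstdev(f) for f in features]

    # ASSUMPTION: independence = 1 - mean |Pearson r| with all other features.
    independence = []
    for i in range(n):
        others = [abs(pearson(features[i], features[j])) for j in range(n) if j != i]
        independence.append(1.0 - mean(others) if others else 1.0)

    # From the abstract: importance is the product of the two quantities.
    importance = [d * ind for d, ind in zip(discernibility, independence)]

    # Keep the highest-importance feature in each cluster.
    best = {}
    for i, cluster in enumerate(labels):
        if cluster not in best or importance[i] > importance[best[cluster]]:
            best[cluster] = i
    return sorted(best.values())
```

    Calling `select_representatives(features, labels)` with one label per feature returns one feature index per distinct cluster. Swapping in other definitions of discernibility or independence only changes the two lists computed at the top of the function.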

Get Citation

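Xie Juanying, Ding Lijuan, Wang Mingzhao. Spectral clustering based unsupervised feature selection algorithms. Journal of Software, 2020, 31(4): 1009-1024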

History
  • Received: May 31, 2019
  • Revised: July 29, 2019
  • Online: January 14, 2020
  • Published: April 06, 2020