Statistical Correlation and K-Means Based Distinguishable Gene Subset Selection Algorithms

doi:10.13328/j.cnki.jos.004644

微信服务号

微信订阅号

Home > Archive>Volume 25, Issue 9, 2014 >2050-2075. DOI:10.13328/j.cnki.jos.004644

PDF HTML XML Export Cite reminder

Statistical Correlation and K-Means Based Distinguishable Gene Subset Selection Algorithms
DOI:
                        10.13328/j.cnki.jos.004644
                    
Author:
                        
                        
                    
Affiliation:
Clc Number:
Fund Project:

Article

Figures

Metrics

Reference

Cited by

Materials

Comments

Abstract:

To deal with the challenging problem of recognizing the small number of distinguishable genes which can tell the cancer patients from normal people in a dataset with a small number of samples and tens of thousands of genes, novel hybrid gene selection algorithms are proposed in this paper based on the statistical correlation and K-means algorithm. The Pearson correlation coefficient and Wilcoxon signed-rank test are respectively adopted to calculate the importance of each gene to the classification to filter the least important genes and preserve about 10 percent of the important genes as the pre-selected gene subset. Then the related genes in the pre-selected gene subset are clustered via K-means algorithm, and the weight of each gene is calculated from the related coefficient of the SVM classifier. The most important gene, with the biggest weight or with the highest votes when the roulette wheel strategy is used, is chosen as the representative gene of each cluster to construct the distinguishable gene subset. In order to verify the effectiveness of the proposed hybrid gene subset selection algorithms, the random selection strategy (named Random) is also adopted to select the representative genes from clusters. The proposed distinguishable gene subset selection algorithms are compared with Random and the very popular gene selection algorithm SVM-RFE by Guyon and the pre-studied gene selection algorithm SVM-SFS. The average experimental results of 200 runs of the aforementioned gene selection algorithms on some classic and very popular gene expression datasets with extensive experiments demonstrate that the proposed distinguishable gene subset selection algorithms can find the optimal gene subset, and the classifier based on the selected gene subset achieves very high classification accuracy.

Reference

Cited by

Get Citation

谢娟英,高红超.基于统计相关性与K-means的区分基因子集选择算法.软件学报,2014,25(9):2050-2075

Copy

Article Metrics

Abstract:
PDF:
HTML:
Cited by:

History

Received:April 08,2014
Revised:May 14,2014
Adopted:
Online: September 09,2014
Published:

You are the firstVisitors
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4
Address：4# South Fourth Street, Zhong Guan Cun, Beijing 100190,Postal Code：100190
Phone：010-62562563 Fax：010-62562533 Email：jos@iscas.ac.cn
Technical Support：Beijing Qinyun Technology Development Co., Ltd.

Beijing Public Network Security No. 11040202500063

微信服务号

微信订阅号

Get Citation

Share

Article Metrics

History