    Abstract:

    In gene expression data analysis, discriminator genes are the informative genes that merit further study. Much recent research has focused on the challenging task of identifying these informative genes from microarray data. However, the sample classes in microarray data are often unbalanced in size, and existing gene selection methods, especially nonparametric ones, have not explicitly and correctly accounted for this imbalance. This paper proposes a novel, model-free gene selection method that takes into account both the imbalance of samples and the stability of the selection procedure. Scoring functions for genes are constructed from the within-class difference and the between-class variation, together with the homogeneity of each, and are used to select discriminator genes. The method applies to both the two-category and the multi-category case. Experimental results on two publicly available microarray datasets, the leukemia data and the small round blue cell tumor data, show that the proposed method selects discriminator genes efficiently and robustly.
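
The abstract does not give the scoring functions themselves, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes a simple ratio of between-class variation to within-class difference in which every class carries equal weight regardless of its size, and it leaves out the homogeneity terms the abstract mentions. The names `balanced_score` and `rank_genes` are hypothetical.

```python
from itertools import combinations
from statistics import mean


def balanced_score(values, labels):
    """Illustrative score (an assumption, not the paper's formula):
    between-class spread divided by within-class spread, with every
    class weighted equally regardless of how many samples it has."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    if len(groups) < 2:
        raise ValueError("need at least two classes")

    centers = {y: mean(vs) for y, vs in groups.items()}

    # Between-class variation: mean absolute gap between class centers,
    # taken over all pairs of classes (works for two or more classes).
    between = mean(abs(a - b) for a, b in combinations(centers.values(), 2))

    # Within-class difference: mean absolute deviation inside each class,
    # averaged over classes (not over samples) so that a large class
    # cannot dominate a small one.
    within = mean(
        mean(abs(v - centers[y]) for v in vs) for y, vs in groups.items()
    )

    # Small constant avoids division by zero for perfectly tight classes.
    return between / (within + 1e-12)


def rank_genes(expr, labels):
    """expr maps a gene name to its expression values, one per sample.
    Returns gene names ordered from most to least discriminating."""
    return sorted(
        expr, key=lambda g: balanced_score(expr[g], labels), reverse=True
    )
```

For instance, with two samples in class A and six in class B, a gene whose values separate the classes cleanly ranks above a gene whose class means coincide, even though class B supplies three times as many samples.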

Get Citation

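Li Jianzhong, Yang Kun, Gao Hong, Luo Jizhou, Guo Zheng. A model-free gene selection method that accounts for sample imbalance. Journal of Software (软件学报), 2006, 17(7): 1485-1493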

History
  • Received: April 19, 2005
  • Revised: December 13, 2005