Abstract: In gene expression data analysis, discriminator genes are highly informative genes that merit further study. Recently, a great deal of research has focused on the challenging task of identifying these informative genes from microarray data. However, the sample classes in microarray data are often unbalanced in size, and existing gene selection methods, especially nonparametric ones, have not explicitly and correctly accounted for this imbalance. This paper proposes a novel, model-free gene selection method that takes into account both the class imbalance and the stability of the procedure for identifying informative genes. Scoring functions for genes are constructed from the within-class difference and the between-class variation, together with the homogeneity of each, and are used to select discriminator genes. The method applies to both two-category and multi-category problems. Experimental results on two publicly available microarray datasets, the leukemia data and the small round blue cell tumor data, show that the proposed method selects discriminator genes efficiently and robustly.