Clustering algorithms are widely used in microarray data analysis to extract groups of genes or samples that are tightly coexpressed. Most of them require some parameters to be set in advance, but these are very difficult to choose manually without prior domain knowledge. To handle this problem, an iterative clustering algorithm is proposed. First, the original data are sorted by dominant set so that similar genes are aligned together. Because the cluster boundary in the sorted data is hard to specify, a criterion is presented that partitions a cluster from the sorted data according to the property that the distances between elements inside the cluster are smaller than the distances to elements outside it. The cluster is then removed from the current data set, and the process is repeated until the stopping criteria are satisfied. The new clustering algorithm is analyzed in several respects and tested on published yeast cell-cycle microarray data. The results confirm that the method is applicable and efficient and that it resists noise well.
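The sort, partition, remove, and repeat loop described above can be illustrated with a minimal Python sketch. It is not the implementation proposed in this paper, and several parts are assumptions made only for illustration: the Gaussian affinity and its `sigma`, the use of replicator dynamics to obtain dominant-set weights, the weight threshold `cutoff` (a simple stand-in for the paper's boundary criterion based on inside and outside distances), and the stopping rules (`min_size`, `max_clusters`). The function names `replicator_weights` and `peel_clusters` are hypothetical.

```python
import numpy as np


def replicator_weights(A, iters=500, tol=1e-10):
    """Run discrete replicator dynamics on a non-negative affinity matrix A.

    Returns a weight vector on the simplex; large weights mark the elements
    of a dominant set. Returns all zeros if A carries no affinity at all.
    """
    n = A.shape[0]
    x = np.full(n, 1.0 / n)          # start from the barycenter of the simplex
    for _ in range(iters):
        Ax = A @ x
        denom = x @ Ax
        if denom <= 0:               # no pairwise affinity left: no cluster
            return np.zeros(n)
        x_new = x * Ax / denom
        if np.abs(x_new - x).sum() < tol:
            return x_new
        x = x_new
    return x


def peel_clusters(X, sigma=1.0, cutoff=1e-4, min_size=2, max_clusters=10):
    """Iteratively extract clusters from the rows of X (one row per gene).

    All parameters are illustrative assumptions, not values from the paper.
    Returns (list of index arrays, indices left unclustered).
    """
    remaining = np.arange(X.shape[0])
    clusters = []
    # Assumed stopping rules: too few points left, or enough clusters found.
    while remaining.size >= min_size and len(clusters) < max_clusters:
        P = X[remaining]
        # Squared Euclidean distances turned into a Gaussian affinity.
        d2 = ((P[:, None, :] - P[None, :, :]) ** 2).sum(axis=2)
        A = np.exp(-d2 / (2.0 * sigma ** 2))
        np.fill_diagonal(A, 0.0)     # dominant sets assume no self-loops
        w = replicator_weights(A)
        order = np.argsort(-w)       # sort so that similar genes come first
        # Assumed boundary rule: keep the sorted prefix with non-negligible weight.
        members = order[w[order] > cutoff]
        if members.size < min_size:  # nothing cohesive remains
            break
        clusters.append(remaining[members])
        remaining = np.delete(remaining, members)  # remove cluster, then repeat
    return clusters, remaining
```

Because each cluster is removed before the next pass, the number of clusters does not have to be fixed in advance; in this sketch it is bounded only by the assumed `max_clusters` guard. Points that never receive a non-negligible weight are returned as leftovers rather than being forced into a cluster.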