A Microarray Cluster Algorithm Based on Dominant Set Segmentation


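Authors and affiliations:
滕莉: Department of Computer Science and Engineering, Fudan University, Shanghai 200433
付旭平: Institute of Genetics, School of Life Sciences, Fudan University, Shanghai 200433
李宏宇: Department of Computer Science and Engineering, Fudan University, Shanghai 200433
李瑶: Institute of Genetics, School of Life Sciences, Fudan University, Shanghai 200433
陈文斌: Department of Mathematics, Fudan University, Shanghai 200433
李荣宇: 上海博星基因芯片有限责任公司, Shanghai 200092
沈一帆: Department of Computer Science and Engineering, Fudan University, Shanghai 200433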

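Clustering algorithms are widely used in the analysis of microarray data to find genes or samples with similar expression. Most existing algorithms require some parameters to be specified by hand, yet determining these parameters without prior knowledge is very difficult. To address this difficulty, an iterative clustering algorithm is proposed. First, the dominant set method is used to re-sort the original genes so that highly similar genes are placed in a specific region. The dividing line of a cluster is usually hard to determine. A criterion is proposed that splits one cluster off from the sorted data set, based on the property that the distances between elements inside a cluster are much smaller than the distances between elements outside it. After the cluster found is removed from the current data set, the same processing is repeated on the remaining data until the proposed stopping condition is satisfied. Analyzed from several aspects,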

Clustering algorithms are widely used in microarray data analysis to extract groups of genes or samples that are tightly coexpressed. Most of them require some parameters to be set in advance by the user; however, these parameters are very difficult to determine manually without prior domain knowledge. To handle this problem, an iterative clustering algorithm is proposed. First, the original data are sorted by the dominant set method so that similar genes are placed next to each other. Because the cluster boundary is usually hard to specify, a criterion is presented that partitions one cluster from the sorted data, based on the property that the distances between elements inside a cluster are much smaller than the distances between elements outside it. The cluster found is then removed from the current data set, and the process is repeated on the remaining data until the proposed stopping criteria are satisfied. The new clustering algorithm is analyzed from several aspects and tested on published yeast cell-cycle microarray data. The results confirm that the method is applicable and efficient and that it resists noise well.
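The sort, partition, remove, and repeat loop described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the boundary criterion or the stopping criteria, so the sketch substitutes assumptions. Dominant-set weights are computed with standard replicator dynamics, similarity is a Gaussian kernel with a hypothetical `sigma`, the boundary is a hypothetical relative weight threshold `cutoff`, and the loop stops when fewer than `min_size` elements remain or the extracted group is smaller than `min_size`. All function and parameter names are illustrative.

```python
import math


def similarity_matrix(points, sigma=1.0):
    """Gaussian similarity with a zero diagonal (assumed kernel, not from the paper)."""
    n = len(points)
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((p - q) ** 2 for p, q in zip(points[i], points[j]))
            a[i][j] = a[j][i] = math.exp(-d2 / (2.0 * sigma ** 2))
    return a


def replicator_dynamics(a, iters=200, tol=1e-9):
    """Iterate x_i <- x_i * (A x)_i / (x^T A x), starting from the uniform vector."""
    n = len(a)
    x = [1.0 / n] * n
    for _ in range(iters):
        ax = [sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
        total = sum(x[i] * ax[i] for i in range(n))
        if total <= 0.0:  # no similarity at all: keep the current weights
            break
        new_x = [x[i] * ax[i] / total for i in range(n)]
        converged = max(abs(u - v) for u, v in zip(new_x, x)) < tol
        x = new_x
        if converged:
            break
    return x


def peel_clusters(points, sigma=1.0, cutoff=0.1, min_size=2):
    """Repeatedly sort by dominant-set weight, cut off one cluster, and remove it."""
    remaining = list(range(len(points)))
    clusters = []
    while len(remaining) >= min_size:  # assumed stopping rule
        sub = [points[i] for i in remaining]
        x = replicator_dynamics(similarity_matrix(sub, sigma))
        # Sort so that elements with the largest weights come first.
        order = sorted(range(len(sub)), key=lambda k: -x[k])
        # Assumed boundary rule: keep the prefix whose weight stays
        # above cutoff * (largest weight).
        limit = cutoff * x[order[0]]
        members = [remaining[k] for k in order if x[k] > limit]
        if len(members) < min_size:  # assumed stopping rule
            break
        clusters.append(sorted(members))
        taken = set(members)
        remaining = [i for i in remaining if i not in taken]
    return clusters, remaining
```

As a usage example under the same assumptions, calling `peel_clusters` on four points near the origin, two points near (5, 5), and one isolated point at (20, 20) extracts the larger tight group first, then the pair, and leaves the isolated point unclustered as noise.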