Complexity and Improved Heuristic Algorithms for Binary Fingerprints Clustering

Journal of Software, 2008, Vol. 19, No. 3, pp. 500-510

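Authors:
Liu Peiqiang (刘培强): School of Information and Electronic Engineering, Shandong Institute of Business and Technology, Yantai, Shandong 264005; School of Computer Science and Technology, Shandong University, Jinan, Shandong 250061
Zhu Daming (朱大铭): School of Computer Science and Technology, Shandong University, Jinan, Shandong 250061
Xie Qingsong (谢青松): School of Information and Electronic Engineering, Shandong Institute of Business and Technology, Yantai, Shandong 264005
Fan Hui (范辉): School of Information and Electronic Engineering, Shandong Institute of Business and Technology, Yantai, Shandong 264005
Ma Shaohan (马绍汉): School of Computer Science and Technology, Shandong University, Jinan, Shandong 250061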
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 60573024, 60673153; the Natural Science Foundation of Shandong Province of China under Grant No. Z2004G03

Abstract:

This paper proves that the binary fingerprint vector clustering problem is NP-hard even when each fingerprint has at most 2 missing values, and presents an improved version of the heuristic fingerprint clustering algorithm of Figueroa et al. The improvement lies mainly in how the algorithm is implemented. Sets of compatible vertices are stored in linked lists, and these lists are built by scanning the fingerprint vectors bit by bit. This reduces the time complexity of generating the compatible vertex sets from O(m·n·2^p) to O(m·(n-p+1)·2^p), and the time complexity of partitioning off a unique maximal clique or a maximum clique from O(m·p·2^p) to O(m·2^p). In experiments, the improved algorithm needs on average no more than 49% of the space used by the original algorithm, and on average solves the same instances in 20% of the original running time. When fingerprints have more than 6 missing values, the improved algorithm almost always needs no more than 11% of the original algorithm's running time on the same instances.

Key words: algorithm; complexity; fingerprint clustering; gene expression data; clique partition
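The idea of compatible vertex sets mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: it assumes fingerprints are strings over `0`, `1` and `N` (with `N` as a hypothetical symbol for a missing value), uses plain Python lists where the abstract speaks of linked lists, and all function names are made up for illustration. A fingerprint with p missing values has 2^p fully resolved versions, which is where the 2^p factor in the stated complexities plausibly comes from.

```python
from itertools import product

MISSING = "N"  # assumed marker for a missing value (not specified in the abstract)


def compatible(u, v):
    """Two fingerprints are compatible if they agree wherever both are known."""
    return len(u) == len(v) and all(
        a == b or MISSING in (a, b) for a, b in zip(u, v)
    )


def resolutions(fp):
    """Yield every 0/1 vector obtained by filling in the missing positions of fp."""
    slots = [i for i, c in enumerate(fp) if c == MISSING]
    for bits in product("01", repeat=len(slots)):
        chars = list(fp)
        for i, b in zip(slots, bits):
            chars[i] = b
        yield "".join(chars)


def compatible_sets(fps):
    """Map each resolved vector to the indices of the fingerprints compatible with it."""
    table = {}
    for idx, fp in enumerate(fps):
        for r in resolutions(fp):
            table.setdefault(r, []).append(idx)
    return table


if __name__ == "__main__":
    fps = ["01N", "0N1", "110"]
    for vector, members in compatible_sets(fps).items():
        print(vector, members)
```

All fingerprints listed under the same resolved vector are pairwise compatible, so each list is a clique in the compatibility graph; a clique-partition heuristic would then choose among such candidate sets.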

References

[1]Drmanac R,Drmanac S.cDNA screening by array hybridization.Methods in Enzymology,1999,303:165-178.

[2]Drmanac S,Stavropoulos NA,Labat I,Vonan J,Hauser B,Soares MB,Drmanac R.Gene representing cDNA clusters defined by hybridization of 57,419 clones from infant brain libraries with short oligonucleotide probes.Genomics,1996,37:29-40.

[3]Eisen MB,Spellman PT,Brown PO,Botstein D.Cluster analysis and display of genome-wide expression patterns.PNAS,1998,95:14863-14868.http://www.pnas.org/cgi/content/abstract/95/25/14863?ijkey=lqMr2M.gVTJZY

[4]Sneath PHA,Sokal RR.Numerical Taxonomy.San Francisco:W.H.Freeman and Company,1973.

[5]Herwig R,Poustka AJ,Muller C,Bull C,Lehrach H,O'Brien J.Large scale clustering of cDNA-fingerprinting data.Genome Research,1999,9(11):1093-1105.

[6]Milosavljevic A,Strezoska Z,Zeremski M,Grujic D,Paunesku T,Crkvenjakov R.Clone clustering by hybridization.Genomics,1995,27:83-89.

[7]Meier-Ewert S,Lange J,Gerst H,Herwig R,Schmitt A,Freund J,Elge T,Mott R,Herrmann B,Lehrach H.Comparative gene expression profiling by oligonucleotide fingerprinting.Nucleic Acids Research,1998,26(9):2216-2223.

[8]Ding CHQ.Analysis of gene expression profiles:Class discovery and leaf ordering.In:Pevzner P,eds.Proc.of the RECOMB 2002.New York:ACM,2002.127-136.http://portal.acm.org/citation.cfm?id=565196.565212

[9]Hartuv E,Schmitt A,Lange J,Meier-Ewert S,Lehrach H,Shamir R.An algorithm for clustering cDNA fingerprints.Genomics,2000,66(3):249-256.

[10]Sharan R,Shamir R.Click:A clustering algorithm with applications to gene expression analysis.In:Bourne P,ed.Proc.of the ISMB 2000.AAAI Press,2000.307-316.http://portal.acm.org/citation.cfm?id=645635.660836

[11]Xing EP,Karp RM.Cliff:Clustering of high dimensional microarray data via iterative feature filtering using normalized cuts.Bioinformatics,2001,17(Suppl.):306-315.

[12]Barash Y,Friedman N.Context-Specific Bayesian clustering for gene expression data.In:Sankoff D,ed.Proc.of the RECOMB 2001.New York:ACM,2001.12-21.

[13]Ben-Dor A,Shamir R,Yakhini Z.Clustering gene expression patterns.Journal of Computational Biology,1999,6:281-297.

[14]McLachlan GJ,Bean RW,Peel D.A mixture model-based approach to the clustering of microarray expression data.Bioinformatics,2002,18(3):413-422.

[15]Pan W,Lin J,Le CT.Model based cluster analysis of microarray gene-expression data.Genome Biology,2002,3(2):1-8.

[16]Tamayo P,Slonim J,Mesirov D,Zhu J,Kitareewan S,Dmitrovsky E,Lander E,Golub T.Interpreting patterns of gene expression with self-organizing maps:Methods and applications to hematopoietic differentiation.PNAS,1999,96:2907-2912.

[17]Toronen P,Kolehmainen M,Wong G,Custren E.Analysis of gene expression data using self-organizing maps.FEBS Letters,1999,451:142-146.

[18]Shamir R,Sharan R.Algorithmic approaches to clustering gene expression data.Current Topics in Computational Molecular Biology.2002.269-300.http://www.math.tau.ac.il/~rshamir/papers/book_rs.ps.gz

[19]Figueroa A,Borneman J,Jiang T.Clustering binary fingerprint vectors with missing values for DNA array data analysis.In:Blauvelt Pat,eds.Proc.of the Computational Systems Bioinformatics.Washington:IEEE Computer Society,2003.38-47.http://portal.acm.org/citation.cfm?id=937976.938130

[20]Valinsky L,Vedova GD,Jiang T,Borneman J.Oligonucleotide fingerprinting of ribosomal RNA genes for analysis of fungal community composition.Applied and Environmental Microbiology,2002,68(12):5999-6004.

[21]Karp RM.Reducibility among combinatorial problems.In:Miller RE,Thatcher JW,eds.Complexity of Computer Computations.New York:Plenum Press,1972.85-103.

[22]Garey MR,Johnson DS.Computers and Intractability:A Guide to the Theory of NP-Completeness.San Francisco:W.H.Freeman and Company,1979.

[23]Liu PQ,Fan H,Zhu DM.Cluster analysis for DNA array data with missing values.Computer Science,2004,31(Suppl.):136-140(in Chinese with English abstract).

Cite this article

Liu PQ,Zhu DM,Xie QS,Fan H,Ma SH.Complexity and improved heuristic algorithms for binary fingerprints clustering.Journal of Software,2008,19(3):500-510


History

Received: 2006-05-18
Revised: 2007-01-23
