Complexity and Improved Heuristic Algorithms for Binary Fingerprints Clustering


Journal of Software, Volume 19, Issue 3, 2008, pp. 500-510

Authors: LIU Pei-Qiang, ZHU Da-Ming, XIE Qing-Song, FAN Hui, MA Shao-Han

Abstract:

This paper proves that the binary fingerprints clustering problem with 2 missing values per fingerprint is NP-hard, and improves Figueroa's heuristic algorithm. The new algorithm improves the way the original algorithm is implemented. First, linked lists are used to store the sets of compatible vertices, and these lists can be produced by scanning the fingerprint vectors bit by bit. As a result, the time complexity of producing the sets of compatible vertices is reduced from O(m·n·2^p) to O(m·(n-p+1)·2^p), and the running time of finding a unique maximal clique or a maximal clique is improved from O(m·p·2^p) to O(m·2^p). Experiments show that, on average, the improved algorithm uses 49% or less of the space used by the original algorithm to compute the same instance, and it can solve the same instance in 20% of the original algorithm's running time. In particular, for instances with more than 6 missing values per fingerprint, the new algorithm almost always needs no more than 11% of the original algorithm's running time.
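The abstract mentions sets of compatible vertices and maximal cliques without defining them. The Python sketch below illustrates the idea under several assumptions:

- A fingerprint is a string over {0, 1, N}, with N marking a missing value.
- Two fingerprints are compatible when they agree at every position where both are known.
- Each fully specified 0/1 vector defines the set of fingerprints that can be completed to it.

Enumerating the completions of a fingerprint with k missing values yields 2^k vectors. This shows why a 2^p factor appears in the bounds, with p presumably the maximum number of missing values per fingerprint.

All function names and the greedy "take the largest remaining set" rule are hypothetical. This is neither the authors' linked-list implementation nor a faithful reproduction of Figueroa's heuristic, and a Python dict stands in for the linked lists.

```python
from itertools import product


def compatible(a: str, b: str) -> bool:
    """Fingerprints are compatible if they agree wherever both are known ('N' = missing)."""
    return all(x == y or x == "N" or y == "N" for x, y in zip(a, b))


def completions(fp: str):
    """Yield every 0/1 vector obtained by filling the 'N' positions of fp (2^k vectors)."""
    holes = [i for i, c in enumerate(fp) if c == "N"]
    chars = list(fp)
    for bits in product("01", repeat=len(holes)):
        for i, b in zip(holes, bits):
            chars[i] = b
        yield "".join(chars)


def compatible_sets(fps):
    """Map each complete vector to the indices of fingerprints that can be completed to it.

    Each such index list is a clique in the compatibility graph.
    (Hypothetical helper; a dict replaces the paper's linked lists.)
    """
    sets = {}
    for idx, fp in enumerate(fps):
        for v in completions(fp):
            sets.setdefault(v, []).append(idx)
    return sets


def greedy_clique_partition(fps):
    """Illustrative greedy heuristic: repeatedly take the clique covering the most
    still-unassigned fingerprints and make it a cluster."""
    sets = compatible_sets(fps)
    unassigned = set(range(len(fps)))
    clusters = []
    while unassigned:
        best = max(sets.values(), key=lambda s: sum(1 for i in s if i in unassigned))
        cluster = sorted(i for i in best if i in unassigned)
        clusters.append(cluster)
        unassigned.difference_update(cluster)
    return clusters
```

A set of fingerprints is pairwise compatible exactly when, at every position, all of their known values agree. Such a set therefore has a common completion. So every index list from `compatible_sets` is a clique, and every clique is contained in one of these lists. On `["01N", "0N1", "111", "1N1"]` the sketch returns `[[0, 1], [2, 3]]`.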

Key words: algorithm; complexity; fingerprint clustering; gene expression data; clique partition

References

[1] Drmanac R, Drmanac S. cDNA screening by array hybridization. Methods in Enzymology, 1999, 303: 165-178.

[2] Drmanac S, Stavropoulos NA, Labat I, Vonan J, Hauser B, Soares MB, Drmanac R. Gene-representing cDNA clusters defined by hybridization of 57,419 clones from infant brain libraries with short oligonucleotide probes. Genomics, 1996, 37: 29-40.

[3] Eisen MB, Spellman PT, Brown PO, Botstein D. Cluster analysis and display of genome-wide expression patterns. PNAS, 1998, 95: 14863-14868. http://www.pnas.org/cgi/content/abstract/95/25/14863?ijkey=lqMr2M.gVTJZY

[4] Sneath PHA, Sokal RR. Numerical Taxonomy. San Francisco: W.H. Freeman and Company, 1973.

[5] Herwig R, Poustka AJ, Muller C, Bull C, Lehrach H, O'Brien J. Large-scale clustering of cDNA-fingerprinting data. Genome Research, 1999, 9(11): 1093-1105.

[6] Milosavljevic A, Strezosca Z, Zeremski M, Grujic D, Pannesku T, Crkvenjakov R. Clone clustering by hybridization. Genomics, 1995, 27: 83-89.

[7] Meier-Ewert S, Lange J, Gerts H, Herwig R, Schmitt A, Freund J, Elge T, Mott R, Herrmann B, Lehrach H. Comparative gene expression profiling by oligonucleotide fingerprinting. Nucleic Acids Research, 1998, 26(9): 2216-2223.

[8] Ding CHQ. Analysis of gene expression profiles: Class discovery and leaf ordering. In: Pevzner P, eds. Proc. of the RECOMB 2002. New York: ACM, 2002. 127-136. http://portal.acm.org/citation.cfm?id=565196.565212

[9] Hartuv E, Schmitt A, Lange J, Meier-Ewert S, Lehrach H, Shamir R. An algorithm for clustering cDNA fingerprints. Genomics, 2000, 66(3): 249-256.

[10] Sharan R, Shamir R. CLICK: A clustering algorithm with applications to gene expression analysis. In: Bourne P, ed. Proc. of the ISMB 2000. AAAI Press, 2000. 307-316. http://portal.acm.org/citation.cfm?id=645635.660836

[11] Xing EP, Karp RM. CLIFF: Clustering of high-dimensional microarray data via iterative feature filtering using normalized cuts. Bioinformatics, 2001, 17(Suppl.): 306-315.

[12] Barash Y, Friedman N. Context-specific Bayesian clustering for gene expression data. In: Sankoff D, ed. Proc. of the RECOMB 2001. New York: ACM, 2001. 12-21.

[13] Ben-Dor A, Shamir R, Yakhini Z. Clustering gene expression patterns. Journal of Computational Biology, 1999, 6: 281-297.

[14] McLachlan GJ, Bean RW, Peel D. A mixture model-based approach to the clustering of microarray expression data. Bioinformatics, 2002, 18(3): 413-422.

[15] Pan W, Lin J, Le CT. Model-based cluster analysis of microarray gene-expression data. Genome Biology, 2002, 3(2): 1-8.

[16] Tamayo P, Slonim D, Mesirov J, Zhu Q, Kitareewan S, Dmitrovsky E, Lander E, Golub T. Interpreting patterns of gene expression with self-organizing maps: Methods and applications to hematopoietic differentiation. PNAS, 1999, 96: 2907-2912.

[17] Toronen P, Kolehmainen M, Wong G, Castren E. Analysis of gene expression data using self-organizing maps. FEBS Letters, 1999, 451: 142-146.

[18] Shamir R, Sharan R. Algorithmic approaches to clustering gene expression data. In: Current Topics in Computational Molecular Biology. 2002. 269-300. http://www.math.tau.ac.il/~rshamir/papers/book_rs.ps.gz

[19] Figueroa A, Borneman J, Jiang T. Clustering binary fingerprint vectors with missing values for DNA array data analysis. In: Blauvelt P, ed. Proc. of the Computational Systems Bioinformatics. Washington: IEEE Computer Society, 2003. 38-47. http://portal.acm.org/citation.cfm?id=937976.938130

[20] Valinsky L, Vedova GD, Bang T, Borneman J. Oligonucleotide fingerprinting of ribosomal RNA genes for analysis of fungal community composition. Applied and Environmental Microbiology, 2002, 68(12): 5999-6004.

[21] Karp RM. Reducibility among combinatorial problems. In: Miller RE, Thatcher JW, eds. Complexity of Computer Computations. New York: Plenum Press, 1972. 85-103.

[22] Garey MR, Johnson DS. Computers and Intractability: A Guide to the Theory of NP-Completeness. San Francisco: W.H. Freeman and Company, 1979.

[23] Liu PQ, Fan H, Zhu DM. Cluster analysis for DNA array data with missing values. Computer Science, 2004, 31(Suppl.): 136-140 (in Chinese with English abstract).

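Citation:

LIU Pei-Qiang, ZHU Da-Ming, XIE Qing-Song, FAN Hui, MA Shao-Han. Complexity and improved heuristic algorithms for binary fingerprints clustering. Journal of Software, 2008, 19(3): 500-510.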

History:
Received: May 18, 2006
Revised: January 23, 2007

Copyright: Institute of Software, Chinese Academy of Sciences