Abstract:For multiple sequences alignment problem in molecular biological sequence analysis, when the input sequence number is very large, many heuristic algorithms have been proposed to improve the computation speed and the quality of alignment. An approach called MWPAlign (maximum weighted path alignment) is presented to do global multiple alignment for DNA sequences. In this method, a de Bruijn graph is used to express the input sequences information, which is recorded in the edges of the graph. As a result, a consensus-finding problem can be transformed to a maximum weighted path problem of the graph. MWPAlign obtains almost linear computation speed of the multiple sequences alignment problem. Experimental results show that the proposed algorithm is feasible, and for a large number of sequences with mutation rate lower than 5.2%, MWPALign can obtain better alignment results and has lower computational time as compared to CLUSTALW (cluster alignments weight), T-Coffee and HMMT (hidden Markov model training).