2007, 18(11):2766-2781.
Abstract:
DNA sequences are among the most basic and important kinds of biological data. Analyzing DNA sequence data in order to understand the essence of life is a necessary task in the post-genomic era. Data mining, which uncovers information hidden in data, is currently one of the most effective means of data analysis and has become a principal analysis technique in bioinformatics. Its application to DNA sequence analysis has attracted wide attention and developed rapidly, and considerable research results have emerged. This paper provides an overview of research progress in DNA sequence data mining. It divides the research into three phases: the application of statistics-based data mining methods, the application of general data mining methods, and the design of specialized data mining methods oriented to DNA sequences. It then explains that sequence similarity is the foundation of DNA sequence data mining. Drawing on the biological background, it also analyzes and comments on key techniques in this field, such as DNA sequential pattern mining, association mining, clustering, classification, and outlier mining. Finally, it outlines future work and open issues, including research on novel storage models and index methods and the design of data mining algorithms based on biological domain knowledge.