School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230009, China;Department of Computer Science, University of Vermont, Burlington, VT 05405, USA 在期刊界中查找 在百度中查找 在本站中查找
There is a huge wealth of sequence data available in real-world applications. The task of sequential pattern mining serves to mine important patterns from the sequence data. Given a sequence S, a certain threshold, and gap constraints, this paper aims to discover frequent patterns whose supports in S are no less than the given threshold value. There are flexible wildcards in pattern P, and the number of the wildcards between any two successive elements of P fulfills the user-specified gap constraints. The study designs an efficient mining algorithm: One-Off Mining, whose mining process satisfies the One-Off condition under which each character in the given sequence can be used at most once in all occurrences of a pattern. Experiments on DNA sequences show that this method performs better in time and completeness than the related sequential pattern mining algorithms.
[1] van Belkum A, Scherer S, van Leeuwen W, Willemse W, van Alphen L, Verbrugh H. Variable number of tandem repeats inclinical strains of haemophilus influenzae. Infection and Immunity, 1997,65(12):5017-5027.
[2] Agrawal R, Srikant R. Mining sequential patterns. In: Yu PS, Chen ALP, eds. Proc. of the Int’l Conf. on Data Engineering(ICDE’95). Los Alamitos: IEEE Computer Society, 1995. 3-14. [doi: 10.1109/ICDE.1995.380415]
[3] Chen G, Wu X, Zhu X, Arslan A, He Y. Efficient string matching with wildcards and length constraints. Knowledge andInformation Systems, 2006,10(4):399-419. [doi: 10.1007/s10115-006-0016-8]
[4] Srikant R, Agrawal R. Mining sequential patterns: Generalizations and performance improvements. In: Apers P, Bouzeghoub M,Gardarin G, eds. Proc. of the Int’l Conf. on Extending Database Technology (EDBT’96). LNCS 1057, Heidelberg: Springer-Verlag,1996. 13-17. [doi: 10.1007/BFb0014140]
[5] Pei J, Han J, Mortazavi-Asl B, Chen Q, Dayal U, Hsu MC. PrefixSpan: Mining sequential patterns efficiently by prefix-projectedpattern growth. In: Georgakopoulos D, Buchmann A, eds. Proc. of the Int’l Conf. on Data Engineering (ICDE 2001). Los Alamitos:IEEE Computer Society, 2001. 215-224. [doi: 10.1109/ICDE.2001.914830]
[6] Zaki MJ. SPADE: An efficient algorithm for mining frequent sequences. Machine Learning, 2001,42(1):31-60. [doi: 10.1023/A:1007652502315]
[7] Ayres J, Gehrke J, Yiu T, Flannick J. Sequential pattern mining using a bitmap representation. In: Simoff SJ, Za-ane OR, eds. Proc.of the ACM SIGKDD. New York: ACM Press, 2002. 429-435. [doi: 10.1145/775047.775109]
[8] Zou X, Zhang W, Liu Y, Cai QS. Study on distributed sequential pattern discovery algorithm. Ruan Jian Xue Bao/Journal ofSoftware, 2005,16(7):1262-1269 (in Chinese with English abstract). http://www.jos.org.cn/1000-9825/16/1262.htm [doi: 10.1360/jos161262]
[9] Ji XN, Bailey J, Dong GZ. Mining minimal distinguishing subsequence patterns with gap constraints. In: Han JW, Wah BW,Raghavan V, Wu XD, Rastogi R, eds. Proc. of the IEEE Int’l Conf. on Data Mining (ICDM 2005). Los Alamitos: IEEE ComputerSociety, 2005. 194-201. [doi: 10.1109/ICDM.2005.96]
[10] Li C, Wang J. Efficiently mining closed subsequences with gap constraints. In: Proc. of the SIAM Int’l Conf. on Data Mining.Philadelphia: Society for Industrial Mathematics, 2008. 313-322.
[11] Zhang M, Kao B, Cheung D, Yip K. Mining periodic patterns with gap requirement from sequences. In: Ozcan F, ed. Proc. of theACM SIGMOD. New York: ACM Press, 2005. 623-633.
[12] Zhu XQ, Wu XD. Mining complex patterns across sequences with gap requirements. In: Veloso MM, ed. Proc. of the Int’l JointConf. on Artificial Intelligence (IJCAI 2007). Menlo Park: AAAI Press, 2007. 726-735.
[13] He Y, Wu X, Zhu X, Arslan AN. Mining frequent patterns with wildcards from biological sequences. In: Chang, W, Joshi, JBD,eds. Proc. of the IEEE Int’l Conf. on Information Reuse and Integration (IRI 2007). Los Alamitos: IEEE Computer Society, 2007.329-334. [doi: 10.1109/IRI.2007.4296642]
[14] Xie F, Wu X, Hu X, Gao J, Guo D, Fei Y, Hua E. Sequential pattern mining with wildcards. In: Gregoire E, ed. Proc. of the 22ndInt’l Conf. on Tools with Artificial Intelligence (ICTAI 2010). Los Alamitos: IEEE Computer Society, 2010. 241-247. [doi: 10.1109/ICTAI.2010.42]
[15] Huang Y, Wu X, Hu X, Xie F, Gao J, Wu G. Mining frequent patterns with gaps and One-Off condition. In: Muzio JC, Brent RP,eds. Proc. of the 12th IEEE Int’l Conf. on Computational Science and Engineering (CSE 2009). New York: IEEE Press, 2009.180-186. [doi: 10.1109/CSE.2009.160]
[16] Ding B, Lo D, Han J, Khoo S C. Efficient mining of closed repetitive gapped subsequences from a sequence database. In: IoannidisYE, Lee DL, Ng RT, eds. Proc. of the 25th Int’l Conf. on Data Engineering (ICDE 2009). Los Alamitos: IEEE Computer Society,2009. 1024-1035. [doi: 10.1109/ICDE.2009.104]