面向时间有序事务数据的聚簇频繁模式挖掘
作者:
中图分类号:

TP311

基金项目:

国家自然科学基金(62066034, 62262047)


Clustering Frequent Pattern Mining for Time-ordered Transaction Data
Author:
  • 摘要
  • | |
  • 访问统计
  • |
  • 参考文献 [45]
  • | |
  • 引证文献
  • | |
  • 文章评论
    摘要:

    首次对时间有序事务数据中聚簇频繁模式的挖掘问题进行研究. 为了解决Naive算法处理该问题时存在冗余运算的问题, 提出一种改进的聚簇频繁模式挖掘算法ICFPM (improved cluster frequent pattern mining). 该算法使用2种优化策略, 一方面可以利用定义的参数minCF, 有效减少挖掘结果的搜索空间, 另一方面可以参考(n–1)项集的判别结果加速聚簇频繁n项集的判别过程, 算法还使用了ICFPM-list结构来减少候选n项集的构建开销. 基于两个真实世界数据集的仿真实验证明了ICFPM算法的有效性, 与Naive算法相比, ICFPM算法在时间和空间效率方面得到了大幅度的提高, 是解决聚簇频繁模式挖掘的有效方法.

    Abstract:

    In this study, the problem of mining cluster frequent patterns in time-ordered transaction data is discussed for the first time. To deal with redundant operations when the Naive algorithm solves this problem, the improved cluster frequent pattern mining (ICFPM) algorithm is proposed. The algorithm uses two optimization strategies. On the one hand, it can use the defined parameter minCF to effectively reduce the search space of mining results; on the other hand, it can refer to the discriminative results of (n–1)-itemsets to accelerate the discriminative process of cluster frequent n-itemset. The algorithm also applies the ICFPM-list structure to reduce the overhead of the candidate n-itemsets construction. Simulation experiments based on two real-world datasets demonstrate the effectiveness of the ICFPM algorithm. Compared with the Naive algorithm, the ICFPM algorithm improves substantially in terms of time and space efficiency, which makes it an effective method for solving clustered frequent pattern mining.

    参考文献
    [1] Agrawal R, Imieliński T, Swami A. Mining association rules between sets of items in large databases. In: Proc. of the 1993 ACM SIGMOD Int’l Conf. on Management of Data. Washington: ACM, 1993. 207–216. [doi: 10.1145/170035.170072]
    [2] Agrawal R, Srikant R. Fast algorithms for mining association rules in large databases. In: Proc. of the 20th Int’l Conf. on Very Large Data Bases. San Francisco: Morgan Kaufmann Publishers Inc., 1994. 487–499.
    [3] Han JW, Pei J, Yin YW. Mining frequent patterns without candidate generation. ACM SIGMOD Record, 2000, 29(2): 1–12.
    [4] Zaki MJ. Scalable algorithms for association mining. IEEE Trans. on Knowledge and Data Engineering, 2000, 12(3): 372–390.
    [5] 郭宇红, 童云海, 唐世渭, 杨冬青. 基于FP-Tree的反向频繁项集挖掘. 软件学报, 2008, 19(2): 338–350. https://jos.org.cn/jos/article/abstract/20080215
    Guo YH, Tong YH, Tang SW, Yang DQ. Inverse frequent itemset mining based on FP-Tree. Ruan Jian Xue Bao/Journal of Software, 2008, 19(2): 338–350 (in Chinese with English abstract). https://jos.org.cn/jos/article/abstract/20080215
    [6] 丁家满, 李海滨, 邓斌, 贾连印, 游进国. 一种基于Spark的频繁项集快速挖掘算法. 软件学报, 2023, 34(5): 2446–2464. http://www.jos.org.cn/1000-9825/6404.htm
    Ding JM, Li HB, Deng B, Jia LY, You JG. Fast mining algorithm of frequent itemset based on spark. Ruan Jian Xue Bao/Journal of Software, 2023, 34(5): 2446–2464 (in Chinese with English abstract). http://www.jos.org.cn/1000-9825/6404.htm
    [7] Fournier-Viger P, Yang P, Kiran RU, Ventura S, Luna JM. Mining local periodic patterns in a discrete sequence. Information Sciences, 2021, 544: 519–548.
    [8] Fournier-Viger P, Tseng VS. Mining top-K non-redundant association rules. In: Proc. of the 20th Int’l Symp. on Methodologies for Intelligent Systems. Macao: Springer, 2012. 31–40. [doi: 10.1007/978-3-642-34624-8_4]
    [9] Agouti T. Graph-based modeling using association rule mining to detect influential users in social networks. Expert Systems with Applications, 2022, 202: 117436.
    [10] Yao QY, Yang H, Bao BW, Yu A, Zhang J, Cheriet M. Core and spectrum allocation based on association rules mining in spectrally and spatially elastic optical networks. IEEE Trans. on Communications, 2021, 69(8): 5299–5311.
    [11] Agrawal R, Srikant R. Mining sequential patterns. In: Proc. of the 11th Int’l Conf. on Data Engineering. Taipei: IEEE, 1995. 3–14. [doi: 10.1109/ICDE.1995.380415]
    [12] Han J, Pei JW, Mortazavi-Asl B, Pinto H, Chen QM, Dayal U, Hsu MC. PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth. In: Proc. of the 17th Int’l Conf. on Data Engineering. Heidelberg: IEEE, 2001. 215–224.
    [13] Huynh HM, Nguyen LTT, Pham NN, Oplatková ZK, Yun U, Vo B. An efficient method for mining sequential patterns with indices. Knowledge-based Systems, 2022, 239: 107946.
    [14] Zhang W, Yoshida T, Tang XJ, Wang Q. Text clustering using frequent itemsets. Knowledge-based Systems, 2010, 23(5): 379–388.
    [15] Zhang LK, Yang GF. Cluster analysis of PM2.5 pollution in China using the frequent itemset clustering approach. Environmental Research, 2022, 204: 112009.
    [16] Dong GZ, Li JY. Efficient mining of emerging patterns: Discovering trends and differences. In: Proc. of the 5th ACM SIGKDD Int’l Conf. on Knowledge Discovery and Data Mining. San Diego: ACM, 1999. 43–52. [doi: 10.1145/312129.312191]
    [17] Zhang CS, Liu CC, Zhang XL, Almpanidis G. An up-to-date comparison of state-of-the-art classification algorithms. Expert Systems with Applications, 2017, 82: 128–150.
    [18] 颜跃进, 李舟军, 陈火旺. 基于FP-Tree有效挖掘最大频繁项集. 软件学报, 2005, 16(2): 215–222. https://jos.org.cn/jos/article/abstract/20050206
    Yan YJ, Li ZJ, Chen HW. Efficiently mining of maximal frequent item sets based on FP-Tree. Ruan Jian Xue Bao/Journal of Software, 2005, 16(2): 215–222 (in Chinese with English abstract). https://jos.org.cn/jos/article/abstract/20050206
    [19] Li HF, Zhang N. A simple but effective stream maximal frequent itemset mining algorithm. In: Proc. of the 7th Int’l Conf. on Computational Intelligence and Security. Sanya: IEEE, 2011. 1268–1272.
    [20] 王少鹏, 闻英友, 赵宏. 滑动窗口下数据流完全加权最大频繁项集挖掘. 东北大学学报(自然科学版), 2016, 37(7): 931–936.
    Wang SP, Wen YY, Zhao H. Mining full weighted maxinal frequent itensets based on sliding window over data stream. Journal of Northeastern University (Natural Science), 2016, 37(7): 931–936 (in Chinese with English abstract).
    [21] Pasquier N, Bastide Y, Taouil R, Lakhal L. Discovering frequent closed itemsets for association rules. In: Proc. of the 7th Int’l Conf. on Database Theory. Jerusalem: Springer, 1999. 398–416. [doi: 10.1007/3-540-49257-7_25]
    [22] Pei J, Han J, Mao R. CLOSET: An efficient algorithm for mining frequent closed itemsets. In: Proc. of the 2000 ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery. New York: ACM, 2000. 21–30.
    [23] Liu JQ, Ye ZS, Yang XC, Wang XL, Shen LJ, Jiang XN. Efficient strategies for incremental mining of frequent closed itemsets over data streams. Expert Systems with Applications, 2022, 191: 116220.
    [24] Ahmed CF, Tanbeer SK, Jeong BS, Lee YK. Efficient tree structures for high utility pattern mining in incremental databases. IEEE Trans. on Knowledge and Data Engineering, 2009, 21(12): 1708–1721.
    [25] Zida S, Fournier-Viger P, Lin JCW, Wu CW, Tseng VS. EFIM: A fast and memory efficient algorithm for high-utility itemset mining. Knowledge and Information Systems, 2017, 51(2): 595–625.
    [26] Duong QH, Fournier-Viger P, Ramampiaro H, Nørvåg K, Dam TL. Efficient high utility itemset mining using buffered utility-lists. Applied Intelligence, 2018, 48(7): 1859–1877.
    [27] Tanbeer SK, Ahmed CF, Jeong BS, Lee YK. Discovering periodic-frequent patterns in transactional databases. In: Proc. of the 13th Pacific-Asia Conf. on Knowledge Discovery and Data Mining. Bangkok: Springer, 2009. 242–253. [doi: 10.1007/978-3-642-01307-2_24]
    [28] Kiran RU, Venkatesh JN, Fournier-Viger P, Toyoda M, Reddy PK, Kitsuregawa M. Discovering periodic patterns in non-uniform temporal databases. In: Proc. of the 21st Pacific-Asia Conf. on Knowledge Discovery and Data Mining. Jeju: Springer, 2017. 604–617.
    [29] Venkatesh J, Kiran RU, Krishna RP, Kitsuregawa M. Discovering periodic-correlated patterns in temporal databases. In: Hameurlain A, Wagner R, Hartmann S, Ma H, eds. Trans. on Large-scale Data- and Knowledge-centered Systems XXXVIII: Special Issue on Database- and Expert-systems Applications. Berlin: Springer, 2018. 146–172. [doi: 10.1007/978-3-662-58384-5_6]
    [30] Rage UK, Alampally A, Chennupati S, Toyoda M, Reddy PK, Kitsuregawa M, Reddy M. Finding periodic-frequent patterns in temporal databases using periodic summaries. Data Science and Pattern Recognition, 2019, 3(2): 24–46.
    [31] Krzywicki A, Mahidadia A, Bain M. Discovering periodicity in locally repeating patterns. In: Proc. of the 9th IEEE Int’l Conf. on Data Science and Advanced Analytics. Shenzhen: IEEE, 2022. 1–10. [doi: 10.1109/DSAA54385.2022.10032435]
    [32] Kiran RU, Veena P, Ravikumar P, Saideep C, Zettsu K, Shang HC, Toyoda M, Kitsuregawa M, Reddy PK. Efficient discovery of partial periodic patterns in large temporal databases. Electronics, 2022, 11(10): 1523.
    [33] Upadhya KJ, Paleja A, Geetha M, Rao BD, Chhabra MS. Finding partial periodic and rare periodic patterns in temporal databases. IEEE Access, 2023, 11: 92242–92257.
    [34] Tanbeer SK, Ahmed CF, Jeong BS, Lee YK. Mining regular patterns in transactional databases. IEICE Trans. on Information and Systems, 2008, E91.D(11): 2568–2577.
    [35] Rashid MM, Karim MR, Jeong BS, Choi HJ. Efficient mining regularly frequent patterns in transactional databases. In: Proc. of the 17th Int’l Conf. on Database Systems for Advanced Applications. Busan: Springer, 2012. 258–271. [doi: 10.1007/978-3-642-29038-1_20]
    [36] Amphawan K, Lenca P, Surarerks A. Mining top-k regular-frequent itemsets using database partitioning and support estimation. Expert Systems with Applications, 2012, 39(2): 1924–1936.
    [37] Kumar GV, Kumari VV. MaRFI: Maximal regular frequent itemset mining using a pair of transaction-ids. Int’l Journal of Computer Science & Engineering Technology, 2013, 4(7): 2229–3345.
    [38] Amphawan K, Lenca P. Mining top-k frequent-regular closed patterns. Expert Systems with Applications, 2015, 42(21): 7882–7894.
    [39] Rehman SU, Khan MA, Nabi HU, Ali S, Alnazzawi N, Khan S. TKIFRPM: A novel approach for topmost-k identical frequent regular patterns mining from incremental datasets. Applied Sciences, 2023, 13(1): 654.
    [40] Chen GS, Li ZS. Discovering periodic cluster patterns in event sequence databases. Applied Intelligence, 2022, 52(13): 15387–15404.
    [41] Ester M, Kriegel HP, Sander J, Xu XW. A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proc. of the 2nd Int’l Conf. on Knowledge Discovery and Data Mining (KDD 1996). Portland: AAAI Press, 1996. 226–231.
    相似文献
    引证文献
    网友评论
    网友评论
    分享到微博
    发 布
引用本文

王少鹏,牛超煜.面向时间有序事务数据的聚簇频繁模式挖掘.软件学报,,():1-20

复制
分享
文章指标
  • 点击次数:159
  • 下载次数: 1655
  • HTML阅读次数: 0
  • 引用次数: 0
历史
  • 收稿日期:2023-12-11
  • 最后修改日期:2024-03-13
  • 在线发布日期: 2024-06-20
文章二维码
您是第19799526位访问者
版权所有:中国科学院软件研究所 京ICP备05046678号-3
地址:北京市海淀区中关村南四街4号,邮政编码:100190
电话:010-62562563 传真:010-62562533 Email:jos@iscas.ac.cn
技术支持:北京勤云科技发展有限公司

京公网安备 11040202500063号