面向时间有序事务数据的聚簇频繁模式挖掘

doi:10.13328/j.cnki.jos.007209

微信服务号

微信订阅号

2025年8月4日 11:15 星期一

首页 > 过刊浏览>2025年第36卷第5期 >2342-2361. DOI:10.13328/j.cnki.jos.007209

PDF HTML阅读 XML下载导出引用引用提醒

面向时间有序事务数据的聚簇频繁模式挖掘
DOI:
                        10.13328/j.cnki.jos.007209
                    
CSTR:
                        32375.14.jos.007209
                    
作者:
                        王少鹏王少鹏
内蒙古大学 计算机学院, 内蒙古 呼和浩特 010021;生态大数据教育部工程研究中心 (内蒙古大学), 内蒙古 呼和浩特 010021;内蒙古自治区云计算与服务软件工程实验室 (内蒙古大学), 内蒙古 呼和浩特 010021
在期刊界中查找
在百度中查找
在本站中查找
牛超煜牛超煜
内蒙古大学 计算机学院, 内蒙古 呼和浩特 010021
在期刊界中查找
在百度中查找
在本站中查找

                    
作者单位:
作者简介:
通讯作者:
中图分类号:TP311
基金项目:国家自然科学基金(62066034, 62262047)

Clustering Frequent Pattern Mining for Time-ordered Transaction Data

Author:

WANG Shao-Peng
WANG Shao-Peng
College of Computer Science, Inner Mongolia University, Hohhot 010021, China;Engineering Research Center of Ecological Big Data (Inner Mongolia University), Ministry of Education, Hohhot 010021, China;Inner Mongolia Engineering Laboratory for Cloud Computing and Service Software (Inner Mongolia University), Hohhot 010021, China
在期刊界中查找
在百度中查找
在本站中查找
NIU Chao-Yu
NIU Chao-Yu
College of Computer Science, Inner Mongolia University, Hohhot 010021, China
在期刊界中查找
在百度中查找
在本站中查找

Affiliation:

Fund Project:

摘要

图/表

访问统计

参考文献 [45]

相似文献 [20]

引证文献

资源附件

文章评论

摘要:

首次对时间有序事务数据中聚簇频繁模式的挖掘问题进行研究. 为了解决Naive算法处理该问题时存在冗余运算的问题, 提出一种改进的聚簇频繁模式挖掘算法ICFPM (improved cluster frequent pattern mining). 该算法使用2种优化策略, 一方面可以利用定义的参数minCF, 有效减少挖掘结果的搜索空间, 另一方面可以参考(n–1)项集的判别结果加速聚簇频繁n项集的判别过程, 算法还使用了ICFPM-list结构来减少候选n项集的构建开销. 基于两个真实世界数据集的仿真实验证明了ICFPM算法的有效性, 与Naive算法相比, ICFPM算法在时间和空间效率方面得到了大幅度的提高, 是解决聚簇频繁模式挖掘的有效方法.

关键词:时间有序事务数据;聚簇;频繁模式;数据挖掘;向下闭包

Abstract:

In this study, the problem of mining cluster frequent patterns in time-ordered transaction data is discussed for the first time. To deal with redundant operations when the Naive algorithm solves this problem, the improved cluster frequent pattern mining (ICFPM) algorithm is proposed. The algorithm uses two optimization strategies. On the one hand, it can use the defined parameter minCF to effectively reduce the search space of mining results; on the other hand, it can refer to the discriminative results of (n–1)-itemsets to accelerate the discriminative process of cluster frequent n-itemset. The algorithm also applies the ICFPM-list structure to reduce the overhead of the candidate n-itemsets construction. Simulation experiments based on two real-world datasets demonstrate the effectiveness of the ICFPM algorithm. Compared with the Naive algorithm, the ICFPM algorithm improves substantially in terms of time and space efficiency, which makes it an effective method for solving clustered frequent pattern mining.

Key words:time-ordered transaction data;clustering;frequent pattern;data mining;downward closure

参考文献

[1] Agrawal R, Imieliński T, Swami A. Mining association rules between sets of items in large databases. In: Proc. of the 1993 ACM SIGMOD Int’l Conf. on Management of Data. Washington: ACM, 1993. 207–216. [doi: 10.1145/170035.170072]

[2] Agrawal R, Srikant R. Fast algorithms for mining association rules in large databases. In: Proc. of the 20th Int’l Conf. on Very Large Data Bases. San Francisco: Morgan Kaufmann Publishers Inc., 1994. 487–499.

[3] Han JW, Pei J, Yin YW. Mining frequent patterns without candidate generation. ACM SIGMOD Record, 2000, 29(2): 1–12.

[4] Zaki MJ. Scalable algorithms for association mining. IEEE Trans. on Knowledge and Data Engineering, 2000, 12(3): 372–390.

[5] 郭宇红, 童云海, 唐世渭, 杨冬青. 基于FP-Tree的反向频繁项集挖掘. 软件学报, 2008, 19(2): 338–350. https://jos.org.cn/jos/article/abstract/20080215

Guo YH, Tong YH, Tang SW, Yang DQ. Inverse frequent itemset mining based on FP-Tree. Ruan Jian Xue Bao/Journal of Software, 2008, 19(2): 338–350 (in Chinese with English abstract). https://jos.org.cn/jos/article/abstract/20080215

[6] 丁家满, 李海滨, 邓斌, 贾连印, 游进国. 一种基于Spark的频繁项集快速挖掘算法. 软件学报, 2023, 34(5): 2446–2464. http://www.jos.org.cn/1000-9825/6404.htm

Ding JM, Li HB, Deng B, Jia LY, You JG. Fast mining algorithm of frequent itemset based on spark. Ruan Jian Xue Bao/Journal of Software, 2023, 34(5): 2446–2464 (in Chinese with English abstract). http://www.jos.org.cn/1000-9825/6404.htm

[7] Fournier-Viger P, Yang P, Kiran RU, Ventura S, Luna JM. Mining local periodic patterns in a discrete sequence. Information Sciences, 2021, 544: 519–548.

[8] Fournier-Viger P, Tseng VS. Mining top-K non-redundant association rules. In: Proc. of the 20th Int’l Symp. on Methodologies for Intelligent Systems. Macao: Springer, 2012. 31–40. [doi: 10.1007/978-3-642-34624-8_4]

[9] Agouti T. Graph-based modeling using association rule mining to detect influential users in social networks. Expert Systems with Applications, 2022, 202: 117436.

[10] Yao QY, Yang H, Bao BW, Yu A, Zhang J, Cheriet M. Core and spectrum allocation based on association rules mining in spectrally and spatially elastic optical networks. IEEE Trans. on Communications, 2021, 69(8): 5299–5311.

[11] Agrawal R, Srikant R. Mining sequential patterns. In: Proc. of the 11th Int’l Conf. on Data Engineering. Taipei: IEEE, 1995. 3–14. [doi: 10.1109/ICDE.1995.380415]

[12] Han J, Pei JW, Mortazavi-Asl B, Pinto H, Chen QM, Dayal U, Hsu MC. PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth. In: Proc. of the 17th Int’l Conf. on Data Engineering. Heidelberg: IEEE, 2001. 215–224.

[13] Huynh HM, Nguyen LTT, Pham NN, Oplatková ZK, Yun U, Vo B. An efficient method for mining sequential patterns with indices. Knowledge-based Systems, 2022, 239: 107946.

[14] Zhang W, Yoshida T, Tang XJ, Wang Q. Text clustering using frequent itemsets. Knowledge-based Systems, 2010, 23(5): 379–388.

[15] Zhang LK, Yang GF. Cluster analysis of PM_2.5 pollution in China using the frequent itemset clustering approach. Environmental Research, 2022, 204: 112009.

[16] Dong GZ, Li JY. Efficient mining of emerging patterns: Discovering trends and differences. In: Proc. of the 5th ACM SIGKDD Int’l Conf. on Knowledge Discovery and Data Mining. San Diego: ACM, 1999. 43–52. [doi: 10.1145/312129.312191]

[17] Zhang CS, Liu CC, Zhang XL, Almpanidis G. An up-to-date comparison of state-of-the-art classification algorithms. Expert Systems with Applications, 2017, 82: 128–150.

[18] 颜跃进, 李舟军, 陈火旺. 基于FP-Tree有效挖掘最大频繁项集. 软件学报, 2005, 16(2): 215–222. https://jos.org.cn/jos/article/abstract/20050206

Yan YJ, Li ZJ, Chen HW. Efficiently mining of maximal frequent item sets based on FP-Tree. Ruan Jian Xue Bao/Journal of Software, 2005, 16(2): 215–222 (in Chinese with English abstract). https://jos.org.cn/jos/article/abstract/20050206

[19] Li HF, Zhang N. A simple but effective stream maximal frequent itemset mining algorithm. In: Proc. of the 7th Int’l Conf. on Computational Intelligence and Security. Sanya: IEEE, 2011. 1268–1272.

[20] 王少鹏, 闻英友, 赵宏. 滑动窗口下数据流完全加权最大频繁项集挖掘. 东北大学学报(自然科学版), 2016, 37(7): 931–936.

Wang SP, Wen YY, Zhao H. Mining full weighted maxinal frequent itensets based on sliding window over data stream. Journal of Northeastern University (Natural Science), 2016, 37(7): 931–936 (in Chinese with English abstract).

[21] Pasquier N, Bastide Y, Taouil R, Lakhal L. Discovering frequent closed itemsets for association rules. In: Proc. of the 7th Int’l Conf. on Database Theory. Jerusalem: Springer, 1999. 398–416. [doi: 10.1007/3-540-49257-7_25]

[22] Pei J, Han J, Mao R. CLOSET: An efficient algorithm for mining frequent closed itemsets. In: Proc. of the 2000 ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery. New York: ACM, 2000. 21–30.

[23] Liu JQ, Ye ZS, Yang XC, Wang XL, Shen LJ, Jiang XN. Efficient strategies for incremental mining of frequent closed itemsets over data streams. Expert Systems with Applications, 2022, 191: 116220.

[24] Ahmed CF, Tanbeer SK, Jeong BS, Lee YK. Efficient tree structures for high utility pattern mining in incremental databases. IEEE Trans. on Knowledge and Data Engineering, 2009, 21(12): 1708–1721.

[25] Zida S, Fournier-Viger P, Lin JCW, Wu CW, Tseng VS. EFIM: A fast and memory efficient algorithm for high-utility itemset mining. Knowledge and Information Systems, 2017, 51(2): 595–625.

[26] Duong QH, Fournier-Viger P, Ramampiaro H, Nørvåg K, Dam TL. Efficient high utility itemset mining using buffered utility-lists. Applied Intelligence, 2018, 48(7): 1859–1877.

[27] Tanbeer SK, Ahmed CF, Jeong BS, Lee YK. Discovering periodic-frequent patterns in transactional databases. In: Proc. of the 13th Pacific-Asia Conf. on Knowledge Discovery and Data Mining. Bangkok: Springer, 2009. 242–253. [doi: 10.1007/978-3-642-01307-2_24]

[28] Kiran RU, Venkatesh JN, Fournier-Viger P, Toyoda M, Reddy PK, Kitsuregawa M. Discovering periodic patterns in non-uniform temporal databases. In: Proc. of the 21st Pacific-Asia Conf. on Knowledge Discovery and Data Mining. Jeju: Springer, 2017. 604–617.

[29] Venkatesh J, Kiran RU, Krishna RP, Kitsuregawa M. Discovering periodic-correlated patterns in temporal databases. In: Hameurlain A, Wagner R, Hartmann S, Ma H, eds. Trans. on Large-scale Data- and Knowledge-centered Systems XXXVIII: Special Issue on Database- and Expert-systems Applications. Berlin: Springer, 2018. 146–172. [doi: 10.1007/978-3-662-58384-5_6]

[30] Rage UK, Alampally A, Chennupati S, Toyoda M, Reddy PK, Kitsuregawa M, Reddy M. Finding periodic-frequent patterns in temporal databases using periodic summaries. Data Science and Pattern Recognition, 2019, 3(2): 24–46.

[31] Krzywicki A, Mahidadia A, Bain M. Discovering periodicity in locally repeating patterns. In: Proc. of the 9th IEEE Int’l Conf. on Data Science and Advanced Analytics. Shenzhen: IEEE, 2022. 1–10. [doi: 10.1109/DSAA54385.2022.10032435]

[32] Kiran RU, Veena P, Ravikumar P, Saideep C, Zettsu K, Shang HC, Toyoda M, Kitsuregawa M, Reddy PK. Efficient discovery of partial periodic patterns in large temporal databases. Electronics, 2022, 11(10): 1523.

[33] Upadhya KJ, Paleja A, Geetha M, Rao BD, Chhabra MS. Finding partial periodic and rare periodic patterns in temporal databases. IEEE Access, 2023, 11: 92242–92257.

[34] Tanbeer SK, Ahmed CF, Jeong BS, Lee YK. Mining regular patterns in transactional databases. IEICE Trans. on Information and Systems, 2008, E91.D(11): 2568–2577.

[35] Rashid MM, Karim MR, Jeong BS, Choi HJ. Efficient mining regularly frequent patterns in transactional databases. In: Proc. of the 17th Int’l Conf. on Database Systems for Advanced Applications. Busan: Springer, 2012. 258–271. [doi: 10.1007/978-3-642-29038-1_20]

[36] Amphawan K, Lenca P, Surarerks A. Mining top-k regular-frequent itemsets using database partitioning and support estimation. Expert Systems with Applications, 2012, 39(2): 1924–1936.

[37] Kumar GV, Kumari VV. MaRFI: Maximal regular frequent itemset mining using a pair of transaction-ids. Int’l Journal of Computer Science & Engineering Technology, 2013, 4(7): 2229–3345.

[38] Amphawan K, Lenca P. Mining top-k frequent-regular closed patterns. Expert Systems with Applications, 2015, 42(21): 7882–7894.

[39] Rehman SU, Khan MA, Nabi HU, Ali S, Alnazzawi N, Khan S. TKIFRPM: A novel approach for topmost-k identical frequent regular patterns mining from incremental datasets. Applied Sciences, 2023, 13(1): 654.

[40] Chen GS, Li ZS. Discovering periodic cluster patterns in event sequence databases. Applied Intelligence, 2022, 52(13): 15387–15404.

[41] Ester M, Kriegel HP, Sander J, Xu XW. A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proc. of the 2nd Int’l Conf. on Knowledge Discovery and Data Mining (KDD 1996). Portland: AAAI Press, 1996. 226–231.

引用本文

王少鹏,牛超煜.面向时间有序事务数据的聚簇频繁模式挖掘.软件学报,2025,36(5):2342-2361

复制

文章指标

点击次数:218
下载次数: 1917
HTML阅读次数: 204
引用次数: 0

历史

收稿日期:2023-12-11
最后修改日期:2024-03-13
录用日期:
在线发布日期: 2024-06-20
出版日期: 2025-05-06

微信服务号

微信订阅号

引用本文

相关视频

分享

文章指标

历史

文章二维码

微信服务号

微信订阅号

引用本文

相关视频

分享

微信扫一扫：分享

文章指标

历史

文章二维码