Fast Mining Algorithm of Frequent Itemset Based on Spark

doi:10.13328/j.cnki.jos.006404

Volume 34, Issue 5, 2023, pp. 2446-2464. DOI: 10.13328/j.cnki.jos.006404

Author: DING Jia-Man, LI Hai-Bin, DENG Bin, JIA Lian-Yin, YOU Jin-Guo
Affiliation (all authors): Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650504, China; Yunnan Key Laboratory of Artificial Intelligence, Kunming 650504, China
CLC Number: TP311
Fund Project: National Natural Science Foundation of China (61562054)

Abstract:

Improving the efficiency of frequent itemset mining on big data is an active research topic. As data volumes continue to grow, the computing cost of traditional frequent itemset generation algorithms remains high. This study therefore proposes a fast frequent itemset mining algorithm based on Spark (Fmafibs for short). A novel pattern growth strategy is designed that takes advantage of bit-wise operations. First, the algorithm converts itemsets into BitStrings and uses bit-wise operations to generate candidate itemsets. Second, to improve the processing efficiency for long BitStrings, a vertical grouping strategy is designed: candidate itemsets are obtained by joining the frequent itemsets from different groups of the same transaction, and are then aggregated and filtered to obtain the final frequent itemsets. Fmafibs is implemented in the Spark environment. Experimental results on benchmark datasets show that the proposed method is correct and can significantly improve mining efficiency.
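
The BitString idea in the abstract can be illustrated with a small single-machine sketch. This is not the authors' Fmafibs implementation: it assumes each transaction is encoded as a Python integer with one bit per item, grows patterns by OR-ing in one more frequent item, and counts support with an AND test. The function names (`encode`, `support`, `mine_frequent_itemsets`) and the toy data are hypothetical, and the vertical grouping strategy and the Spark distribution are deliberately left out.

```python
def encode(transaction, index):
    """Encode a transaction as an integer BitString: bit i is set if item i occurs."""
    bits = 0
    for item in transaction:
        bits |= 1 << index[item]
    return bits


def support(candidate, encoded):
    """Count transactions containing the candidate: (t AND c) == c means c is a subset of t."""
    return sum(1 for t in encoded if (t & candidate) == candidate)


def mine_frequent_itemsets(transactions, min_support):
    """Return {frozenset(items): support} for all itemsets with support >= min_support."""
    items = sorted({item for t in transactions for item in t})
    index = {item: pos for pos, item in enumerate(items)}
    encoded = [encode(t, index) for t in transactions]

    # Frequent single items, each represented by a one-bit BitString.
    singles = {}
    for item in items:
        bit = 1 << index[item]
        count = support(bit, encoded)
        if count >= min_support:
            singles[bit] = count

    frequent = dict(singles)
    level = list(singles)
    while level:
        next_level = []
        for pattern in level:
            for bit in singles:
                # Only extend with a bit above the pattern's highest bit,
                # so every itemset is generated exactly once.
                if bit > pattern:
                    candidate = pattern | bit
                    count = support(candidate, encoded)
                    if count >= min_support:
                        frequent[candidate] = count
                        next_level.append(candidate)
        level = next_level

    def decode(bits):
        return frozenset(item for item in items if bits & (1 << index[item]))

    return {decode(bits): count for bits, count in frequent.items()}


# Toy data (assumed for illustration only).
transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
result = mine_frequent_itemsets(transactions, min_support=3)
```

With this toy data, every single item and every pair is frequent at a minimum support of 3, while the full triple appears in only two transactions and is filtered out. In a distributed setting the per-transaction work would presumably be spread across partitions and the counts aggregated, which is the role the abstract assigns to Spark.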

Key words: frequent itemset; pattern growth; BitString; bit-wise operation; vertical grouping; Spark

Citation:

DING Jia-Man, LI Hai-Bin, DENG Bin, JIA Lian-Yin, YOU Jin-Guo. Fast Mining Algorithm of Frequent Itemset Based on Spark. Journal of Software, 2023, 34(5): 2446-2464

History

Received: August 17, 2020
Revised: December 13, 2020
Adopted:
Online: July 07, 2022
Published: May 06, 2023
