Fast Mining Algorithm of Frequent Itemset Based on Spark
Author: Ding Jiaman, Li Haibin, Deng Bin, Jia Lianyin, You Jinguo

CLC Number: TP311

Fund Project: National Natural Science Foundation of China (61562054)

Abstract:

Improving the efficiency of frequent itemset mining on big data is currently an active research topic. As data volumes keep growing, the computational cost of traditional frequent itemset generation algorithms remains high. This study therefore proposes a fast mining algorithm of frequent itemset based on Spark (Fmafibs for short), which uses bitwise operations to implement a novel pattern-growth strategy. First, the algorithm converts itemsets into BitStrings and uses bitwise operations to generate candidate itemsets. Second, to process long BitStrings efficiently, a vertical grouping strategy is designed: candidate itemsets are obtained by joining the frequent itemsets of different groups within the same transaction, and these candidates are then aggregated and filtered to obtain the final frequent itemsets. Fmafibs is implemented in a Spark environment. Experimental results on benchmark datasets show that the proposed method produces correct results and significantly improves mining efficiency.
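The abstract does not spell out the algorithm's details, so the following is only a minimal single-machine sketch of the underlying bitwise ideas, assuming each itemset and each transaction is encoded as a Python integer whose bit i marks item i. The functions `encode`, `support`, `split_groups`, `contains_grouped` and `mine` are hypothetical names introduced for illustration. The join is a plain Apriori-style OR join rather than Fmafibs itself, and the fixed group width in `split_groups` is our assumption, not the paper's definition of vertical grouping. It shows only that a subset test on a long BitString can be carried out group by group. In a Spark implementation the per-transaction work would typically be distributed (for example with RDD `map` and `reduceByKey`), which this sketch omits.

```python
from itertools import combinations


def encode(transactions, items):
    """Encode each transaction as an int BitString: bit i is set iff items[i] occurs."""
    index = {item: i for i, item in enumerate(items)}
    masks = []
    for t in transactions:
        mask = 0
        for item in t:
            mask |= 1 << index[item]
        masks.append(mask)
    return masks


def support(candidate, masks):
    """Count transactions containing every item of candidate (subset test via AND)."""
    return sum(1 for m in masks if (m & candidate) == candidate)


def split_groups(mask, width, n_items):
    """Hypothetical vertical grouping: cut a BitString into fixed-width segments."""
    n_groups = (n_items + width - 1) // width
    return [(mask >> (g * width)) & ((1 << width) - 1) for g in range(n_groups)]


def contains_grouped(mask, candidate, width, n_items):
    """Group-by-group subset test; equivalent to (mask & candidate) == candidate."""
    pairs = zip(split_groups(mask, width, n_items), split_groups(candidate, width, n_items))
    return all((gm & gc) == gc for gm, gc in pairs)


def mine(transactions, min_support):
    """Level-wise mining in which candidates are produced by OR-ing frequent itemsets."""
    items = sorted({i for t in transactions for i in t})
    masks = encode(transactions, items)
    level = {1 << i for i in range(len(items)) if support(1 << i, masks) >= min_support}
    frequent = set(level)
    k = 1
    while level:
        k += 1
        # Join step: the OR of two (k-1)-itemsets has exactly k bits
        # only when the two itemsets share k-2 items.
        candidates = {a | b for a, b in combinations(level, 2)
                      if bin(a | b).count("1") == k}
        # Filter step: keep candidates that reach the minimum support.
        level = {c for c in candidates if support(c, masks) >= min_support}
        frequent |= level

    def decode(c):
        return frozenset(items[i] for i in range(len(items)) if (c >> i) & 1)

    return {decode(c): support(c, masks) for c in frequent}


db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
result = mine(db, min_support=3)
# {a}, {b}, {c} have support 4; {a,b}, {a,c}, {b,c} have support 3;
# {a,b,c} (support 2) is filtered out.
print(sorted((sorted(s), n) for s, n in result.items()))
```

Using OR for the join and AND for the subset test means both candidate generation and support counting reduce to integer bit operations, which is the property the abstract attributes to the BitString representation.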

Citation:

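Ding Jiaman, Li Haibin, Deng Bin, Jia Lianyin, You Jinguo. Fast mining algorithm of frequent itemset based on Spark. Journal of Software, 2023, 34(5): 2446-2464 (in Chinese).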

History
  • Received: August 17, 2020
  • Revised: December 13, 2020
  • Online: July 07, 2022
  • Published: May 06, 2023
Copyright: Institute of Software, Chinese Academy of Sciences