Distributed Mining of Frequent Co-occurrence Patterns across Multiple Data Streams

Fund Project:

National Natural Science Foundation of China (61702217, 61771230, 61772231, 61873324); Key Research and Development Program of Shandong Province (2017GGX10144, 2018GGX101048, 2017CXGC0701, 2016ZDJS01A12); Natural Science Foundation of Shandong Province of China (ZR2017MF025); Scientific and Technologic Development Program (XKY1737, XKY1734)

    Abstract:

    A frequent co-occurrence pattern across multiple data streams is a set of objects that occur together in one data stream within a short time span, where the same set of objects also co-occurs in the same fashion in multiple data streams within another user-specified time span. Several real applications can be abstracted as this problem. Examples include discovering groups of cars that travel together using a city surveillance system, finding people who hang out together based on their check-in data, and mining hot topics by discovering groups of frequently co-occurring keywords in social network data. Because data streams typically have tremendous volumes and high arrival rates, existing algorithms designed for a centralized setting cannot mine frequent co-occurrence patterns from such large-scale streaming data with limited computing resources. To address this problem, this paper proposes FCP-DM, a distributed algorithm for mining frequent co-occurrence patterns from a large number of data streams. The algorithm first divides the data streams into segments and then constructs a multilevel mining model in a distributed environment. This model uses multiple computing nodes to process massive volumes of streaming data in parallel and discover frequent co-occurrence patterns in real time. Finally, extensive experiments are conducted to evaluate the performance of the proposed algorithm.

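Citation: Yu Ziqiang, Yu Xiaohui, Dong Jiwen, Wang Lin. Distributed Mining of Frequent Co-occurrence Patterns across Multiple Data Streams. Journal of Software, 2019, 30(4): 1078-1093.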

History
  • Received: April 11, 2017
  • Revised: June 9, 2017
  • Online: April 1, 2019
Copyright: Institute of Software, Chinese Academy of Sciences