Jump Filter: Dynamic Sketch Design for Big Data Governance

    Abstract:

    With the rapid development of information technology, the volume of data keeps growing exponentially while its value becomes ever harder to extract. This poses significant challenges to the efficient management and control of every stage of the data life cycle, such as data collection, cleaning, storage, and sharing. A sketch uses a hash table, matrix, or bit vector to track core characteristics of data such as frequency, cardinality, and membership. This mechanism makes the sketch itself a form of metadata, and sketches are widely used in data sharing, transmission, update, and similar scenarios. The fast-flowing, streaming nature of big data has given rise to dynamic sketches. Existing dynamic sketches can expand or shrink their capacity with the size of the data stream by dynamically maintaining a list of probabilistic data structures organized as a chain or a tree. However, they incur excessive space overhead, and their time overhead grows as the dataset cardinality increases. This study designs a dynamic sketch for big data governance based on the state-of-the-art jump consistent hash. The method simultaneously achieves space overhead that grows linearly with the dataset cardinality and constant time overhead for data processing and analysis, effectively supporting the demanding processing and analysis tasks of big data governance. The validity and efficiency of the proposed method are verified by comparison with traditional methods on a variety of synthetic and real-world datasets.
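The abstract does not describe Jump Filter's internals, so the following is only a reference sketch of the underlying primitive it names: the jump consistent hash published by Lamping and Veach, written here in Python. It maps a 64-bit key to one of `num_buckets` buckets in O(ln n) expected loop iterations, and when the bucket count grows from n to n + 1, a key either keeps its bucket or moves to the new bucket n, so only about 1/(n + 1) of the keys move. The `key64` helper (BLAKE2b truncated to 64 bits) and the idea of treating each bucket as a sub-filter are assumptions for illustration, not the paper's design.

```python
import hashlib


def key64(item: str) -> int:
    # Hypothetical helper (an assumption, not from the paper):
    # derive a 64-bit integer key from a string via 8-byte BLAKE2b.
    digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def jump_consistent_hash(key: int, num_buckets: int) -> int:
    """Map a 64-bit key to a bucket in [0, num_buckets).

    A 64-bit linear congruential generator seeded by the key decides
    how far the key "jumps" ahead; the last bucket reached below
    num_buckets is the answer.
    """
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    mask = (1 << 64) - 1          # emulate unsigned 64-bit arithmetic
    key &= mask
    b, j = -1, 0
    while j < num_buckets:
        b = j
        key = (key * 2862933555777941757 + 1) & mask
        # Next candidate bucket; always strictly greater than b.
        j = int((b + 1) * ((1 << 31) / ((key >> 33) + 1)))
    return b


if __name__ == "__main__":
    items = [f"item-{i}" for i in range(10_000)]
    # Hypothetical scenario: grow from 10 to 11 sub-filters.
    before = [jump_consistent_hash(key64(x), 10) for x in items]
    after = [jump_consistent_hash(key64(x), 11) for x in items]
    moved = sum(a != b for a, b in zip(before, after))
    # Every moved item lands in the new bucket 10; roughly 1/11 move.
    print(f"moved {moved} of {len(items)} items")
```

Because items only ever move into the newest bucket, a dynamic sketch built on this primitive (as assumed here) could add a sub-filter without re-inserting the items held by the existing ones.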

Citation:

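Fu Pengtao, Luo Lailong, Guo Deke, Zhao Xiang, Li Shangsen, Wang Huaimin. Jump Filter: A Dynamic Sketch Design for Big Data Governance. Journal of Software, 2023, 34(3): 1193-1212.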

History
  • Received: May 14, 2022
  • Revised: July 29, 2022
  • Online: October 26, 2022
Copyright: Institute of Software, Chinese Academy of Sciences