Distributed Index Construction for Big Data Streams

doi:10.13328/j.cnki.jos.006097


Volume 32, Issue 11, 2021, pp. 3576-3595. DOI:10.13328/j.cnki.jos.006097

Author:
YANG Liang-Huai
School of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China
LU Chen-Xi
School of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China
FAN Yu-Lei
School of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China
ZHU Zhen-Yang
School of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China
PAN Jian
Zhijiang College, Zhejiang University of Technology, Shaoxing 312030, China
CLC Number: TP311
Fund Project: National Key Research and Development Program of China (2020YFB1707700)


Abstract:

Efficient storage and indexing of big data streams are challenging problems in the database field. This paper segments the temporal data stream into consecutive time windows and proposes WB-Index, a distributed master-slave index structure based on a double-layer B+ tree. The lower B+ tree index is built on the stream tuples within each time window, and the upper B+ tree index is built over the successive time windows. The lower B+ tree index is constructed by combining batch loading with parallel sorting. The core idea of the construction method is to slice the time window and to isolate the parallelizable operations from the others in the time window. Sorting and data stream receiving proceed in parallel across slices, and the construction of the B+ tree skeleton (a B+ tree without values) for the time window and the merge-sort operation are parallelized as well. These techniques effectively speed up B+ tree construction. Because the timestamps of time windows increase monotonically, a split-less method is adopted for constructing the upper B+ tree index, which avoids the overhead of node splitting and memory movement and improves space utilization and update efficiency. In WB-Index, data stream tuples and the index are stored separately, and the index and hotspot data are cached as much as possible to improve query efficiency. Finally, theoretical analysis and experiments both demonstrate that WB-Index supports efficient real-time data stream writing and stream data querying.
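The two construction ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `sort_slices`, `bulk_load`, `build_window`, `WindowIndex`, and the tiny `FANOUT` are hypothetical, nodes are plain lists, and the merge and the skeleton build run sequentially here, whereas the paper parallelizes them. The thread pool only shows the structure of per-slice sorting; in CPython it does not give real CPU parallelism.

```python
import bisect
import heapq
from concurrent.futures import ThreadPoolExecutor

FANOUT = 4  # hypothetical node capacity, kept small for illustration


def sort_slices(slices):
    """Sort each slice of a time window independently of the others."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(sorted, slices))


def bulk_load(sorted_keys, fanout=FANOUT):
    """Build B+ tree levels bottom-up from sorted keys, without node splits.

    Returns a list of levels: levels[0] is the leaf level and levels[-1]
    holds the single root node. Each inner entry is the smallest key of
    one child node in the level below.
    """
    leaves = [sorted_keys[i:i + fanout] for i in range(0, len(sorted_keys), fanout)]
    levels = [leaves or [[]]]
    while len(levels[-1]) > 1:
        seps = [node[0] for node in levels[-1]]
        levels.append([seps[i:i + fanout] for i in range(0, len(seps), fanout)])
    return levels


def build_window(slices):
    """Lower index for one window: sort slices, k-way merge, bulk load."""
    merged = list(heapq.merge(*sort_slices(slices)))
    return bulk_load(merged)


class WindowIndex:
    """Append-only upper index keyed by window start timestamp.

    Because start timestamps only increase, a new entry always goes to the
    rightmost leaf; a full leaf is left as-is and a fresh leaf is opened,
    so no node is ever split and no entries are moved.
    """

    def __init__(self, fanout=FANOUT):
        self.fanout = fanout
        self.leaves = [[]]  # each leaf is a list of (start_ts, lower_index)

    def append(self, start_ts, lower_index):
        last = self.leaves[-1]
        if last and start_ts <= last[-1][0]:
            raise ValueError("window timestamps must increase")
        if len(last) == self.fanout:  # rightmost leaf is full: open a new one
            last = []
            self.leaves.append(last)
        last.append((start_ts, lower_index))

    def find(self, ts):
        """Return the lower index of the latest window starting at or before ts."""
        firsts = [leaf[0][0] for leaf in self.leaves if leaf]
        i = bisect.bisect_right(firsts, ts) - 1
        if i < 0:
            return None
        leaf = self.leaves[i]
        j = bisect.bisect_right([entry[0] for entry in leaf], ts) - 1
        return leaf[j][1]
```

In this sketch every leaf except the last is completely full, which is the space-utilization benefit the abstract attributes to the split-less method; a B+ tree filled by ordinary inserts with middle splits would leave nodes roughly half full under the same monotonically increasing keys.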

Key words: big data; data stream; distributed index; B+ tree

Citation:

YANG Liang-Huai, LU Chen-Xi, FAN Yu-Lei, ZHU Zhen-Yang, PAN Jian. Distributed Index Construction for Big Data Streams. Journal of Software, 2021, 32(11): 3576-3595


History

Received: October 29, 2019
Revised: December 25, 2019
Online: November 05, 2021
Published: November 06, 2021
