Online Join Method for Skewed Data Streams

doi:10.13328/j.cnki.jos.005440


Journal of Software, Volume 29, Issue 3, 2018, pp. 869-882. DOI: 10.13328/j.cnki.jos.005440


Author:
WANG Chun-Kai, School of Information, Renmin University of China, Beijing 100872, China
MENG Xiao-Feng, School of Information, Renmin University of China, Beijing 100872, China
CLC Number: TP311
Fund Project: National Natural Science Foundation of China (61532016, 61379050, 61532010, 91646203, 61762082); National Key Research and Development Program of China (2016YFB1000602, 2016YFB1000603); Research Funds of Renmin University (11XNL010); Science and Technology Opening-up Cooperation Project of He'nan Province (172106000077)


Abstract:

Scalable distributed join processing in a parallel environment requires a partitioning policy that transfers data while minimizing both the size of the migrated state and the number of messages exchanged. Online theta-joins over data streams are more computationally expensive and impose higher memory requirements in distributed data stream management systems (DDSMS) than in standalone database management systems (DBMS). The complete bipartite graph-based model supports distributed stream joins and is memory-efficient, elastic, and scalable, because each relation is stored in its own processing units without data replicas and the units are independent of one another. However, because stream rates are unstable and attribute values are unevenly distributed, online theta-joins over skewed data streams can cause load imbalance across the cluster. In this case, the bipartite graph-based model cannot allocate query nodes dynamically and requires the grouping parameters to be set manually. More seriously, full-history joins perform even worse under these conditions. This paper presents a framework for handling skewed stream joins that enhances the adaptability of the join model and minimizes system cost under varying workloads. The proposal includes a mixed key-based and tuple-based partitioning scheme to handle skewed data on each side of the bipartite graph-based model, a strategy for redistributing query nodes between the two sides of the model, and a state-consistent migration algorithm that supports full-history joins and adaptive resource management. Experiments with synthetic and real data show that the presented method handles skewed data streams effectively and improves the throughput of DDSMS, and that it is especially effective at reducing operational cost in cloud environments.
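
The abstract does not spell out how the mixed key-based and tuple-based partitioning works, so the following is only an illustrative sketch of the general idea for an equi-join, not the authors' algorithm. It assumes that a key counts as skewed once more than `hot_threshold` tuples carrying it have arrived; the class name `MixedPartitioner`, the threshold rule, and the round-robin spreading are all hypothetical choices made for this example.

```python
import zlib
from collections import Counter
from itertools import count


class MixedPartitioner:
    """Route tuples of one relation to the processing units on its side.

    Illustrative only: the hot-key rule and names are assumptions,
    not taken from the paper.
    """

    def __init__(self, n_units, hot_threshold):
        self.n_units = n_units
        self.hot_threshold = hot_threshold  # hypothetical skew cutoff
        self.seen = Counter()               # tuples observed per key
        self._rr = count()                  # round-robin cursor

    def _hash_unit(self, key):
        # Stable hash so the same key always maps to the same unit.
        return zlib.crc32(repr(key).encode()) % self.n_units

    def is_hot(self, key):
        return self.seen[key] > self.hot_threshold

    def store_unit(self, key):
        """Pick the unit that stores a newly arrived tuple."""
        self.seen[key] += 1
        if self.is_hot(key):
            # Tuple-based routing: spread a skewed key over all units.
            return next(self._rr) % self.n_units
        # Key-based routing: a cold key stays on a single unit.
        return self._hash_unit(key)

    def probe_units(self, key):
        """Units an opposite-side tuple must visit to find all matches."""
        if self.is_hot(key):
            # Tuples of a hot key may sit on any unit, so probe them all.
            return list(range(self.n_units))
        return [self._hash_unit(key)]
```

With `MixedPartitioner(n_units=4, hot_threshold=3)`, the first three tuples of a key land on its hash unit and later ones rotate across all four units; a probe for that key then visits every unit, while a probe for a cold key visits only one. The trade-off in this sketch is that cold keys keep the low communication cost of key-based routing, and only skewed keys pay for broadcasting probes.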

Key words: distributed data stream management system; online join; data skew; state migration; bipartite graph-based join model

Citation:

WANG Chun-Kai, MENG Xiao-Feng. Online Join Method for Skewed Data Streams. Journal of Software, 2018, 29(3): 869-882


History

Received: July 31, 2017
Revised: September 5, 2017
Online: December 5, 2017
