Research on I/O Cost of MapReduce Join

doi:10.13328/j.cnki.jos.004586


Journal of Software, Volume 26, Issue 6, 2015, pp. 1438-1456

Authors:
SONG Jie
Software College, Northeastern University, Shenyang 110819, China
LI Tian-Tian
School of Information and Engineering, Northeastern University, Shenyang 110819, China
ZHU Zhi-Liang
Software College, Northeastern University, Shenyang 110819, China
BAO Yu-Bin
School of Information and Engineering, Northeastern University, Shenyang 110819, China
YU Ge
School of Information and Engineering, Northeastern University, Shenyang 110819, China


Abstract:

The exponential growth of data poses serious challenges to data management and analysis. Join queries are a common data analysis operation, and MapReduce is a programming model for parallel processing of large-scale datasets. Research on MapReduce-based join algorithms and their cost models therefore has both academic significance and practical value. This study argues that I/O cost (including both network and local I/O) is the main factor affecting the performance of MapReduce-based join algorithms. Furthermore, because the I/O cost is determined by the characteristics of both the datasets and the join operation, the execution plan of a multi-way join can be optimized by evaluating the I/O cost of its two-way joins. The study proposes and formally defines an I/O cost model for two-way joins; by applying simple extensions to existing MapReduce-based join algorithms, it obtains six join algorithms and derives their I/O cost functions through white-box analysis. In addition, a selection algorithm is presented for finding the best execution plan of a multi-way join. The correctness and accuracy of the I/O cost model are validated through a series of experiments, and the results suggest that the I/O cost accurately reflects algorithm performance.
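The central idea of the abstract, that a multi-way join plan can be chosen by summing the estimated I/O costs of its constituent two-way joins, can be illustrated with a minimal Python sketch. This is not the paper's model: the paper derives six algorithm-specific cost functions through white-box analysis, and those are not reproduced here. The sketch assumes a single hypothetical reduce-side (repartition) join cost, a fixed tuple width `ROW_BYTES`, independent join predicates with known selectivities, and exhaustive enumeration of left-deep join orders. All names (`two_way_io_cost`, `result_rows`, `best_left_deep_plan`) are illustrative.

```python
import math
from itertools import combinations, permutations

ROW_BYTES = 100  # hypothetical fixed tuple width in bytes


def two_way_io_cost(left_rows, right_rows, out_rows, row_bytes=ROW_BYTES):
    """Hypothetical I/O cost (bytes) of one reduce-side join, not the paper's formula."""
    shuffled = (left_rows + right_rows) * row_bytes
    local_io = 2 * shuffled  # map-side spill write + reduce-side read
    network_io = shuffled    # assume every map output record crosses the network
    return local_io + network_io + out_rows * row_bytes  # plus writing the result


def result_rows(relations, rows, sel):
    """Estimated cardinality of joining `relations`, assuming independent predicates."""
    card = math.prod(rows[r] for r in relations)
    for a, b in combinations(relations, 2):
        card *= sel.get(frozenset((a, b)), 1.0)  # no predicate: selectivity 1
    return card


def best_left_deep_plan(rows, sel):
    """Enumerate left-deep join orders and keep the one with the lowest summed cost."""
    best_order, best_cost = None, math.inf
    for order in permutations(rows):
        cost, joined = 0.0, [order[0]]
        for nxt in order[1:]:
            # Reject orders that would require a Cartesian product.
            if not any(frozenset((j, nxt)) in sel for j in joined):
                cost = math.inf
                break
            left = result_rows(joined, rows, sel)
            joined.append(nxt)
            cost += two_way_io_cost(left, rows[nxt], result_rows(joined, rows, sel))
        if cost < best_cost:
            best_order, best_cost = order, cost
    return best_order, best_cost


if __name__ == "__main__":
    rows = {"R": 1000, "S": 100, "T": 10}
    sel = {frozenset(("R", "S")): 0.01, frozenset(("S", "T")): 0.1}
    print(best_left_deep_plan(rows, sel))  # → (('S', 'T', 'R'), 473000.0)
```

Under these assumed numbers, joining the two small relations S and T first keeps the shuffled intermediate result small, which is why that order wins. Exhaustive enumeration grows factorially with the number of relations, so it only suits a handful of inputs; the paper's own selection algorithm may work differently.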

Key words: join; MapReduce; I/O cost model; query optimization

Citation:

SONG Jie, LI Tian-Tian, ZHU Zhi-Liang, BAO Yu-Bin, YU Ge. Research on I/O cost of MapReduce join. Journal of Software, 2015, 26(6): 1438-1456.

History

Received: June 04, 2013
Revised: January 21, 2014
Online: June 04, 2015

Copyright: Institute of Software, Chinese Academy of Sciences