Efficient Join Query Processing Algorithm CHMJ Based on Hadoop

doi:10.3724/SP.J.1001.2012.04124


Journal of Software, Volume 23, Issue 8, 2012, pp. 2032-2041


Authors:
ZHAO Yan-Rong
Institute of Computing Technology, The Chinese Academy of Sciences, Beijing 100190, China; National Research Center for Intelligent Computing Systems, The Chinese Academy of Sciences, Beijing 100190, China; Graduate University, The Chinese Academy of Scienc
WANG Wei-Ping
Institute of Computing Technology, The Chinese Academy of Sciences, Beijing 100190, China; National Research Center for Intelligent Computing Systems, The Chinese Academy of Sciences, Beijing 100190, China
MENG Dan
Institute of Computing Technology, The Chinese Academy of Sciences, Beijing 100190, China; National Research Center for Intelligent Computing Systems, The Chinese Academy of Sciences, Beijing 100190, China
ZHANG Shu-Bin
Data Platform Department, Tencent, Inc., Shenzhen 518057, China
LI Jun
Data Platform Department, Tencent, Inc., Shenzhen 518057, China

                    


Abstract:

This paper proposes CoLocationHashMapJoin (CHMJ), a join query processing algorithm based on Hadoop. First, it designs a multi-copy consistent hashing algorithm that distributes table data across the cluster according to the hash values of the join attribute, which improves data locality while ensuring data availability. Second, building on the multi-copy consistent hashing algorithm, it proposes a parallel join query processing algorithm called HashMapJoin, which significantly improves the efficiency of join queries. CHMJ has been deployed in Tencent's data warehouse system and plays an important role in Tencent's daily analysis tasks. The results show that CHMJ improves the efficiency of join query processing fivefold compared with Hive.

Key words: big data; Hadoop; join query processing; HashMapJoin
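
The two ideas in the abstract, placing rows by the hash of the join attribute with several copies and then joining co-located data with an in-memory hash map, can be illustrated with a minimal single-process sketch. This is not the paper's implementation: the abstract gives no algorithmic details, so the class `ReplicatedHashRing`, the functions `place` and `colocated_hash_join`, the virtual-node count, and the rule that only the first copy takes part in the join are all assumptions made for illustration.

```python
import hashlib
from bisect import bisect_right
from collections import defaultdict


def stable_hash(value):
    """Deterministic hash; Python's built-in hash() is salted per process for strings."""
    return int(hashlib.md5(str(value).encode("utf-8")).hexdigest(), 16)


class ReplicatedHashRing:
    """Hypothetical consistent-hash ring that maps each key to several distinct nodes."""

    def __init__(self, nodes, replicas=2, vnodes=16):
        # Never ask for more copies than there are distinct nodes.
        self.replicas = min(replicas, len(set(nodes)))
        # Each node owns several points on the ring (virtual nodes).
        self.ring = sorted(
            (stable_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self.points = [point for point, _ in self.ring]

    def nodes_for(self, key):
        """Walk clockwise from the key's position and collect distinct nodes."""
        start = bisect_right(self.points, stable_hash(key))
        chosen = []
        for step in range(len(self.ring)):
            node = self.ring[(start + step) % len(self.ring)][1]
            if node not in chosen:
                chosen.append(node)
                if len(chosen) == self.replicas:
                    break
        return chosen


def place(rows, key_index, ring):
    """Store every row on all nodes chosen for its join-attribute value."""
    storage = defaultdict(list)
    for row in rows:
        for node in ring.nodes_for(row[key_index]):
            storage[node].append(row)
    return storage


def colocated_hash_join(left, right, ring, left_key=0, right_key=0):
    """Join two tables node by node, using only data already stored on that node."""
    left_store = place(left, left_key, ring)
    right_store = place(right, right_key, ring)
    result = []
    for node in sorted(left_store):
        # Build phase: hash map over the left rows whose first copy lives here,
        # so each matching pair is produced on exactly one node.
        table = defaultdict(list)
        for row in left_store[node]:
            if ring.nodes_for(row[left_key])[0] == node:
                table[row[left_key]].append(row)
        # Probe phase: look up each local right row in the hash map.
        for row in right_store.get(node, []):
            for match in table.get(row[right_key], []):
                result.append(match + row)
    return result


if __name__ == "__main__":
    users = [(1, "alice"), (2, "bob"), (3, "carol")]
    orders = [(1, "order-a"), (1, "order-b"), (3, "order-c"), (4, "order-d")]
    ring = ReplicatedHashRing(["node1", "node2", "node3"], replicas=2)
    for joined in sorted(colocated_hash_join(users, orders, ring)):
        print(joined)
```

Because both tables are placed with the same ring, rows that share a join value land on the same nodes, so each node can join its local data without moving rows across the network. The extra copies are kept only for availability in this sketch; a real system would also need a failover rule for choosing which copy to read.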

Get Citation

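ZHAO Yan-Rong, WANG Wei-Ping, MENG Dan, ZHANG Shu-Bin, LI Jun. Efficient join query processing algorithm CHMJ based on Hadoop. Journal of Software, 2012, 23(8): 2032-2041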



History

Received: May 12, 2011
Revised: September 1, 2011
Online: August 7, 2012
