Efficient Join Query Processing Algorithm CHMJ Based on Hadoop

Abstract:

This paper proposes a join query processing algorithm, CoLocationHashMapJoin (CHMJ). First, the study designs a multi-copy consistent hashing algorithm that distributes table data across the cluster according to the hash value of the join attribute, which improves data locality while ensuring data availability. Second, building on the multi-copy consistent hashing algorithm, the study proposes a parallel join query processing algorithm called HashMapJoin, which significantly improves the efficiency of join queries. CHMJ has been deployed in Tencent’s data warehouse system and plays an important role in Tencent’s daily analysis tasks. The results show that CHMJ improves the efficiency of join query processing by a factor of five compared with Hive.
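The abstract does not give the details of CHMJ, so the following is only a minimal sketch of the general idea it describes: rows of both tables are placed by a hash of the join attribute on a consistent-hash ring with several copies, so matching rows land on the same nodes and each node can run an in-memory hash join locally. All names here (`ReplicatedHashRing`, `place`, `local_hash_join`, `colocated_join`) and the parameters `copies` and `vnodes` are assumptions made for illustration, not the paper's actual design or API.

```python
import bisect
import hashlib
from collections import defaultdict


def stable_hash(value):
    """Deterministic 64-bit hash (the built-in hash() is salted per process)."""
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class ReplicatedHashRing:
    """Consistent-hash ring mapping a key to `copies` distinct nodes (hypothetical)."""

    def __init__(self, nodes, copies=2, vnodes=64):
        self.nodes = list(nodes)
        if copies > len(self.nodes):
            raise ValueError("copies cannot exceed the number of nodes")
        self.copies = copies
        # Each node gets several virtual points on the ring to even out load.
        self.ring = sorted(
            (stable_hash(f"{node}#{i}"), node)
            for node in self.nodes
            for i in range(vnodes)
        )
        self.points = [point for point, _ in self.ring]

    def nodes_for(self, key):
        """Walk clockwise from the key's position, collecting distinct nodes."""
        start = bisect.bisect_right(self.points, stable_hash(key))
        owners = []
        for step in range(len(self.ring)):
            node = self.ring[(start + step) % len(self.ring)][1]
            if node not in owners:
                owners.append(node)
                if len(owners) == self.copies:
                    break
        return owners


def place(ring, rows, key_index=0):
    """Store each row on every node that owns its join key."""
    storage = defaultdict(list)
    for row in rows:
        for node in ring.nodes_for(row[key_index]):
            storage[node].append(row)
    return storage


def local_hash_join(left_rows, right_rows, left_key=0, right_key=0):
    """In-memory hash join: build a map on the left side, probe with the right."""
    table = defaultdict(list)
    for row in left_rows:
        table[row[left_key]].append(row)
    return [l + r for r in right_rows for l in table.get(r[right_key], [])]


def colocated_join(ring, left, right):
    """Join two tables node by node, without moving rows between nodes."""
    left_store, right_store = place(ring, left), place(ring, right)
    result = []
    for node in ring.nodes:
        # Only the first owner of a key emits output, so that replicas
        # held for availability do not produce duplicate result rows.
        primary = [r for r in left_store[node] if ring.nodes_for(r[0])[0] == node]
        result.extend(local_hash_join(primary, right_store[node]))
    return result


ring = ReplicatedHashRing(["node-a", "node-b", "node-c"], copies=2)
users = [(1, "ann"), (2, "bob"), (3, "cy")]
orders = [(1, "book"), (1, "pen"), (3, "cup"), (4, "hat")]
joined = colocated_join(ring, users, orders)
```

Because both tables are placed with the same ring and the same key, every node already holds all rows it needs for the keys it owns; in this sketch the extra copies exist only for availability, and the primary-owner check keeps them from duplicating results.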

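Citation:

Zhao Yanrong, Wang Weiping, Meng Dan, Zhang Shubin, Li Jun. Efficient join query processing algorithm CHMJ based on Hadoop. Journal of Software (软件学报), 2012, 23(8): 2032-2041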

History
  • Received: May 12, 2011
  • Revised: September 1, 2011
  • Online: August 7, 2012
Copyright: Institute of Software, Chinese Academy of Sciences