SONG Jie
Software College, Northeastern University, Shenyang 110819, China
SUN Zong-Zhe
Software College, Northeastern University, Shenyang 110819, China
MAO Ke-Ming
Software College, Northeastern University, Shenyang 110819, China
BAO Yu-Bin
School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China
YU Ge
School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China
TP311
This paper surveys research progress on MapReduce-based big data processing platforms. First, twelve typical MapReduce-based data processing platforms are described, their implementation principles and application areas are compared, and their commonalities are summarized. Second, MapReduce-based big data processing algorithms are studied, including search, data cleansing/transformation, aggregation, join, sorting, optimization, preference query, graph, and data mining algorithms. These algorithms are classified by their MapReduce implementations, and the factors that affect their performance are analyzed. Finally, big data processing algorithms are abstracted as out-of-core algorithms, whose performance characteristics are analyzed. Considerations, ideas, and challenges for general performance optimizations of out-of-core algorithms are proposed as references for researchers. These optimizations include reducing an algorithm's I/O cost, improving its locality, and designing incremental iterative algorithms. Compared with current research topics, such as dynamic platform optimizations based on resource allocation and task scheduling, parallelization of specific algorithms, and performance optimization of iterative algorithms, the proposed static optimizations serve as complements that highlight new areas for researchers.
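SONG Jie, SUN Zong-Zhe, MAO Ke-Ming, BAO Yu-Bin, YU Ge. Research advance on MapReduce based big data processing platforms and algorithms. Journal of Software (软件学报), 2017, 28(3): 514-543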