 |
|
|
|
 |
 |
 |
|
 |
|
 |
|
|
宋杰,孙宗哲,毛克明,鲍玉斌,于戈.MapReduce大数据处理平台与算法研究进展.软件学报,2017,28(3):514-543 |
MapReduce大数据处理平台与算法研究进展 |
Research Advance on MapReduce Based Big Data Processing Platforms and Algorithms |
投稿时间:2016-08-01 修订日期:2016-09-14 |
DOI:10.13328/j.cnki.jos.005169 |
中文关键词: 大数据 MapReduce 外存算法 大数据处理 算法性能优化 |
英文关键词:big data MapReduce out-of-core algorithm big data processing performace optimization on algorithms |
基金项目:国家自然科学基金(61672143,61433008,61402090,61502090) |
|
摘要点击次数: 3165 |
全文下载次数: 2288 |
中文摘要: |
综述了近年来基于MapReduce编程模型的大数据处理平台与算法的研究进展.首先介绍了12个典型的基于MapReduce的大数据处理平台,分析对比它们的实现原理和适用场景,抽象其共性;随后介绍基于MapReduce的大数据分析算法,包括搜索算法、数据清洗/变换算法、聚集算法、连接算法、排序算法、偏好查询、最优化算法、图算法、数据挖掘算法,将这些算法按照MapReduce实现方式分类,分析影响算法性能的因素;最后,将大数据处理算法抽象为外存算法,并对外存算法的特征加以梳理,提出了普适的外存算法性能优化方法的研究思路和问题,以供研究人员参考.具体包括优化外存算法的磁盘I/O、优化外存算法的局部性以及设计增量式迭代算法.现有的大数据处理平台和算法研究多集中在基于资源分配和任务调度的平台动态性能优化、特定算法并行化、特定算法性能优化等领域,所提出的外存算法性能优化属于静态优化方法,是现有研究的良好补充,为研究人员提供了广阔的研究空间. |
英文摘要: |
This paper introduces the research advance on MapReduce based big data processing platforms. Frist, twelve typical MapReduce based data processing platforms are descripted, their implementation principles and application areas are compared, and their commonalities are concluded. Second, the MapReduce based big data processing algorithms, including search algorithms, data cleansing/transformation algorithms, aggregation algorithms, join algorithms, sorting algorithms, optimization algorithms, preference query algorithms, graph algorithms, and data mining algorithms, are studied. These algorithms are classified by their MapReduce implementations, and the factors that affect their performance are analyzed. Finally, big data processing algorithms are abstracted as the out-of-core algorithms whose performance features are well analyzed. The considerations, ideas and challenges of universal optimizations on the performance of out-of-core algorithms are proposed as references for researchers. These optimizations include optimizing algorithms' I/O cost and locality, and designing incremental iterative algorithms. Comparing the current topics, such as resource allocation and task scheduling based dynamic optimizations on platform, parallelization for specific algorithms, and performance optimizations on iterative algorithms, the proposed static optimizations serve as complements that highlight new areas for the researchers. |
HTML 下载PDF全文 查看/发表评论 下载PDF阅读器 |
|
|
|
|
|
|
 |
|
|
|
|
 |
|
 |
|
 |
|