Comparative Study on MapReduce and Spark for Big Data Analytics

doi:10.13328/j.cnki.jos.005557


Volume 29, Issue 6, 2018, pp. 1770-1791


Authors:
WU Xin-Dong
School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230009, China; School of Computing and Informatics, University of Louisiana at Lafayette, Lafayette 70504, USA
JI Sheng-Wei
School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230009, China
Fund Project: National Key Research and Development Program of China (2016YFB1000901); National Natural Science Foundation of China (91746209); Program for Changjiang Scholars and Innovative Research Team in University (PCSIRT) of the Ministry of Education (IRT17R3)


Abstract:

This paper reviews two state-of-the-art algorithmic architectures, MapReduce and Spark, and compares them in terms of background, principles, and application scenarios. The advantages and corresponding limitations of the two architectures are summarized. For non-iterative problems, MapReduce, by virtue of its task scheduling strategy and shuffle mechanism, performs better than Spark in terms of intermediate data transfers and the number of files. Spark can be used for iterative problems and for problems that require low latency, because it divides a computing task according to the dependencies between the data and the task. Compared with MapReduce, Spark can effectively reduce the number of intermediate data transmissions and the number of synchronizations, and thereby improve the running efficiency of computing systems.
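
As a rough illustration of what dividing a task by dependencies can look like, the following Python sketch cuts a linear chain of operations into stages wherever an operation requires a shuffle. It is not taken from the paper and does not use Spark's API: the function names `split_into_stages` and `synchronization_points`, the "narrow"/"wide" labels supplied as input, and the example pipeline are all assumptions made for illustration. Spark itself derives stages from a graph of dataset dependencies rather than from a flat list.

```python
from typing import List, Tuple

# Each operation is (name, dependency_kind).
# "narrow": every output partition depends on a single input partition
#           (for example map or filter), so operations can be pipelined.
# "wide":   output partitions depend on many input partitions, so data must
#           be shuffled (for example reduceByKey).
Op = Tuple[str, str]


def split_into_stages(ops: List[Op]) -> List[List[str]]:
    """Group a linear chain of operations into stages.

    A new stage begins at every wide operation; consecutive narrow
    operations stay in the same stage. This is a simplified, hypothetical
    model of dependency-based task division.
    """
    stages: List[List[str]] = []
    current: List[str] = []
    for name, kind in ops:
        if kind not in ("narrow", "wide"):
            raise ValueError(f"unknown dependency kind: {kind!r}")
        # A wide dependency closes the stage built so far.
        if kind == "wide" and current:
            stages.append(current)
            current = []
        current.append(name)
    if current:
        stages.append(current)
    return stages


def synchronization_points(ops: List[Op]) -> int:
    """Count stage boundaries, i.e. points where all tasks must finish
    before the next stage can start."""
    return max(len(split_into_stages(ops)) - 1, 0)


# Hypothetical word-count style pipeline used only as an example.
pipeline: List[Op] = [
    ("textFile", "narrow"),
    ("flatMap", "narrow"),
    ("map", "narrow"),
    ("reduceByKey", "wide"),
    ("filter", "narrow"),
]

if __name__ == "__main__":
    for index, stage in enumerate(split_into_stages(pipeline)):
        print(f"stage {index}: {' -> '.join(stage)}")
    print("synchronization points:", synchronization_points(pipeline))
```

In this toy model the five operations collapse into two stages with a single synchronization point between them, which is one way to picture why pipelining narrow operations can reduce intermediate transfers.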

Key words: big data; MapReduce; Spark; iterative problems; non-iterative problems

Citation:

WU Xin-Dong, JI Sheng-Wei. Comparative study on MapReduce and Spark for big data analytics. Ruan Jian Xue Bao/Journal of Software, 2018, 29(6): 1770-1791


History

Received: October 19, 2017
Online: February 08, 2018

