Survey on JVM Optimization for Big Data Processing Frameworks

doi:10.13328/j.cnki.jos.006502


Volume 34, Issue 1, 2023, pp. 463-488. DOI: 10.13328/j.cnki.jos.006502


Author:
WANG Yi-Cheng
State Key Laboratory of Computer Science (Institute of Software, Chinese Academy of Sciences), Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China
ZENG Hong-Bin
State Key Laboratory of Computer Science (Institute of Software, Chinese Academy of Sciences), Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China
XU Li-Jie
State Key Laboratory of Computer Science (Institute of Software, Chinese Academy of Sciences), Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China; Nanjing Institute of Software Technology, Nanjing 211135, China
WANG Wei
State Key Laboratory of Computer Science (Institute of Software, Chinese Academy of Sciences), Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China; Nanjing Institute of Software Technology, Nanjing 211135, China
WEI Jun
State Key Laboratory of Computer Science (Institute of Software, Chinese Academy of Sciences), Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China; Nanjing Institute of Software Technology, Nanjing 211135, China
HUANG Tao
State Key Laboratory of Computer Science (Institute of Software, Chinese Academy of Sciences), Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China

                    


Abstract:

Big data processing frameworks such as Hadoop and Spark are widely used for data processing and analysis in industry and academia. These frameworks adopt a distributed architecture and are generally developed in object-oriented languages such as Java and Scala. They use the Java virtual machine (JVM) as the runtime environment on cluster nodes to execute computing tasks; that is, they rely on the JVM's automatic memory management mechanism to allocate and reclaim data objects. However, current JVMs are not designed for big data processing frameworks, which leads to problems such as long garbage collection (GC) time and the high cost of data serialization and deserialization. As reported by users and researchers, GC time can exceed 50% of the overall application execution time in some cases. JVM memory management has therefore become a performance bottleneck for big data processing frameworks. This study systematically reviews recent research on JVM optimization for big data processing frameworks. Its contributions are threefold. First, the root causes of the performance degradation of big data applications executed in the JVM are summarized. Second, the existing JVM optimization techniques for big data processing frameworks are summarized and classified into categories, and the advantages and disadvantages of each category are compared and analyzed in terms of optimization effect, application scope, and burden on users. Finally, some future directions for JVM optimization are proposed, which can help improve the performance of big data processing frameworks.

Key words: big data system; Java virtual machine (JVM); distributed system; automatic memory management

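Citation:

WANG Yi-Cheng, ZENG Hong-Bin, XU Li-Jie, WANG Wei, WEI Jun, HUANG Tao. Survey on JVM Optimization for Big Data Processing Frameworks. Journal of Software, 2023, 34(1): 463-488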


History

Received: January 18, 2021
Revised: April 29, 2021
Online: November 24, 2021
Published: January 06, 2023
