Survey on JVM Optimization for Big Data Processing Frameworks

    Abstract:

    Big data processing frameworks such as Hadoop and Spark are now widely used for data processing and analysis in industry and academia. These frameworks adopt a distributed architecture and are generally developed in object-oriented languages such as Java and Scala. They use the Java virtual machine (JVM) as the runtime environment on cluster nodes to execute computing tasks, relying on the JVM’s automatic memory management mechanism to allocate and reclaim data objects. However, current JVMs are not designed for big data processing frameworks, which leads to problems such as long garbage collection (GC) time and the high cost of data serialization and deserialization. Users and researchers report that GC can account for more than 50% of overall application execution time in some cases. JVM memory management has therefore become a performance bottleneck for big data processing frameworks. This study systematically reviews recent research on JVM optimization for big data processing frameworks and makes three contributions. First, it summarizes the root causes of the performance degradation of big data applications executed on the JVM. Second, it summarizes the existing JVM optimization techniques for big data processing frameworks, classifies them into categories, and compares the advantages and disadvantages of each category in terms of optimization effect, application scope, and burden on users. Finally, it proposes future directions for JVM optimization that may help improve the performance of big data processing frameworks.

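Citation

Wang Yicheng, Zeng Hongbin, Xu Lijie, Wang Wei, Wei Jun, Huang Tao. Survey on JVM Optimization for Big Data Processing Frameworks. Journal of Software (软件学报), 2023, 34(1): 463-488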

History
  • Received: January 18, 2021
  • Revised: April 29, 2021
  • Online: November 24, 2021
Copyright: Institute of Software, Chinese Academy of Sciences
Address: 4# South Fourth Street, Zhong Guan Cun, Beijing 100190
Phone: 010-62562563 Fax: 010-62562533 Email: jos@iscas.ac.cn