High Throughput Optimization Technique for Apache Flink

doi:10.13328/j.cnki.jos.007235

微信服务号

微信订阅号

Home > Archive>Volume , Issue , >1-25. DOI:10.13328/j.cnki.jos.007235

PDF HTML XML Export Cite reminder

High Throughput Optimization Technique for Apache Flink
DOI:
                        10.13328/j.cnki.jos.007235
                    
Author:
                        
                        
                    
Affiliation:
Clc Number:TP311
Fund Project:

Article

Figures

Metrics

Reference

Cited by

Materials

Comments

Abstract:

With the advent of the big data era, massive volumes of user data have empowered numerous data-driven industry applications, such as smart grids, intelligent transportation, and product recommendations. In scenarios where real-time data is crucial, the business value embedded within data rapidly diminishes over time. Consequently, data analysis systems require high throughput and low latency. Stream processing systems in big data, exemplified by Apache Flink, have been widely applied. Flink enhances system throughput by parallelizing computing tasks across cluster nodes. However, current research indicates that Flink has weak single-point performance and poor cluster scalability. To improve the throughput of stream processing systems, researchers have focused on optimizations in designing control planes, implementing system operators, and improving vertical scalability. However, there is still a lack of attention to the data flow in streaming analysis applications. These applications are driven by event streams and employ stateful processing functions, including low voltage detection in smart grids and advertising recommendation. This study analyzes the data flow characteristics of typical streaming analysis applications, identifies three bottlenecks in optimizing scalability, and proposes corresponding optimization strategies: the key-level watermark strategy, the dynamic load distribution strategy, and the the key-value based exchange strategy. Based on these optimization strategies, this study implements Trilink based on Flink and applies it to various applications such as low voltage detection, bridge arch crowns monitoring, and the Yahoo Streaming Benchmark. Experimental results show that the modified system, Trilink, achieves more than a 5-fold increase in throughput in a single-machine environment and over a 1.6-fold improvement in horizontal scalability acceleration in an 8-node setup, compared to Flink.

Reference

Cited by

Get Citation

秦政,许利杰,陈伟,王毅,吴铭钞,曾鸿斌,王伟.面向Apache Flink流式分析应用的高吞吐优化技术.软件学报,,():1-25

Copy

Article Metrics

Abstract:
PDF:
HTML:
Cited by:

History

Received:February 03,2024
Revised:March 29,2024
Adopted:
Online: November 20,2024
Published:

You are the firstVisitors
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4
Address：4# South Fourth Street, Zhong Guan Cun, Beijing 100190,Postal Code：100190
Phone：010-62562563 Fax：010-62562533 Email：jos@iscas.ac.cn
Technical Support：Beijing Qinyun Technology Development Co., Ltd.

Beijing Public Network Security No. 11040202500063

微信服务号

微信订阅号

Get Citation

Share

Article Metrics

History