Abstract: Apache Flink, an emerging distributed computing framework, supports the execution of large-scale iterative programs on a cluster. However, its default static resource allocation mechanism cannot allocate resources in a way that ensures iterative jobs complete on time. To address this problem, we argue that users should actively express performance constraints rather than passively reserve resources, and we propose RABORP, a dynamic resource allocation strategy based on runtime prediction that formulates and executes a dynamic resource allocation plan for Flink iterative jobs with explicit runtime limits. The main idea is to predict the runtime of each iteration superstep and, based on these predictions, to perform the initial resource allocation when the iterative job is submitted and to adjust the allocation dynamically at the synchronization barrier between supersteps, so that the job completes within the user-specified runtime limit while using the minimum set of resources. Comparative experiments were conducted by running a variety of typical Flink iterative jobs on the experimental datasets. The results show that the established runtime prediction model accurately predicts the runtime of each superstep and that, compared with current state-of-the-art algorithms, the proposed dynamic resource allocation strategy improves multiple performance metrics in both single-job and multi-job scenarios.
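To make the allocate-then-adjust idea concrete, the sketch below shows one way a deadline-driven allocator could use per-superstep runtime predictions: before the first superstep and again at each barrier, it picks the smallest parallelism whose predicted runtime for the remaining supersteps fits in the remaining time budget. It is a minimal illustration under stated assumptions, not RABORP itself. The names `linear_speedup_model`, `min_parallelism`, and `run_with_replanning` are hypothetical, and the linear-speedup runtime model is a toy stand-in for the trained prediction model described above.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical runtime model signature: (superstep index, parallelism) -> seconds.
RuntimeModel = Callable[[int, int], float]


def linear_speedup_model(serial: float, parallel_work: float) -> RuntimeModel:
    """Toy stand-in for a trained superstep runtime predictor (an assumption,
    not RABORP's actual model): a fixed serial cost plus work split evenly
    across parallel slots. The superstep index is ignored in this toy."""
    def predict(step: int, parallelism: int) -> float:
        return serial + parallel_work / parallelism
    return predict


def min_parallelism(predict: RuntimeModel, next_step: int, total_steps: int,
                    budget: float, max_parallelism: int) -> Optional[int]:
    """Smallest parallelism whose predicted runtime for all remaining supersteps
    fits in the remaining time budget, or None if no setting is fast enough."""
    for p in range(1, max_parallelism + 1):
        predicted = sum(predict(s, p) for s in range(next_step, total_steps))
        if predicted <= budget:
            return p
    return None


def run_with_replanning(predict: RuntimeModel, actual: RuntimeModel,
                        total_steps: int, deadline: float,
                        max_parallelism: int) -> Tuple[List[int], float]:
    """Allocate at submission (before superstep 0), then re-plan at every
    superstep barrier using the time actually consumed so far."""
    decisions: List[int] = []
    elapsed = 0.0
    for step in range(total_steps):
        p = min_parallelism(predict, step, total_steps,
                            deadline - elapsed, max_parallelism)
        if p is None:  # deadline unreachable: fall back to best effort
            p = max_parallelism
        decisions.append(p)
        elapsed += actual(step, p)  # runtime observed at the next barrier
    return decisions, elapsed


if __name__ == "__main__":
    predict = linear_speedup_model(serial=1.0, parallel_work=8.0)
    # Simulated "reality" is slower than predicted, forcing adjustments.
    actual = linear_speedup_model(serial=1.5, parallel_work=8.0)
    decisions, elapsed = run_with_replanning(predict, actual, total_steps=4,
                                             deadline=20.0, max_parallelism=8)
    print(decisions, elapsed)  # decisions: [2, 3, 2, 3]; elapsed stays under 20.0
```

Because each barrier re-plans against the time actually consumed, a superstep that runs slower than predicted raises the parallelism for the remaining work, and a superstep that leaves slack lets the allocator release resources again.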