Abstract: In recent years, advances in deep learning have been widely adopted across the world. To improve the training efficiency of large-scale deep learning models, industry practice typically builds GPU clusters and deploys efficient task schedulers on them. However, deep learning training tasks exhibit complex performance characteristics, such as performance heterogeneity and sensitivity to placement topology. Scheduling that ignores these characteristics can result in low resource utilization and poor training efficiency. To address this challenge, a large number of schedulers for deep learning training tasks based on performance modeling have emerged. By constructing accurate performance models, these schedulers capture the intricate performance characteristics of tasks and, building on this understanding, design better scheduling algorithms, thereby yielding more efficient scheduling solutions. This survey first reviews, from the perspective of model design, the performance modeling methods used by existing schedulers, organized by category. It then systematically analyzes existing task scheduling work according to how schedulers leverage performance models to optimize scheduling. Finally, it outlines promising directions for future research on performance modeling and scheduling.