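Two-Stage Workload Scheduling Problem on GPU Architectures: Formulation and Approximation Algorithm
(Chinese title: GPU上两阶段负载调度问题的建模与近似算法)
Authors: 孙景昊 (Sun JH), 邓庆绪 (Deng QX), 孟亚坤 (Meng YK)
Funding:

National Natural Science Foundation of China (61300194); Doctoral Program Foundation of the Ministry of Education of China (20110042110021); National Key Technology R&D Program of China (2012BAK24B01); Natural Science Foundation of Hebei Province (F2013501048)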
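    Abstract (translated from the Chinese):

    As hardware capabilities grow richer and software development environments mature, GPUs (graphics processing units) are increasingly applied to general-purpose computing and have played a crucial role in significantly improving the performance of many computing systems, embedded systems in particular. In GPU-based computing systems, large numbers of parallel workloads often transfer and load data at the same time, so data transfer latency can no longer be ignored when optimizing overall system performance. Taking both the transfer time and the execution time of workloads into account, and using minimization of the total workload makespan as the global optimization objective, this paper studies the joint "transfer-execution" scheduling problem for workloads on GPUs. First, the timing information and the number of parallel tasks of each workload are mapped to a two-dimensional rectangular region, yielding a 2D two-layer rectangle model of the workload. Then, the GPU workload scheduling problem is reduced to a class of strip-packing problems. Finally, a polynomial-time approximation algorithm with approximation ratio 3 is given based on a greedy strategy; its time complexity is O(n log n). The core of the approximation algorithm is to sequence the workloads in the data transfer stage. This demonstrates, at the theoretical level, the effectiveness of two-stage "transfer-execution" scheduling for GPU systems: sequencing workloads in the data transfer stage and applying first-come-first-serve (FCFS) scheduling in the execution stage brings GPU performance to the global optimum or close to it.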
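Written as a formula, the stated guarantee reads as follows. The symbols are notation introduced here for illustration, not taken from the paper: $C_{\max}(A, I)$ is the makespan the algorithm produces on an instance $I$ of $n$ workloads, and $C^{*}_{\max}(I)$ is the optimal makespan for that instance.

```latex
\[
  C_{\max}(A, I) \;\le\; 3 \, C^{*}_{\max}(I)
  \quad \text{for every instance } I,
  \qquad \text{with running time } O(n \log n).
\]
```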

    Abstract:

    With the spread of general-purpose computation on GPUs (graphics processing units), GPUs have become important for improving the performance of many computing systems, including embedded systems. Running massively parallel kernels on GPUs is challenging for overall system performance, especially when a large number of workloads (kernels) run together. This paper investigates how to schedule a large number of workloads that must be executed on GPUs so as to minimize the makespan of all workloads and thereby improve overall system performance. By considering transfer time and execution time together, the study abstracts each workload and formulates the scheduling problem on GPUs as a 2D rectangular strip-packing model. A polynomial-time 3-approximation algorithm is proposed to solve the strip-packing problem. The approximation result yields an effective approach to workload sequencing during data offloading to GPUs. It also implies that combining workload sequencing for GPU data offloading with workload-conserving first-come-first-serve (FCFS) scheduling inside the GPU achieves optimal or near-optimal system performance.
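The two-stage idea can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the paper's algorithm: the abstract does not give the sequencing rule, so the classical Johnson's rule for two-machine flow shops is used here as a stand-in, and the single transfer channel, the `capacity` parameter, and all names (`Workload`, `simulate`, `johnson_order`) are hypothetical.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    transfer: float  # time spent on the (assumed single) transfer channel
    execute: float   # execution time once the data is on the GPU
    width: int       # number of parallel units occupied while executing


def johnson_order(workloads):
    """Stand-in sequencing rule (Johnson's rule), NOT the rule from the paper.

    Workloads with transfer <= execute go first, by increasing transfer time;
    the rest follow, by decreasing execution time.
    """
    first = sorted((w for w in workloads if w.transfer <= w.execute),
                   key=lambda w: w.transfer)
    last = sorted((w for w in workloads if w.transfer > w.execute),
                  key=lambda w: -w.execute)
    return first + last


def simulate(order, capacity):
    """Makespan when transfers run one by one in `order` and execution is FCFS."""
    bus_free = 0.0    # time at which the transfer channel becomes idle
    prev_start = 0.0  # FCFS: no workload starts before its predecessor started
    running = []      # min-heap of (finish_time, width) for executing workloads
    free = capacity   # parallel units currently idle
    makespan = 0.0
    for w in order:
        if w.width > capacity:
            raise ValueError(f"{w.name} needs more units than the GPU has")
        bus_free += w.transfer                # stage 1: sequential transfer
        start = max(bus_free, prev_start)
        while running and running[0][0] <= start:
            free += heapq.heappop(running)[1]  # reclaim units already released
        while free < w.width:                  # stage 2: wait for enough units
            finish, width = heapq.heappop(running)
            start = max(start, finish)
            free += width
        free -= w.width
        heapq.heappush(running, (start + w.execute, w.width))
        prev_start = start
        makespan = max(makespan, start + w.execute)
    return makespan


if __name__ == "__main__":
    jobs = [Workload("A", 3, 2, 2), Workload("B", 1, 4, 2), Workload("C", 2, 1, 4)]
    print("arrival order:", simulate(jobs, capacity=4))
    print("sequenced    :", simulate(johnson_order(jobs), capacity=4))
```

Sorting followed by one heap operation per workload keeps the sketch within O(n log n), the same order as the complexity the abstract reports. On this toy instance, sequencing the transfers shortens the makespan relative to the arrival order, which is the effect the abstract attributes to workload sequencing.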

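Cite this article

孙景昊, 邓庆绪, 孟亚坤 (Sun JH, Deng QX, Meng YK). Two-stage workload scheduling problem on GPU architectures: Formulation and approximation algorithm. Ruan Jian Xue Bao/Journal of Software, 2014,25(2):298-313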

History
  • Received: 2013-05-06
  • Last revised: 2003-09-29
  • Published online: 2014-01-26