Abstract:With the prevalence of general purpose computation, GPUs (graphics processing units) are becoming extremely important to significantly improve system performances for many computing systems, including embedded systems. Running massively parallel kernels on GPUs is challenging for system’s overall performance especially when large amount of workloads (kernels) are running together. This paper investigates how to schedule large amount of workloads that have to be executed on GPUs to minimize the makespan of all workloads to improve the system overall performance. By considering the transfer time and execution time together, the study makes an abstraction for each workload and formulate the scheduling problem on GPUs into a 2D rectangular strip-packing model. A polynomial 3-approxiamation algorithm is proposed to solve the strip-packing problem. The approximation results exhibit an effective approach for workload sequencing during the data offloading on GPUs. It also implies that the scheduling jointed by workload sequencing for GPUs data offloading and first-come-first-serve (FCFS) scheduling inside GPUs with workload conserving can improve the system performance optimally or near-optimally.