Task Parallel Framework and Its Application in Nested Parallel Algorithms on the SW26010 Many-core Platform
Author: SUN Qiao, LI Lei-Sheng, ZHAO Hai-Tao, ZHAO Hui, WU Chang-Mao
Affiliation:

CLC Number: TP303

Fund Project: Strategic Priority Research Program of the Chinese Academy of Sciences (Category C) (XDC01030200)

    Abstract:

    Task parallelism is one of the fundamental patterns for designing parallel algorithms. Due to algorithm complexity and distinctive hardware features, however, implementing task-parallel algorithms often remains challenging. This study proposes SWAN, a general runtime framework supporting nested task parallelism on the new SW26010 many-core CPU platform. SWAN provides high-level abstractions for implementing task parallelism, so that programmers can focus mainly on the algorithm itself and enjoy enhanced productivity. For performance, the shared resources and bookkeeping information managed by SWAN are partitioned in a fine-grained manner to avoid heavy contention among worker threads. The core data structures of SWAN exploit the platform's high-bandwidth memory access mechanism, fast on-chip scratchpad memory, and atomic operations to reduce the framework's own overhead. In addition, SWAN provides dynamic load-balancing strategies at runtime to keep all threads fully utilized. In the experiments, a set of recursive, nested-parallel algorithms, including the N-queens problem, binary-tree traversal, quicksort, and convex hull, is implemented with SWAN on the target platform. The results show that each algorithm achieves a significant speedup of 4.5x to 32x over its serial counterpart, suggesting that SWAN offers both high usability and high performance.
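    The abstract does not show SWAN's API. As an illustration of the task-parallel pattern it describes for recursive search problems such as N-queens (spawn one subtask per subproblem, solve each subtree, then join the partial results), here is a minimal sketch in Python. The names `parallel_nqueens` and `count_from` are hypothetical, and `ThreadPoolExecutor` merely stands in for SWAN's runtime and scheduler; on SW26010, SWAN would additionally apply dynamic load balancing across its worker threads rather than the static one-task-per-column split used here.

    ```python
    from concurrent.futures import ThreadPoolExecutor

    def count_from(n, cols, diag1, diag2, row):
        # Sequential backtracking: count all completions of a partial placement.
        # cols / diag1 / diag2 hold the occupied columns and diagonals.
        if row == n:
            return 1
        total = 0
        for c in range(n):
            if c in cols or (row - c) in diag1 or (row + c) in diag2:
                continue
            total += count_from(n, cols | {c}, diag1 | {row - c},
                                diag2 | {row + c}, row + 1)
        return total

    def parallel_nqueens(n, workers=4):
        # Spawn one task per first-row column; each task searches its own
        # subtree independently, and the parent joins the partial counts.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            futures = [pool.submit(count_from, n, {c}, {-c}, {c}, 1)
                       for c in range(n)]
            return sum(f.result() for f in futures)
    ```

    For example, `parallel_nqueens(8)` returns 92, the known solution count for the 8-queens problem. Because of Python's global interpreter lock this sketch shows only the task structure, not real speedup; a framework like SWAN runs the same pattern on genuinely concurrent hardware threads.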

Get Citation

Sun Q, Li LS, Zhao HT, Zhao H, Wu CM. Task parallel framework and its application in nested parallel algorithms on the SW26010 many-core platform. Ruan Jian Xue Bao/Journal of Software, 2021,32(8):2352-2364 (in Chinese with English abstract).

Article Metrics
  • Abstract: 1457
  • PDF: 4777
  • HTML: 3416
  • Cited by: 0
History
  • Received: August 22, 2019
  • Revised: December 05, 2019
  • Online: August 05, 2021
  • Published: August 06, 2021
Copyright: Institute of Software, Chinese Academy of Sciences. Beijing ICP No. 05046678-4
Address: 4# South Fourth Street, Zhong Guan Cun, Beijing; Postal Code: 100190
Phone: 010-62562563; Fax: 010-62562533; Email: jos@iscas.ac.cn