任务并行编程模型研究与进展

doi:10.3724/SP.J.1001.2013.04339

微信服务号

微信订阅号

2025年5月11日 5:04 星期日

首页 > 过刊浏览>2013年第24卷第1期 >77-90. DOI:10.3724/SP.J.1001.2013.04339

PDF HTML阅读 XML下载导出引用引用提醒

任务并行编程模型研究与进展
DOI:
                        10.3724/SP.J.1001.2013.04339
                    
CSTR:
                        
                    
作者:
                        王蕾王蕾
计算机体系结构国家重点实验室(中国科学院 计算技术研究所),北京 100190
在期刊界中查找
在百度中查找
在本站中查找
崔慧敏崔慧敏
计算机体系结构国家重点实验室(中国科学院 计算技术研究所),北京 100190
在期刊界中查找
在百度中查找
在本站中查找
陈莉陈莉
计算机体系结构国家重点实验室(中国科学院 计算技术研究所),北京 100190
在期刊界中查找
在百度中查找
在本站中查找
冯晓兵冯晓兵
计算机体系结构国家重点实验室(中国科学院 计算技术研究所),北京 100190
在期刊界中查找
在百度中查找
在本站中查找

                    
作者单位:
作者简介:
通讯作者:
中图分类号:
基金项目:国家自然科学基金(61202055, 60970024, 60925009, 60921002); 国家高技术研究发展计划(863)(2012AA010902);国家重点基础研究发展计划(973)(2011CB302504); “核高基”国家重大科技专项基金(2009ZX01036-001-002, 2011ZX01028-001-002)

Research on Task Parallel Programming Model

Author:

WANG Lei
WANG Lei
State Key Laboratory of Computer Architecture (Institute of Computing Technology, The Chinese Academy of Sciences), Beijing100190, China
在期刊界中查找
在百度中查找
在本站中查找
CUI Hui-Min
CUI Hui-Min
State Key Laboratory of Computer Architecture (Institute of Computing Technology, The Chinese Academy of Sciences), Beijing100190, China
在期刊界中查找
在百度中查找
在本站中查找
CHEN Li
CHEN Li
State Key Laboratory of Computer Architecture (Institute of Computing Technology, The Chinese Academy of Sciences), Beijing100190, China
在期刊界中查找
在百度中查找
在本站中查找
FENG Xiao-Bing
FENG Xiao-Bing
State Key Laboratory of Computer Architecture (Institute of Computing Technology, The Chinese Academy of Sciences), Beijing100190, China
在期刊界中查找
在百度中查找
在本站中查找

Affiliation:

Fund Project:

摘要

图/表

访问统计

参考文献 [58]

相似文献 [20]

引证文献

资源附件

文章评论

摘要:

任务并行编程模型是近年来多核平台上广泛研究和使用的并行编程模型,旨在简化并行编程和提高多核利用率.首先,介绍了任务并行编程模型的基本编程接口和支持机制;然后,从3个角度,即并行性表达、数据管理和任务调度介绍任务并行编程模型的研究问题、困难和最新研究成果;最后展望了任务并行未来的研究方向.

关键词:任务并行;并行编程模型;任务窃取调度;并行性表达

Abstract:

Task parallel programming model is a widely used parallel programming model on multi-core platforms. With the intention of simplifying parallel programming and improving the utilization of multiple cores, this paper provides an introduction to the essential programming interfaces and the supporting mechanism used in task parallel programming models and discusses issues and the latest achievements from three perspectives: Parallelism expression, data management and task scheduling. In the end, some future trends in this area are discussed.

Key words:task parallelism;parallel programming model;work stealing scheduling;parallelism expression

参考文献

[1] Jack D. The promise and perils of the coming multicore revolution and its impact. CTWatch Quarterly, 2007,3(1):1-33.

[2] Asanovic K, Bodik R, Catanzaro BC, Gebis JJ, Husbands P, Keutzer K, Patterson DA, Plishker WL, Shalf J, Williams SW, Yelick KA. The landscape of parallel computing research: A view from berkley. Technical Report, UCB/EECS-2006-183, Berkeley: University of California, 2006.

[3] Sutter H. The free lunch is over: A fundamental turn toward concurrency in software. Dr. Dobb''s Journal, 2005,30(3):202-210.

[4] An H, Chen GL. Parallel programming models and languages. Ruanjian Xuebao/Journal of Software, 2002,13(1):118-124 (in Chinese with English abstract). http://www.jos.org.cn/1000-9825/20020117.htm

[5] Blumofe RD, Joerg CF, Kuszmaul BC, Leiserson CE, Randall KH, Zhou YL. Cilk: An efficient multithreaded runtime system. In: Proc. of the 5th ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming (PPoPP). 1995. 207-216. [doi: 10. 1145/209936.209958]

[6] Frigo M, Leiserson CE, Randall KH. The implementation of the Cilk-5 multithreaded language. In: Proc. of the ACM SIGPLAN''98 Conf. on Programming Language Design and Implementation (PLDI). 1998. 212-223. [doi: 10.1145/277650. 277725]

[7] Intel Inc. Intel? Cilk++ SDK programmer''s guide. 2009. http://software.intel.com/en-us/articles/download-intel-cilk-sdk/

[8] Intel. Intel(R) threading building blocks 3.0 reference manul. 2010. http://www.threadingbuildingblocks.org

[9] Leijen D, Schulte W. The design of a task parallel library. In: Proc. of the 24th ACM SIGPLAN Conf. on Object Oriented Programming Systems Languages and Applications (OOPSLA). 2009. 227-242. [doi: 10.1145/1640089.1640106]

[10] Lea D. A Java fork/join framework. In: Proc. of the ACM 2000 Conf. on Java Grande. 2000. 36-43. [doi: 10.1145/337449.337465]

[11] Charles P, Grothoff C, Saraswat VA, Donawa C, Kielstra A, Ebcioglu K, von Praun C, Sarkar V. X10: An object-oriented approach to non-uniform cluster computing. In: Proc. of the 20th Annual ACM SIGPLAN Conf. on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA). 2005. 519-538. [doi: 10.1145/1094811.1094852]

[12] Cavé V, Zhao JS, Shirako J, Sarkar V. Habanero-Java: The new adventures of old X10. In: Proc. of the 9th Int''l Conf. on the Principles and Practice of Programming in Java (PPPJ). 2011. [doi: 10.1145/2093157.2093165]

[13] OpenMP Application Program Interface. Version 3.0. 2008.

[14] Cavé V, Budimlic Z, Sarkar V. Comparing the usability of library vs. language approaches to task parallelism. In: Proc. of the Workshop on Evaluation and Usability of Programming Languages and Tools (PLATEAU). 2010. [doi: 10.1145/1937117.1937126]

[15] Blumofe RD, Leiserson CE. Scheduling multithreaded computations by work stealing. Journal of the ACM, 1999,46(5):720-748. [doi: 10.1145/324133.324234]

[16] Blelloch RD, Leiserson CE. Space-Efficient scheduling of multithreaded computations. In: Proc. of the 25th Annual ACM Symp. on Theory of Computing. 1993. 362-371. [doi: 10.1145/167088.167196]

[17] Guo Y, Barik R, Raman R, Sarkar V. Work-First and help-first scheduling policies for async-finish task parallelism. In: Proc. of the 2009 IEEE Int''l Symp. on Parallel and Distributed Processing (IPDPS). 2009. 1-12. [doi: 10.1109/IPDPS.2009.5161079]

[18] Burton FW, Sleep MR. Executing functional programs on a virtual tree of processors. In: Proc. of the Conf. on Functional Programming Languages and Computer Architecture. 1981. 187-194. [doi: 10.1145/800223.806778]

[19] Addison C, LaGrone J, Huang L, Chapman B. OpenMP 3.0 tasking implementation in OpenUH. In: Proc. of the Open64 Workshop in Conjunction with the Int''l Symp. on Code Generation and Optimization. 2009. http://www.capsl.udel.edu/conferences/open64/ 2009/

[20] Tardieu O, Wang HC, Lin HB. A work-stealing scheduler for X10''s task parallelism with suspension. In: Proc. of the 17th ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming (PPoPP). 2012. 267-276. [doi: 10.1145/2370036.2145850]

[21] Shirako J, Sarkar V. Hierarchical phasers for scalable synchronization and reduction in dynamic parallelism. In: Proc. of the 24th IEEE Int''l Parallel and Distributed Processing Symp. (IPDPS). 2010. 1-12. [doi: 10.1109/IPDPS.2010.5470414]

[22] Frigo M, Halpern P, Leiserson CE, Lewin-Berlin S. Reducers and other Cilk++ hyperobjects. In: Bender MA, ed. Proc. of the 21st ACM Symp. on Parallelism in Algorithms and Architectures (SPAA). New York: ACM Press, 2009. 79-90.

[23] Leiserson CE, Schardl TB. A work-efficient parallel breadth-first search algorithm (or how to cope with the nondeterminism of reducers). In: Proc. of the 22nd ACM Symp. on Parallelism in Algorithms and Architectures (SPAA). 2010. 303-314. [doi: 10. 1145/1810479.1810534]

[24] Agrawal K, Leiserson CE, Sukha J. Helper locks for fork-join parallel programming. In: Proc. of the 15th ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming (PPoPP). 2010. 245-256. [doi: 10.1145/1837853.1693487]

[25] Jr Halstead RH. Implementation of multilisp: Lisp on a multiprocessor. In: Proc. of the Symp. on Lisp and Functional Programming. 1984. 9-17. [doi: 10.1145/800055.802017]

[26] Mohr E, Kranz DA, Jr Halstead RH. Lazy task creation: A technique for increasing the granularity of parallel programs. In: Proc. of the ''90 ACM Conf. on LISP and Functional Programming (LFP''90). New York: ACM Press, 1990. 185-197. [doi: 10.1145/91556. 91631]

[27] Squillante MS, Nelson RD. Analysis of task migration in shared-memory multiprocessor scheduling. In: Proc. of the ''91 ACM SIGMETRICS Conf. on Measurement and Modeling of Computer Systems. 1991. 143-155. [doi: 10.1145/107971.107987]

[28] Fatourou P, Spriakis P. Efficient scheduling of strict multithreaded computations. Theory of Computing Systems Journal, 2000, 33(3):173-332. [doi: 10.1007/s002240010002]

[29] Berenbrink P, Friedetzky T, Goldberg LA. The natural work stealing algorithm is stable. In: Proc. of the IEEE Symp. on Foundations of Computer Science. 2001. 178-187. [doi: 10.1137/S0097539701399551]

[30] Squillante MS, Lazowska ED. Using processor-cache affinity information in shared-memory multiprocessor scheduling. IEEE Trans. on Parallel and Distributed Systems, 1993,4(2):131-143. [doi: 10.1109/71.207589]

[31] Acar UA, Blelloch GE, Blumofe RD. The data locality of work stealing. In: Proc. of the 12th Annual ACM Symp. on Parallel Algorithms and Architectures (SPAA). 2000. 1-12. [doi: 10.1145/341800.341801]

[32] Hamidzadeh B, Lilja DJ. Dynamic scheduling strategies for shared-memory multiprocessors. In: Proc. of the 16th Int''l Conf. on Distributed Computing. 1996. 208-215. [doi: http://dx.chinadoi.cn/10.3760/cma.j.issn.1674-2907.2010.17.004]

[33] Lo M, Dandamudi SP. Performance of hierarchical load sharing in heterogeneous distributed systems. In: Yetongnon K, ed. Proc. of the Int''l Conf. on Parallel and Distributed Computing Systems. Dijon: ISCA, 1996. 370-377.

[34] Bender MA, Rabin MO. Online scheduling of parallel programs on heterogeneous systems with applications to Cilk. In: Proc. of the Theory of Computing Systems Special Issue on SPAA. 2002. [doi: 10.1007/s00224-002-1055-5]

[35] Sanchez D, Yoo RM, Kozyrakis C. Flexible architectural support for fine-grain scheduling. In: Proc. of the 15th Edition of ASPLOS on Architectural Support for Programming Languages and Operating Systems (ASPLOS). 2010. 311-322. [doi: 10.1145/ 1736020.1736055]

[36] Mohr E, Kranz DA, Jr HalsteadRH, Lazy task creation: A technique for increasing the granularity of parallel programs. IEEE Trans. on Parallel and Distributed Systems, 1991,2(3):264-280. [doi: 10.1109/71.86103]

[37] Loidl HW, Hammond K. On the granularity of divide-and-conquer parallelism. In: Turner DN, ed. Proc. of the Glasgow Workshop on Functional Programming. Ullapool: Springer-Verlag, 1995. 8-10.

[38] Duran A, Corbalán J, Ayguadé E. Evaluation of OpenMP task scheduling strategies. In: Eigenmann R, Supinski BRD, eds. Proc. of the 4th Int''l Workshop on OpenMP (IWOMP). Berlin: Springer-Verlag, 2008. 100-110.

[39] Guojing C, Sreedhar K, Sriram K, Doug L, Vijay S, Tong W. Solving large, irregular graph problems using adaptive work-stealing. In: Proc. of the 37th Int''l Conf. on Parallel Processing. 2008. 536-545. [doi: 10.1109/ICPP.2008.88]

[40] Chen SM, Gibbons PB, Kozuch M, Liaskovitis V, Ailamaki A, Blelloch GE, Falsofi B, Fix L, Hardavellas N, Mowry TC, Wilkerson C. Scheduling threads for constructive cache sharing on CMPs. In: Proc. of the 19th Annual ACM Symp. on Parallel Algorithms and Architectures (SPAA). 2007. 105-115. [doi: 10.1145/1248377.1248396]

[41] Duran A, Corbalán J, Ayguadé E. An adaptive cut-off for task parallelism. In: Proc. of the 2008 ACM/IEEE Conf. on Supercomputing. 2008. [doi: 10.1109/SC.2008.5213927]

[42] Acar UA, Charguéraud A, Rainey M. Oracle scheduling: Controlling granularity in implicitly parallel languages. In: Proc. of the 2011 ACM Int''l Conf. on Object Oriented Programming Systems Languages and Applications (OOPSLA). 2011. 499-518. [doi: 10. 1145/2076021.2048106]

[43] Wang L, Cui HM, Duan YL, Lu F, Feng XB, Yew PC. An adaptive task creation strategy for work-stealing scheduling. In: Proc. of the 8th Annual IEEE/ACM Int''l Symp. on Code Generation and Optimization. 2010. [doi: 10.1145/1772954.1772992]

[44] Tzannes A, Caragea GC, Barua R. Lazy binary-splitting: A run-time adaptive work-stealing scheduler. In: Proc. of the 15th ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming (PPoPP). 2010. 179-190. [doi: 10.1145/1693453.1693479]

[45] Lee ITA, Boyd-Wickizer S, Huang ZY, Leiserson CE. Using memory mapping to support cactus stacks in work-stealing runtime systems. In: Proc. of the 19th Int''l Conf. on Parallel Architectures and Compilation Techniques (PACT). 2010. 411-420. [doi: 10. 1145/1854273.1854324]

[46] Zhao JS, Shirako J, Nandivada VK, Sarkar V. Reducing task creation and termination overhead in explicitly parallel programs. In: Proc. of the 19th Int''l Conf. on Parallel Architectures and Compilation Techniques (PACT). 2010. 169-180. [doi: 10.1145/1854273. 1854298]

[47] Robison A, Voss M, Kukanov A. Optimization via reflection on work stealing in TBB. In: Proc. of the IEEE Int''l Symp. on Parallel and Distributed Processing (IPDPS). 2008. 1-8. [doi: 10.1109/IPDPS.2008.4536188]

[48] Chen Q, Huang ZY, Guo MY, Zhou JY. CAB: Cache aware bi-tier task-stealing in multi-socket multi-core architecture. In: Proc. of the Int''l Conf. on Parallel Processing (ICPP). 2011. 722-732. [doi: 10.1109/ICPP.2011.32]

[49] Olivier SL, Porterfield AK, Wheeler KB, Prins JF. Scheduling task parallelism on multi-socket multicore systems. In: Proc. of the Int''l Workshop on Runtime and Operating Systems for Supercomputers (ROSS). 2011. [doi: 10.1145/1988796.1988804]

[50] Guo Y, Zhao JS, Cavé V, Sarkar V. SLAW: A scalable locality-aware adaptive work-stealing scheduler. In: Proc. of the 24th IEEE Int''l Parallel and Distributed Processing Symp. (IPDPS). 2010. 1-12. [doi: 10.1145/1693453.1693504]

[51] Dinan J, Larkins DB, Sadayappan P, Krishnamoorthy S, Nieplocha J. Scalable work stealing. In: Proc. of the ACM/IEEE Conf. on High Performance Computing. 2009. [doi: 10.1145/1654059.1654113]

[52] Saraswat VA, Kambadur P, Kodali S, Grove D, Krishnamoorthy S. Lifeline-Based global load balancing. In: Proc. of the 16th ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming (PPoPP 2011). 2011. 201-212. [doi: 10.1145/1941553. 1941582]

[53] Cao YJ, Qian DP, Wu WG, Dong XS. Adaptive scheduling algorithm based on dynamic core-resource partitions for manycore processor systems. Ruanjian Xuebao/Journal of Software, 2012,23(2):240-252 (in Chinese with English abstract). http://www.jos.org.cn/ 1000-9825/4141.htm [doi: 10.3724/SP.J.1001.2012.04141]

[54] Long GP, Zhang JC, Fan DR. Architectural support and evaluation of Cilk language on many-core architectures. Chinese Journal of Computers, 2008,31(11):1975-1985 (in Chinese with English abstract). [doi: http://dx.chinadoi.cn/10.3321/j.issn: 0254-4164.2008.11.012]

[55] Li Z, Certner O, Duato J, Temam O. Scalable hardware support for conditional parallelization. In: Proc. of the 19th Int''l Conf. on Parallel Architectures and Compilation Techniques (PACT). 2010. [doi: 10.1145/1854273.1854297]

[56] Michael MM, Vechev MT, Saraswat VA. Idempotent work stealing. In: Proc. of the 14th ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming (PPoPP). 2009. 45-54. [doi: 10.1145/1594835.1504186]

[57] Chase D, Lev Y. Dynamic circular work-stealing d-e-que. In: Proc. of the 17th Annual ACM Symp. on Parallelism in Algorithms and Architectures (SPAA). 2005. 21-28. [doi: 10.1145/1073970.1073974]

[58] Chen L, Huo W, Long XJ, Tang SL. Parallel programming languages on multi-core and many-core architectures. Information Technology Letter, 2012,10(1):23-40 (in Chinese with English abstract).

引用本文

王蕾,崔慧敏,陈莉,冯晓兵.任务并行编程模型研究与进展.软件学报,2013,24(1):77-90

复制

文章指标

点击次数:11387
下载次数: 30206
HTML阅读次数: 0
引用次数: 0

历史

收稿日期:2012-07-23
最后修改日期:2012-10-19
录用日期:
在线发布日期: 2012-12-29
出版日期:

微信服务号

微信订阅号

引用本文

相关视频

分享

文章指标

历史

文章二维码

微信服务号

微信订阅号

引用本文

相关视频

分享

微信扫一扫：分享

文章指标

历史

文章二维码