任务并行编程模型研究与进展
作者:
基金项目:

国家自然科学基金(61202055, 60970024, 60925009, 60921002); 国家高技术研究发展计划(863)(2012AA010902);国家重点基础研究发展计划(973)(2011CB302504); “核高基”国家重大科技专项基金(2009ZX01036-001-002, 2011ZX01028-001-002)


Research on Task Parallel Programming Model
Author:
  • 摘要
  • | |
  • 访问统计
  • |
  • 参考文献 [58]
  • |
  • 相似文献 [20]
  • |
  • 引证文献
  • | |
  • 文章评论
    摘要:

    任务并行编程模型是近年来多核平台上广泛研究和使用的并行编程模型,旨在简化并行编程和提高多核利用率.首先,介绍了任务并行编程模型的基本编程接口和支持机制;然后,从3个角度,即并行性表达、数据管理和任务调度介绍任务并行编程模型的研究问题、困难和最新研究成果;最后展望了任务并行未来的研究方向.

    Abstract:

    Task parallel programming model is a widely used parallel programming model on multi-core platforms. With the intention of simplifying parallel programming and improving the utilization of multiple cores, this paper provides an introduction to the essential programming interfaces and the supporting mechanism used in task parallel programming models and discusses issues and the latest achievements from three perspectives: Parallelism expression, data management and task scheduling. In the end, some future trends in this area are discussed.

    参考文献
    [1] Jack D. The promise and perils of the coming multicore revolution and its impact. CTWatch Quarterly, 2007,3(1):1-33.
    [2] Asanovic K, Bodik R, Catanzaro BC, Gebis JJ, Husbands P, Keutzer K, Patterson DA, Plishker WL, Shalf J, Williams SW, Yelick KA. The landscape of parallel computing research: A view from berkley. Technical Report, UCB/EECS-2006-183, Berkeley: University of California, 2006.
    [3] Sutter H. The free lunch is over: A fundamental turn toward concurrency in software. Dr. Dobb''s Journal, 2005,30(3):202-210.
    [4] An H, Chen GL. Parallel programming models and languages. Ruanjian Xuebao/Journal of Software, 2002,13(1):118-124 (in Chinese with English abstract). http://www.jos.org.cn/1000-9825/20020117.htm
    [5] Blumofe RD, Joerg CF, Kuszmaul BC, Leiserson CE, Randall KH, Zhou YL. Cilk: An efficient multithreaded runtime system. In: Proc. of the 5th ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming (PPoPP). 1995. 207-216. [doi: 10. 1145/209936.209958]
    [6] Frigo M, Leiserson CE, Randall KH. The implementation of the Cilk-5 multithreaded language. In: Proc. of the ACM SIGPLAN''98 Conf. on Programming Language Design and Implementation (PLDI). 1998. 212-223. [doi: 10.1145/277650. 277725]
    [7] Intel Inc. Intel? Cilk++ SDK programmer''s guide. 2009. http://software.intel.com/en-us/articles/download-intel-cilk-sdk/
    [8] Intel. Intel(R) threading building blocks 3.0 reference manul. 2010. http://www.threadingbuildingblocks.org
    [9] Leijen D, Schulte W. The design of a task parallel library. In: Proc. of the 24th ACM SIGPLAN Conf. on Object Oriented Programming Systems Languages and Applications (OOPSLA). 2009. 227-242. [doi: 10.1145/1640089.1640106]
    [10] Lea D. A Java fork/join framework. In: Proc. of the ACM 2000 Conf. on Java Grande. 2000. 36-43. [doi: 10.1145/337449.337465]
    [11] Charles P, Grothoff C, Saraswat VA, Donawa C, Kielstra A, Ebcioglu K, von Praun C, Sarkar V. X10: An object-oriented approach to non-uniform cluster computing. In: Proc. of the 20th Annual ACM SIGPLAN Conf. on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA). 2005. 519-538. [doi: 10.1145/1094811.1094852]
    [12] Cavé V, Zhao JS, Shirako J, Sarkar V. Habanero-Java: The new adventures of old X10. In: Proc. of the 9th Int''l Conf. on the Principles and Practice of Programming in Java (PPPJ). 2011. [doi: 10.1145/2093157.2093165]
    [13] OpenMP Application Program Interface. Version 3.0. 2008.
    [14] Cavé V, Budimlic Z, Sarkar V. Comparing the usability of library vs. language approaches to task parallelism. In: Proc. of the Workshop on Evaluation and Usability of Programming Languages and Tools (PLATEAU). 2010. [doi: 10.1145/1937117.1937126]
    [15] Blumofe RD, Leiserson CE. Scheduling multithreaded computations by work stealing. Journal of the ACM, 1999,46(5):720-748. [doi: 10.1145/324133.324234]
    [16] Blelloch RD, Leiserson CE. Space-Efficient scheduling of multithreaded computations. In: Proc. of the 25th Annual ACM Symp. on Theory of Computing. 1993. 362-371. [doi: 10.1145/167088.167196]
    [17] Guo Y, Barik R, Raman R, Sarkar V. Work-First and help-first scheduling policies for async-finish task parallelism. In: Proc. of the 2009 IEEE Int''l Symp. on Parallel and Distributed Processing (IPDPS). 2009. 1-12. [doi: 10.1109/IPDPS.2009.5161079]
    [18] Burton FW, Sleep MR. Executing functional programs on a virtual tree of processors. In: Proc. of the Conf. on Functional Programming Languages and Computer Architecture. 1981. 187-194. [doi: 10.1145/800223.806778]
    [19] Addison C, LaGrone J, Huang L, Chapman B. OpenMP 3.0 tasking implementation in OpenUH. In: Proc. of the Open64 Workshop in Conjunction with the Int''l Symp. on Code Generation and Optimization. 2009. http://www.capsl.udel.edu/conferences/open64/ 2009/
    [20] Tardieu O, Wang HC, Lin HB. A work-stealing scheduler for X10''s task parallelism with suspension. In: Proc. of the 17th ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming (PPoPP). 2012. 267-276. [doi: 10.1145/2370036.2145850]
    [21] Shirako J, Sarkar V. Hierarchical phasers for scalable synchronization and reduction in dynamic parallelism. In: Proc. of the 24th IEEE Int''l Parallel and Distributed Processing Symp. (IPDPS). 2010. 1-12. [doi: 10.1109/IPDPS.2010.5470414]
    [22] Frigo M, Halpern P, Leiserson CE, Lewin-Berlin S. Reducers and other Cilk++ hyperobjects. In: Bender MA, ed. Proc. of the 21st ACM Symp. on Parallelism in Algorithms and Architectures (SPAA). New York: ACM Press, 2009. 79-90.
    [23] Leiserson CE, Schardl TB. A work-efficient parallel breadth-first search algorithm (or how to cope with the nondeterminism of reducers). In: Proc. of the 22nd ACM Symp. on Parallelism in Algorithms and Architectures (SPAA). 2010. 303-314. [doi: 10. 1145/1810479.1810534]
    [24] Agrawal K, Leiserson CE, Sukha J. Helper locks for fork-join parallel programming. In: Proc. of the 15th ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming (PPoPP). 2010. 245-256. [doi: 10.1145/1837853.1693487]
    [25] Jr Halstead RH. Implementation of multilisp: Lisp on a multiprocessor. In: Proc. of the Symp. on Lisp and Functional Programming. 1984. 9-17. [doi: 10.1145/800055.802017]
    [26] Mohr E, Kranz DA, Jr Halstead RH. Lazy task creation: A technique for increasing the granularity of parallel programs. In: Proc. of the ''90 ACM Conf. on LISP and Functional Programming (LFP''90). New York: ACM Press, 1990. 185-197. [doi: 10.1145/91556. 91631]
    [27] Squillante MS, Nelson RD. Analysis of task migration in shared-memory multiprocessor scheduling. In: Proc. of the ''91 ACM SIGMETRICS Conf. on Measurement and Modeling of Computer Systems. 1991. 143-155. [doi: 10.1145/107971.107987]
    [28] Fatourou P, Spriakis P. Efficient scheduling of strict multithreaded computations. Theory of Computing Systems Journal, 2000, 33(3):173-332. [doi: 10.1007/s002240010002]
    [29] Berenbrink P, Friedetzky T, Goldberg LA. The natural work stealing algorithm is stable. In: Proc. of the IEEE Symp. on Foundations of Computer Science. 2001. 178-187. [doi: 10.1137/S0097539701399551]
    [30] Squillante MS, Lazowska ED. Using processor-cache affinity information in shared-memory multiprocessor scheduling. IEEE Trans. on Parallel and Distributed Systems, 1993,4(2):131-143. [doi: 10.1109/71.207589]
    [31] Acar UA, Blelloch GE, Blumofe RD. The data locality of work stealing. In: Proc. of the 12th Annual ACM Symp. on Parallel Algorithms and Architectures (SPAA). 2000. 1-12. [doi: 10.1145/341800.341801]
    [32] Hamidzadeh B, Lilja DJ. Dynamic scheduling strategies for shared-memory multiprocessors. In: Proc. of the 16th Int''l Conf. on Distributed Computing. 1996. 208-215. [doi: http://dx.chinadoi.cn/10.3760/cma.j.issn.1674-2907.2010.17.004]
    [33] Lo M, Dandamudi SP. Performance of hierarchical load sharing in heterogeneous distributed systems. In: Yetongnon K, ed. Proc. of the Int''l Conf. on Parallel and Distributed Computing Systems. Dijon: ISCA, 1996. 370-377.
    [34] Bender MA, Rabin MO. Online scheduling of parallel programs on heterogeneous systems with applications to Cilk. In: Proc. of the Theory of Computing Systems Special Issue on SPAA. 2002. [doi: 10.1007/s00224-002-1055-5]
    [35] Sanchez D, Yoo RM, Kozyrakis C. Flexible architectural support for fine-grain scheduling. In: Proc. of the 15th Edition of ASPLOS on Architectural Support for Programming Languages and Operating Systems (ASPLOS). 2010. 311-322. [doi: 10.1145/ 1736020.1736055]
    [36] Mohr E, Kranz DA, Jr HalsteadRH, Lazy task creation: A technique for increasing the granularity of parallel programs. IEEE Trans. on Parallel and Distributed Systems, 1991,2(3):264-280. [doi: 10.1109/71.86103]
    [37] Loidl HW, Hammond K. On the granularity of divide-and-conquer parallelism. In: Turner DN, ed. Proc. of the Glasgow Workshop on Functional Programming. Ullapool: Springer-Verlag, 1995. 8-10.
    [38] Duran A, Corbalán J, Ayguadé E. Evaluation of OpenMP task scheduling strategies. In: Eigenmann R, Supinski BRD, eds. Proc. of the 4th Int''l Workshop on OpenMP (IWOMP). Berlin: Springer-Verlag, 2008. 100-110.
    [39] Guojing C, Sreedhar K, Sriram K, Doug L, Vijay S, Tong W. Solving large, irregular graph problems using adaptive work-stealing. In: Proc. of the 37th Int''l Conf. on Parallel Processing. 2008. 536-545. [doi: 10.1109/ICPP.2008.88]
    [40] Chen SM, Gibbons PB, Kozuch M, Liaskovitis V, Ailamaki A, Blelloch GE, Falsofi B, Fix L, Hardavellas N, Mowry TC, Wilkerson C. Scheduling threads for constructive cache sharing on CMPs. In: Proc. of the 19th Annual ACM Symp. on Parallel Algorithms and Architectures (SPAA). 2007. 105-115. [doi: 10.1145/1248377.1248396]
    [41] Duran A, Corbalán J, Ayguadé E. An adaptive cut-off for task parallelism. In: Proc. of the 2008 ACM/IEEE Conf. on Supercomputing. 2008. [doi: 10.1109/SC.2008.5213927]
    [42] Acar UA, Charguéraud A, Rainey M. Oracle scheduling: Controlling granularity in implicitly parallel languages. In: Proc. of the 2011 ACM Int''l Conf. on Object Oriented Programming Systems Languages and Applications (OOPSLA). 2011. 499-518. [doi: 10. 1145/2076021.2048106]
    [43] Wang L, Cui HM, Duan YL, Lu F, Feng XB, Yew PC. An adaptive task creation strategy for work-stealing scheduling. In: Proc. of the 8th Annual IEEE/ACM Int''l Symp. on Code Generation and Optimization. 2010. [doi: 10.1145/1772954.1772992]
    [44] Tzannes A, Caragea GC, Barua R. Lazy binary-splitting: A run-time adaptive work-stealing scheduler. In: Proc. of the 15th ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming (PPoPP). 2010. 179-190. [doi: 10.1145/1693453.1693479]
    [45] Lee ITA, Boyd-Wickizer S, Huang ZY, Leiserson CE. Using memory mapping to support cactus stacks in work-stealing runtime systems. In: Proc. of the 19th Int''l Conf. on Parallel Architectures and Compilation Techniques (PACT). 2010. 411-420. [doi: 10. 1145/1854273.1854324]
    [46] Zhao JS, Shirako J, Nandivada VK, Sarkar V. Reducing task creation and termination overhead in explicitly parallel programs. In: Proc. of the 19th Int''l Conf. on Parallel Architectures and Compilation Techniques (PACT). 2010. 169-180. [doi: 10.1145/1854273. 1854298]
    [47] Robison A, Voss M, Kukanov A. Optimization via reflection on work stealing in TBB. In: Proc. of the IEEE Int''l Symp. on Parallel and Distributed Processing (IPDPS). 2008. 1-8. [doi: 10.1109/IPDPS.2008.4536188]
    [48] Chen Q, Huang ZY, Guo MY, Zhou JY. CAB: Cache aware bi-tier task-stealing in multi-socket multi-core architecture. In: Proc. of the Int''l Conf. on Parallel Processing (ICPP). 2011. 722-732. [doi: 10.1109/ICPP.2011.32]
    [49] Olivier SL, Porterfield AK, Wheeler KB, Prins JF. Scheduling task parallelism on multi-socket multicore systems. In: Proc. of the Int''l Workshop on Runtime and Operating Systems for Supercomputers (ROSS). 2011. [doi: 10.1145/1988796.1988804]
    [50] Guo Y, Zhao JS, Cavé V, Sarkar V. SLAW: A scalable locality-aware adaptive work-stealing scheduler. In: Proc. of the 24th IEEE Int''l Parallel and Distributed Processing Symp. (IPDPS). 2010. 1-12. [doi: 10.1145/1693453.1693504]
    [51] Dinan J, Larkins DB, Sadayappan P, Krishnamoorthy S, Nieplocha J. Scalable work stealing. In: Proc. of the ACM/IEEE Conf. on High Performance Computing. 2009. [doi: 10.1145/1654059.1654113]
    [52] Saraswat VA, Kambadur P, Kodali S, Grove D, Krishnamoorthy S. Lifeline-Based global load balancing. In: Proc. of the 16th ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming (PPoPP 2011). 2011. 201-212. [doi: 10.1145/1941553. 1941582]
    [53] Cao YJ, Qian DP, Wu WG, Dong XS. Adaptive scheduling algorithm based on dynamic core-resource partitions for manycore processor systems. Ruanjian Xuebao/Journal of Software, 2012,23(2):240-252 (in Chinese with English abstract). http://www.jos.org.cn/ 1000-9825/4141.htm [doi: 10.3724/SP.J.1001.2012.04141]
    [54] Long GP, Zhang JC, Fan DR. Architectural support and evaluation of Cilk language on many-core architectures. Chinese Journal of Computers, 2008,31(11):1975-1985 (in Chinese with English abstract). [doi: http://dx.chinadoi.cn/10.3321/j.issn: 0254-4164.2008.11.012]
    [55] Li Z, Certner O, Duato J, Temam O. Scalable hardware support for conditional parallelization. In: Proc. of the 19th Int''l Conf. on Parallel Architectures and Compilation Techniques (PACT). 2010. [doi: 10.1145/1854273.1854297]
    [56] Michael MM, Vechev MT, Saraswat VA. Idempotent work stealing. In: Proc. of the 14th ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming (PPoPP). 2009. 45-54. [doi: 10.1145/1594835.1504186]
    [57] Chase D, Lev Y. Dynamic circular work-stealing d-e-que. In: Proc. of the 17th Annual ACM Symp. on Parallelism in Algorithms and Architectures (SPAA). 2005. 21-28. [doi: 10.1145/1073970.1073974]
    [58] Chen L, Huo W, Long XJ, Tang SL. Parallel programming languages on multi-core and many-core architectures. Information Technology Letter, 2012,10(1):23-40 (in Chinese with English abstract).
    网友评论
    网友评论
    分享到微博
    发 布
引用本文

王蕾,崔慧敏,陈莉,冯晓兵.任务并行编程模型研究与进展.软件学报,2013,24(1):77-90

复制
分享
文章指标
  • 点击次数:11387
  • 下载次数: 30206
  • HTML阅读次数: 0
  • 引用次数: 0
历史
  • 收稿日期:2012-07-23
  • 最后修改日期:2012-10-19
  • 在线发布日期: 2012-12-29
文章二维码
您是第19922100位访问者
版权所有:中国科学院软件研究所 京ICP备05046678号-3
地址:北京市海淀区中关村南四街4号,邮政编码:100190
电话:010-62562563 传真:010-62562533 Email:jos@iscas.ac.cn
技术支持:北京勤云科技发展有限公司

京公网安备 11040202500063号