Deterministic Replay for Parallel Programs in Multi-Core Processors
Funding:

National Natural Science Foundation of China (61073011, 61133004); National High Technology Research and Development Program of China (863 Program) (2012AA010902)

    Abstract:

    Deterministic replay of parallel programs on multi-core processors is an effective means of debugging parallel programs and is of great importance to parallel programming. However, because accesses to shared memory on multi-core architectures are not synchronized, deterministic replay still faces many challenges, and no industrial-level deterministic replay scheme for parallel programs has emerged yet. This makes parallel programs hard to debug and seriously hinders their adoption on multi-core architectures. This paper analyzes the key factors (the non-deterministic events in multi-core processor systems) that make deterministic replay of parallel programs difficult, summarizes the metrics for evaluating deterministic replay schemes, and surveys recent research on deterministic replay for parallel programs. Based on these metrics, it analyzes and compares current pure-software and hardware-assisted deterministic replay schemes, and on that basis discusses future research trends and application prospects of deterministic replay for parallel programs on multi-core architectures.
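
The core idea the abstract refers to, recording the order of otherwise unsynchronized shared-memory accesses and enforcing that same order on a later run, can be illustrated with a small sketch. This is not the method of any scheme surveyed in the paper: the names `OrderLog` and `run` are hypothetical, and the sketch assumes a single shared object whose accesses are fully serialized, which real schemes avoid because of the overhead.

```python
import threading


class OrderLog:
    """Records, or on replay enforces, the order in which threads access
    one shared object. Hypothetical helper for illustration only."""

    def __init__(self, recorded=None):
        self.replaying = recorded is not None
        self.order = list(recorded) if self.replaying else []
        self.pos = 0  # index of the next access to replay
        self.cond = threading.Condition()

    def access(self, tid, action):
        with self.cond:
            if self.replaying:
                # Block until the log says it is this thread's turn.
                self.cond.wait_for(
                    lambda: self.pos < len(self.order)
                    and self.order[self.pos] == tid
                )
                action()
                self.pos += 1
                self.cond.notify_all()
            else:
                # Record mode: perform the access and log who did it.
                action()
                self.order.append(tid)


def run(log, n_threads=3, steps=4):
    """Run a racy workload; the result depends on thread interleaving."""
    shared = []

    def worker(tid):
        for i in range(steps):
            log.access(tid, lambda: shared.append((tid, i)))

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared
```

Calling `run(OrderLog())` records one interleaving in the log's `order` list; passing that list to a new `OrderLog` and calling `run` again with the same thread and step counts reproduces the same contents of `shared`. The replay run must perform exactly the accesses that were recorded, otherwise a waiting thread never gets its turn.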

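Cite this article

高岚 (Gao Lan), 王锐 (Wang Rui), 钱德沛 (Qian Depei). Deterministic replay for parallel programs in multi-core processors. Journal of Software (软件学报), 2013,24(6):1390-1402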

History
  • Received: 2012-07-01
  • Last revised: 2013-02-26
  • Published online: 2013-04-01