硬件加速功能验证问题的DAG划分算法
作者:
作者简介:

何天祥(1998-), 男, 硕士生, 主要研究领域为图计算, 图划分;肖正(1981-), 男, 博士, 副教授, 博士生导师, CCF专业会员, 主要研究领域为协同优化, 并行分布式系统, 高性能计算, 智能信息处理, 大数据;陈岑(1985-), 男, 博士, 主要研究领域为并行与分布式计算, 机器学习, 深度学习;刘楚波(1988-), 男, 博士, 副教授, 博士生导师, CCF专业会员, 主要研究领域为并行分布式计算, 博弈论, 体系结构, 人工智能;李肯立(1971-), 男, 博士, 教授, 博士生导师, CCF会士, 主要研究领域为并行分布式处理, 超级计算与云计算, 面向大数据和人工智能的高效能计算

通讯作者:

李肯立, E-mail: lkl@hnu.edu.cn

中图分类号:

TP301

基金项目:

国家自然科学基金(61772182, 61802032)


DAG Partition Algorithm for Hardware Accelerated Function Verification
Author:
  • 摘要
  • | |
  • 访问统计
  • |
  • 参考文献 [28]
  • |
  • 相似文献 [20]
  • | | |
  • 文章评论
    摘要:

    功能验证是超大规模集成电路(very large scale integration, VLSI)设计的一个基本环节. 随着超大规模电路的普及与发展, 在单处理器上对整个电路进行功能验证在可行性和效率上都存在较大的缺陷. 基于硬件加速器的功能验证是将整个电路划分成若干个规模更小的子电路; 然后在多个硬件处理器上并行的执行功能验证. 当电路划分结果的并行性较优时可提高功能验证的效率, 缩短时间周期. 类似电路设计中的其他划分问题, 用于硬件加速功能验证的电路划分问题可以被抽象成图划分问题. 相较于传统图划分问题, 硬件加速功能验证的划分问题还需要保证较小的模拟深度和较高的调度并行性. 为了满足硬件加速功能验证的划分需求, 提出了一种基于传统多级图划分策略的有效算法. 该算法结合调度思想, 利用电路的关键路径信息和时序信息, 将硬件加速功能验证问题转化为有向无环图的多级划分问题. 随机电路网表数据的实验结果表明, 所构造的算法可以有效的减少关键路径长度并且不会引起切边数的增长恶化.

    Abstract:

    Functional verification is a basic step in VLSI design. With the popularity and development of VLSI, the feasibility and efficiency of functional verification of the whole circuit on a single processor are greatly deficient. The functional verification based on hardware accelerator divides the whole circuit into several smaller sub circuits. When the parallelism of circuit partitioning is better, the time cycle of function verification can be accelerated. Similar to other partitioning problems in circuit design, the circuit partitioning problem for hardware accelerated function verification can be abstracted into graph partitioning problem. In order to meet the requirements of hardware accelerated functional verification, an effective algorithm based on traditional multi-level graph partition strategy is proposed. The algorithm combines the idea of scheduling, and uses the critical path information and timing information of the circuit. The problem of hardware accelerated function verification is transformed into the problem of multi-level partition of directed acyclic graph. The experimental results of random circuit netlist data show that the proposed algorithm can effectively reduce the critical path length and does not cause the growth and deterioration of the number of cut edges.

    参考文献
    [1] Mammo BW. Reining in the functional verification of complex processor designs with automation, prioritization, and approximation [Ph.D. Thesis]. Ann Arbor: The University of Michigan, 2017.
    [2] Blank T. A survey of hardware accelerators used in computer-aided design. IEEE Design & Test of Computers, 1984, 1(3): 21–39. [doi: 10.1109/MDT.1984.5005647]
    [3] Schubert KD. POWER7: Verification challenge of a multi-core processor. In: Proc. of the 2009 Int’l Conf. on Computer-Aided Design. San Jose: ACM, 2009. 809–812.
    [4] Chandy JA, Banerjee P. A parallel circuit-partitioned algorithm for timing driven cell placement. In: Proc. of the Int’l Conf. on Computer Design VLSI in Computers and Processors. Austin: IEEE, 1997. 621–627.
    [5] Gao T, Vaidya PM, Liu CL. A new performance driven placement algorithm. In: Proc. of the 1991 IEEE Int’l Conf. on Computer-Aided Design Digest of Technical Papers. Santa Clara: IEEE, 1991. 44–47.
    [6] Hill D. Alternative strategies for applying min-cut to VLSI placement. In: Proc. of the 1988 IEEE Int’l Conf. on Computer Design: VLSI. Rye Brook: IEEE, 1988. 440–444.
    [7] Mak WK. Faster min-cut computation in unweighted hypergraphs/circuit netlists. In: Proc. of the 2005 IEEE VLSI-TSA Int’l Symp. on VLSI Design, Automation and Test, 2005. (VLSI-TSA-DAT). Hsinchu: IEEE, 2005. 67–70.
    [8] Allam MW, Vannelli A, Elmasry MI. A clustering algorithm for circuit partitioning. In: Proc. of the CCECE1997. Canadian Conf. on Electrical and Computer Engineering. Engineering Innovation: Voyage of Discovery. Conf. Proc. St. John’s: IEEE, 1997. 12–14.
    [9] Kolar D, Puksec JD, Branica I. VLSI circuit partition using simulated annealing algorithm. In: Proc. of the 12th IEEE Mediterranean Electrotechnical Conf. (IEEE Cat. No. 04CH37521). Dubrovnik: IEEE, 2004. 205–208.
    [10] Sipakoulis GC, Karafyllidis I, Thanailakis A. Genetic partitioning and placement for VLSI circuits. In: Proc. of the 6th IEEE Int’l Conf. on Electronics, Circuits and Systems (Cat. No. 99EX357). Paphos: IEEE, 1999. 1647–1650.
    [11] 郭文忠, 陈国龙, Xiong NX, 彭少君. 求解VLSI电路划分问题的混合粒子群优化算法. 软件学报, 2011, 22(5): 833-842. http://www.jos.org.cn/1000-9825/3980.htm
    Guo WZ, Chen GL, Xiong NX, Peng SJ. Hybrid particle swarm optimization algorithm for VLSI circuit partitioning. Ruan Jian Xue Bao/Journal of Software, 2011, 22(5): 833-842 (in Chinese with English abstract). http://www.jos.org.cn/1000-9825/3980.htm
    [12] Fiduccia CM, Mattheyses BM. A linear-time heuristic for improving network partitions. In: Proc. of the 19th Design Automation Conf. Las Vegas: IEEE, 1982. 175−181.
    [13] Karypis G, Kumar V. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM Journal of Scientific Computing, 1998, 20(1): 359–392. [doi: 10.1137/S1064827595287997]
    [14] Karypis G, Kumar V. Analysis of multilevel graph partitioning. In: Proc. of the 1995 ACM/IEEE Conf. on Supercomputing. San Diego: IEEE, 1995. 29.
    [15] Kernighan BW, Lin S. An efficient heuristic procedure for partitioning graphs. The Bell System Technical Journal, 1970, 49(2): 291–307. [doi: 10.1002/j.1538-7305.1970.tb01770.x]
    [16] Moffitt MD, Sustik MA, Villarrubia PG. Robust partitioning for hardware-accelerated functional verification. In: Proc. of the 48th Design Automation Conf. San Diego: ACM, 2011. 854–859.
    [17] Topcuoglu H, Hariri S, Wu MY. Performance-effective and low-complexity task scheduling for heterogeneous computing. IEEE Transactions on Parallel and Distributed Systems, 2002, 13(3): 260–274. [doi: 10.1109/71.993206]
    [18] Xie GQ, Li RF, Li KQ. Heterogeneity-driven end-to-end synchronized scheduling for precedence constrained tasks and messages on networked embedded systems. Journal of Parallel and Distributed Computing, 2015, 83: 1–12. [doi: 10.1016/j.jpdc.2015.04.005]
    [19] Kanemitsu H, Hanada M, Nakazato H. Clustering-based task scheduling in a large number of heterogeneous processors. IEEE Transactions on Parallel and Distributed Systems, 2016, 27(11): 3144–3157. [doi: 10.1109/TPDS.2016.2526682]
    [20] Liu CB, Li KL, Liang J, Li KQ. COOPER-SCHED: A cooperative scheduling framework for mobile edge computing with expected deadline guarantee. IEEE Trans. on Parallel and Distributed Systems, 2019.
    [21] Zhao S, Dai XT, Bate I, Burns A, Chang WL. DAG scheduling and analysis on multiprocessor systems: Exploitation of parallelism and dependency. In: Proc. of 2020 IEEE Real-time Systems Symp. Houston: IEEE, 2020. 128–140.
    [22] Karypis G, Aggarwal R, Kumar V, Shekhar S. Multilevel hypergraph partitioning: Applications in VLSI domain. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 1999, 7(1): 69–79. [doi: 10.1109/92.748202]
    [23] Karypis G, Kumar V. Multilevel k-way hypergraph partitioning. In: Proc. 1999 Design Automation Conf. (Cat. No. 99CH36361). New Orleans: IEEE, 1999. 343–348.
    [24] Herrmann J, Kho J, Uçar B, Kaya K, Çatalyürek ÜV. Acyclic partitioning of large directed acyclic graphs. In: Proc. of the 17th IEEE/ACM Int’l Symp. on Cluster, Cloud and Grid Computing (CCGRID). Madrid: IEEE, 2017. 371–380.
    [25] Herrmann J, Özkaya MY, Uçar B, Kaya K, Çatalyürek ÜV. Multilevel algorithms for acyclic partitioning of directed acyclic graphs. SIAM Journal on Scientific Computing, 2019, 41(4): A2117–A2145. [doi: 10.1137/18M1176865]
    [26] Sanders P, Schulz C. Engineering multilevel graph partitioning algorithms. In: Proc. of the 19th European Symp. on Algorithms. Saarbrücken: Springer, 2011. 469–480.
    [27] Davis TA, Hager WW, Kolodziej SP, et al. Algorithm 1003: Mongoose, a graph coarsening and partitioning library. ACM Transactions on Mathematical Software, 2020, 46(1): 7. [doi: 10.1145/3337792]
    引证文献
    网友评论
    网友评论
    分享到微博
    发 布
引用本文

何天祥,肖正,陈岑,刘楚波,李肯立.硬件加速功能验证问题的DAG划分算法.软件学报,2022,33(9):3236-3248

复制
分享
文章指标
  • 点击次数:750
  • 下载次数: 2339
  • HTML阅读次数: 1918
  • 引用次数: 0
历史
  • 收稿日期:2021-03-12
  • 最后修改日期:2021-04-12
  • 在线发布日期: 2022-06-15
  • 出版日期: 2022-09-06
文章二维码
您是第19920704位访问者
版权所有:中国科学院软件研究所 京ICP备05046678号-3
地址:北京市海淀区中关村南四街4号,邮政编码:100190
电话:010-62562563 传真:010-62562533 Email:jos@iscas.ac.cn
技术支持:北京勤云科技发展有限公司

京公网安备 11040202500063号