Survey on Performance Analysis, Optimization, and Applications of Heterogeneous Fusion Processors
Author: Zhang Feng, Zhai Jidong, Chen Zheng, Lin Jiazao, Du Xiaoyong
Affiliation:

Fund Project: National Key Research and Development Program of China (2016YFB0200100); National Natural Science Foundation of China (61732014, 61722208, 61802412)

    Abstract:

    With the development of heterogeneous computing technology, heterogeneous fusion processors such as CPU-GPU integrated processors have developed rapidly in recent years and have attracted attention from both academia and industry. Fusing different devices on one chip brings several advantages. For example, all devices share the same memory, which allows them to cooperate at a fine granularity. However, fusion also raises many challenges in system programming and optimization. To make full use of a heterogeneous fusion processor, programs need to exploit its distinctive features, such as shared memory. They also need to apply device-specific architectural optimizations suited to each application. This paper first analyzes and summarizes the research related to heterogeneous fusion processors. Second, it introduces related work on performance analysis. Third, it summarizes optimizations for heterogeneous fusion processors. It also summarizes the applications that use heterogeneous fusion processors. Finally, it outlines future research directions for heterogeneous fusion processors and draws conclusions.
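The abstract notes that devices on a fused processor share one memory and can therefore cooperate at a fine granularity. The minimal sketch below illustrates that idea under stated assumptions. One buffer is split statically into a "GPU" share and a "CPU" share using a hypothetical `gpu_ratio` parameter. Two Python threads stand in for the two devices; they are not real device code and give no GPU speedup. Both workers update the same buffer in place, so no copy between separate host and device memories is needed. That in-place sharing is the property a shared memory provides.

```python
import threading


def partition(n, gpu_ratio):
    """Split the index range [0, n) into a GPU share and a CPU share.

    gpu_ratio is a hypothetical, statically chosen fraction of the work.
    """
    split = int(n * gpu_ratio)
    return (0, split), (split, n)


def square_range(buf, lo, hi):
    # Each worker writes in place into the same buffer; the two ranges are
    # disjoint, so no locking and no copying between memories is required.
    for i in range(lo, hi):
        buf[i] = buf[i] * buf[i]


def co_run(buf, gpu_ratio=0.7):
    (g_lo, g_hi), (c_lo, c_hi) = partition(len(buf), gpu_ratio)
    # Stand-in for a GPU kernel: only models the division of work.
    gpu = threading.Thread(target=square_range, args=(buf, g_lo, g_hi))
    # CPU worker operating on the remaining share of the same buffer.
    cpu = threading.Thread(target=square_range, args=(buf, c_lo, c_hi))
    gpu.start()
    cpu.start()
    gpu.join()
    cpu.join()
    return buf


data = list(range(10))
co_run(data, gpu_ratio=0.7)
print(data)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

In practice the split ratio would be tuned for each application and device pair. The fixed value of 0.7 here is only illustrative.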

Citation:

Zhang F, Zhai J, Chen Z, Lin J, Du X. Survey on performance analysis, optimization, and applications of heterogeneous fusion processors. Journal of Software, 2020,31(8):2603-2624.
History
  • Received: January 31, 2019
  • Revised: April 9, 2020
  • Online: May 26, 2020
  • Published: August 6, 2020
Copyright: Institute of Software, Chinese Academy of Sciences