Survey on Performance Analysis, Optimization, and Applications of Heterogeneous Fusion Processors

doi:10.13328/j.cnki.jos.006080

Journal of Software, 2020, 31(8): 2603-2624

Authors:

ZHANG Feng
Key Laboratory of Data Engineering and Knowledge Engineering (Renmin University of China), Ministry of Education, Beijing 100872, China; School of Information, Renmin University of China, Beijing 100872, China
ZHAI Ji-Dong
Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
CHEN Zheng
Key Laboratory of Data Engineering and Knowledge Engineering (Renmin University of China), Ministry of Education, Beijing 100872, China; School of Information, Renmin University of China, Beijing 100872, China
LIN Jia-Zao
Department of Information Management, Peking University, Beijing 100871, China
DU Xiao-Yong
Key Laboratory of Data Engineering and Knowledge Engineering (Renmin University of China), Ministry of Education, Beijing 100872, China; School of Information, Renmin University of China, Beijing 100872, China

About the authors: ZHANG Feng (1988-), male, Ph.D., associate professor, CCF professional member; his main research interests are big data management systems and high performance computing. LIN Jia-Zao (1984-), male, Ph.D., assistant research fellow; his main research interests are the Internet of Things, machine learning, and big data systems. ZHAI Ji-Dong (1981-), male, Ph.D., associate professor, doctoral supervisor, CCF professional member; his main research interests are high performance computing, parallel program optimization, performance testing, and cloud computing. DU Xiao-Yong (1963-), male, Ph.D., professor, doctoral supervisor, CCF fellow; his main research interests are data management technology, semantic web technology, and intelligent information retrieval. CHEN Zheng (1999-), male, Ph.D. candidate, CCF student member; his main research interests are big data processing and high performance computing.

Corresponding author: DU Xiao-Yong, E-mail: duyong@ruc.edu.cn

Fund Project: National Key Research and Development Program of China (2016YFB0200100); National Natural Science Foundation of China (61732014, 61722208, 61802412)

Abstract:

With the advance of heterogeneous computing technology, heterogeneous fusion processors, which integrate devices such as CPUs and GPUs, have matured in recent years and attracted attention from both academia and industry. Integrating multiple devices brings several benefits: for example, the devices can access the same memory and cooperate at a fine granularity. However, it also poses major challenges for system programming and optimization. Exploiting the full performance of heterogeneous fusion processors requires making full use of features of the integrated architecture, such as shared memory, and optimizing for the different devices according to the characteristics of each application. This survey first analyzes existing research on heterogeneous fusion processors. It then introduces work on performance analysis, presents the related optimization techniques, and summarizes applications that use heterogeneous fusion processors. Finally, it discusses future research directions and concludes.

Key words: CPU; GPU; heterogeneous fusion processors; performance analysis; performance optimization

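Cite this article:

ZHANG Feng, ZHAI Ji-Dong, CHEN Zheng, LIN Jia-Zao, DU Xiao-Yong. Survey on performance analysis, optimization, and applications of heterogeneous fusion processors. Journal of Software, 2020, 31(8): 2603-2624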

Article metrics

Views: 3849
Downloads: 7334
HTML views: 5405
Citations: 0

History

Received: 2019-01-31
Last revised: 2020-04-09
Published online: 2020-05-26
Publication date: 2020-08-06
