Research Progress in High-performance Cryptographic Computing Technology Based on Heterogeneous Multicore GPUs
Author biographies:

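Dong Jiankuo (董建阔, 1992-), male, Ph.D., lecturer, CCF professional member. Research interests: public-key cryptography, post-quantum cryptography, parallel computing.
Huang Yuehua (黄跃花, 1999-), female, master's student. Research interests: applied cryptography, parallel computing.
Fu Yusheng (付宇笙, 1999-), male, master's student, CCF student member. Research interests: public-key cryptography, post-quantum cryptography, parallel computing.
Xiao Fu (肖甫, 1980-), male, Ph.D., professor, doctoral supervisor, CCF senior member. Research interests: cyberspace security, Internet of Things technology.
Zheng Fangyu (郑昉昱, 1988-), male, Ph.D., assistant researcher, CCF student member. Research interests: applied cryptography, public-key cryptography, parallel computing.
Lin Jingqiang (林璟锵, 1979-), male, Ph.D., professor, doctoral supervisor. Research interests: cryptographic engineering, system security.
Dong Zhenjiang (董振江, 1970-), male, Ph.D., professor, doctoral supervisor. Research interests: cyberspace security, artificial intelligence.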

Corresponding author:

Xiao Fu, E-mail: xiaof@njupt.edu.cn

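Funding:

Jiangsu Provincial Key Research and Development Program (BE2022798); National Natural Science Foundation of China (62302238); Natural Science Foundation of Jiangsu Province (BK20220388); General Program of Basic Science (Natural Science) Research in Higher Education Institutions of Jiangsu Province (22KJB520004); China Postdoctoral Science Foundation (2022M711689); Technology Research Program of the Ministry of Public Security (201JSYJD03)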


Abstract:

Cryptography is the core foundation of network security and plays a crucial role in data protection, identity authentication, and encrypted communication. With the rapid spread of 5G and Internet of Things technologies, network security faces unprecedented challenges, and the demand for cryptographic performance is growing explosively. GPUs can use thousands of computing cores to accelerate complex computations in parallel, which suits the computationally intensive nature of cryptographic algorithms well. Researchers have therefore extensively explored methods for accelerating a wide range of cryptographic algorithms on GPU platforms, and GPUs show clear performance advantages over platforms such as CPUs and FPGAs. This paper describes the classification of cryptographic algorithms and the architecture of GPU platforms, analyzes in detail the state of research on each class of cryptographic algorithm on heterogeneous GPU platforms, summarizes the technical challenges currently facing GPU-based high-performance cryptography, and discusses prospects for future development. Through this survey, the paper aims to give practitioners in cryptographic engineering a comprehensive reference on the latest research progress and application practice in GPU-based high-performance cryptographic computing.

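Cite this article

Dong JK, Huang YH, Fu YS, Xiao F, Zheng FY, Lin JQ, Dong ZJ. Research progress in high-performance cryptographic computing technology based on heterogeneous multicore GPUs. Ruan Jian Xue Bao/Journal of Software, 2024, 35(12): 5582-5608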

History
  • Received: 2023-05-11
  • Last revised: 2023-09-08
  • Published online: 2024-03-13
  • Publication date: 2024-12-06