Performance Optimizing Method for Sparse Convolutional Neural Networks on GPU

Authors:

DONG Xiao, LIU Lei, LI Jing, FENG Xiao-Bing

Affiliation:

Author biographies:

DONG Xiao (1992-), male, Ph.D., CCF student member. His research interests include compiler optimization for deep learning, sparse computation, and GPU performance optimization. LI Jing (1991-), female, Ph.D. Her research interests include heterogeneous computing and GPU program optimization. LIU Lei (1980-), male, Ph.D., engineer, CCF professional member. His research interests include compiler optimization techniques such as automatic parallelization, intelligent programming methods, and programming for the Internet of Things and artificial intelligence. FENG Xiao-Bing (1969-), male, Ph.D., research professor, doctoral supervisor, CCF distinguished member. His research interests include compilation and programming techniques.

Corresponding author:

LIU Lei, E-mail: liulei@ict.ac.cn

CLC number:

Fund projects:

National Natural Science Foundation of China (61521092); National Key Research and Development Program of China (2017YFB1003103)



Abstract:

In recent years, deep convolutional neural networks have demonstrated impressive capability on a wide range of tasks and have been deployed in applications such as object detection, autonomous driving, and machine translation. However, these models typically carry a large number of parameters and impose a heavy computational burden. Neural network pruning can identify and remove parameters that contribute little to model accuracy, reducing both the parameter count and the theoretical amount of computation, and thereby creating an opportunity for efficient execution. Yet the pruned sparse models are difficult to execute efficiently on GPUs; their performance can even fall below that of their well-optimized dense counterparts, so pruning often fails to translate into a real performance gain. This study proposes a sparsity-aware code generation method that produces efficient GPU programs for sparse convolutions. First, an operator template is designed for the convolution operator, and the template code is optimized in several ways for the GPU architecture. The template source code is then compiled and analyzed into an operator intermediate representation (IR) template, and a sparse code generation algorithm is designed that, given the sparse parameters left after pruning, generates the corresponding sparse convolution code from the IR template. Furthermore, data access and data placement are optimized according to the memory access characteristics of neural network execution, effectively improving memory throughput. Finally, the positions of the nonzero parameters are encoded implicitly into the generated code, eliminating the need for an extra index structure and reducing memory traffic. Experiments demonstrate that, compared with existing methods for executing sparse neural networks on GPUs, the proposed sparsity-aware code generation method effectively improves the performance of sparse convolutional neural networks.
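To make the final point concrete, the sketch below contrasts a conventional index-based sparse kernel with the kind of straight-line code a sparsity-aware generator can emit once the pruned weights are fixed. This is a minimal CUDA illustration under stated assumptions, not the paper's actual template or generated code: the kernel names, the 1-D convolution shape, and the three surviving weights and their offsets are all made up for exposition, and the input is assumed to be padded so the constant offsets stay in bounds.

    // Conventional sparse kernel: nonzero values and positions are read from
    // index arrays (CSR-style) at run time, costing one index load per nonzero.
    __global__ void sparse_conv_indexed(const float* __restrict__ input,
                                        float* __restrict__ output,
                                        const float* __restrict__ values,
                                        const int* __restrict__ offsets,
                                        int nnz, int n) {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        if (x >= n) return;
        float acc = 0.0f;
        for (int i = 0; i < nnz; ++i)      // offsets[i] is loaded from memory
            acc += values[i] * input[x + offsets[i]];
        output[x] = acc;
    }

    // Sparsity-aware generated kernel: for one concrete pruned filter
    // (hypothetically, three surviving weights at offsets 0, 2, and width+1),
    // the generator emits straight-line code with the weight values and
    // positions baked in as constants, so no index structure is needed.
    __global__ void sparse_conv_generated(const float* __restrict__ input,
                                          float* __restrict__ output,
                                          int width, int n) {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        if (x >= n) return;
        float acc = 0.0f;
        acc +=  0.8231f * input[x];              // kept weight, offset 0
        acc += -0.1940f * input[x + 2];          // kept weight, offset 2
        acc +=  0.4417f * input[x + width + 1];  // kept weight, offset width+1
        output[x] = acc;
    }

Because every surviving weight becomes an immediate operand with a hard-coded offset, the values/offsets arrays and their per-nonzero loads disappear, which is what the abstract means by encoding position information implicitly in the generated code; the trade-off is that a new kernel must be generated whenever the pruned parameters change.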

Cite this article:

DONG Xiao, LIU Lei, LI Jing, FENG Xiao-Bing. Performance optimizing method for sparse convolutional neural networks on GPU. Journal of Software, 2020, 31(9): 2944-2964 (in Chinese).

History:
  • Received: 2019-10-05
  • Revised: 2020-01-13
  • Accepted:
  • Published online: 2020-04-21
  • Publication date: 2020-09-06