An Ultra-low Loss Quantization Method for Deep Neural Network Compression
Authors:
Author biographies:

GONG Cheng (龚成, 1993-), male, Ph.D. candidate, CCF student member. His research interests include neural network compression, high-performance embedded systems, heterogeneous computing, and artificial intelligence.
LIU Fangxin (刘方鑫, 1996-), male, M.S. His research interests include neural network compression, heterogeneous computing, and artificial intelligence.
LU Ye (卢冶, 1986-), male, Ph.D., associate professor, CCF professional member. His research interests include neural network compression, high-performance embedded systems, heterogeneous computing, and artificial intelligence.
CHEN Xinwei (陈新伟, 1984-), male, Ph.D., associate professor. His research interests include robot control technology, industrial vision systems, and mobile robot systems.
DAI Surong (代素蓉, 1997-), female, M.S. candidate, CCF student member. Her research interests include neural network compression, machine learning, and heterogeneous computing.
LI Tao (李涛, 1977-), male, Ph.D., professor, doctoral supervisor, CCF distinguished member. His research interests include heterogeneous computing, machine learning, and the Internet of Things.

Corresponding author:

LU Ye, E-mail: luye@nankai.edu.cn

CLC number:

TP181

Fund projects:

National Key Research and Development Program of China (2018YFB2100300); National Natural Science Foundation of China (62002175, 61872200); Natural Science Foundation of Tianjin (19JCZDJC31600, 19JCQNJC00600); Open Project of the State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences (CARCHB202016, CARCH201905); Industry-University-Research Innovation Fund of Chinese Universities (2020HYA01003); Open Fund of the Fujian Provincial University Engineering Research Center for Industrial Robot Application, Minjiang University (MJUKF-IRA1902)


Ultra-low Loss Quantization Method for Deep Neural Network Compression
Author:
Fund Project:

National Key Research and Development Program of China (2018YFB2100300); National Natural Science Foundation of China (62002175, 61872200); Natural Science Foundation of Tianjin Municipality (19JCZDJC31600, 19JCQNJC00600); Open Fund of State Key Laboratory of Computer Architecture (Institute of Computing Technology, Chinese Academy of Sciences) (CARCHB202016, CARCH201905); Innovation Fund of Chinese Universities Industry-University-Research (2020HYA01003); Open Fund of Industrial Robot Application of Fujian University Engineering Research Center (Minjiang University) (MJUKF-IRA1902)

    Abstract:

    Deep neural network (DNN) quantization is an efficient model compression method that represents the parameters and intermediate results of a model with a small number of bits. The data bit width directly affects memory footprint, computing efficiency, and energy consumption. Previous studies on model quantization lack effective quantitative analysis, which makes quantization loss hard to predict. This study proposes an ultra-low loss DNN quantization method (μL2Q) that reveals the intrinsic relationship between quantization bit width and quantization loss, guiding the selection of bit width and reducing quantization loss. First, the original data is mapped to data following the standard normal distribution; then, the optimal quantization parameters are searched within equal-width quantization intervals; finally, the μL2Q method is integrated into the DNN training process and embedded into the mainstream machine learning frameworks Caffe and Keras to support the design and training of end-to-end model compression. Experimental results show that, compared with the latest methods at the same bit width, μL2Q achieves higher model accuracy, improving accuracy on typical neural network models by 1.94%, 3.73%, and 8.24%, respectively. Salient object detection experiments further show that μL2Q is competent for complex computer vision tasks.

    Abstract:

    Deep neural network (DNN) quantization is an efficient model compression method, in which parameters and intermediate results are expressed with a low bit width. The data bit width directly affects memory footprint, computing efficiency, and energy consumption. Previous research on model quantization lacks effective quantitative analysis, which makes the quantization loss of these methods unpredictable. This study proposes an ultra-low loss quantization (μL2Q) method for DNN compression, which reveals the internal relationship between quantization bit width and quantization loss, effectively guiding the selection of quantization bit width and reducing quantization loss. First, the original data is mapped to data with a standard normal distribution; then, the optimal parameter configuration is sought within equal-width quantization intervals to reduce the quantization loss under the target bit width. Finally, μL2Q has been encapsulated and integrated into two popular deep learning training frameworks, Caffe and Keras, to support the design and training of end-to-end model compression. The experimental results show that, compared with three clusters of state-of-the-art quantization solutions at the same quantization bit width, μL2Q maintains higher model accuracy, delivering accuracy improvements of 1.94%, 3.73%, and 8.24% on typical neural networks, respectively. In addition, salient object detection experiments verify that μL2Q is competent for more complex computer vision tasks.
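    The procedure described in the abstract (normalize the data toward a standard normal distribution, then search equal-width quantization intervals for the configuration that minimizes the quantization loss) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function name `ul2q_quantize`, the per-tensor normalization, and the grid-search range for the interval width are assumptions, and the paper derives the optimal interval analytically rather than by search.

```python
import math

def ul2q_quantize(weights, bits=2, steps=200):
    """Sketch: map weights toward N(0, 1), then apply uniform k-bit
    quantization with the interval width that minimizes the L2 loss.

    The interval width lambda is found here by a simple grid search
    (an assumption); the paper computes it in closed form.
    """
    n = len(weights)
    mu = sum(weights) / n
    sigma = math.sqrt(sum((w - mu) ** 2 for w in weights) / n) or 1.0
    wn = [(w - mu) / sigma for w in weights]          # ~N(0, 1) data
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

    def quantize(lam):
        # Round to the nearest of 2^bits equal-width levels of size lam.
        return [min(max(round(w / lam), qmin), qmax) * lam for w in wn]

    best_lam, best_loss = 1.0, float("inf")
    for i in range(1, steps + 1):
        lam = 3.0 * i / steps                         # candidate widths in (0, 3]
        loss = sum((w - q) ** 2 for w, q in zip(wn, quantize(lam))) / n
        if loss < best_loss:
            best_lam, best_loss = lam, loss
    # Map the quantized values back to the original scale.
    return [q * sigma + mu for q in quantize(best_lam)]
```

    At 2 bits the output collapses onto at most four distinct levels, and the grid search picks the level spacing that best fits the (approximately normal) weight distribution rather than a fixed range.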

Cite this article:

GONG Cheng, LU Ye, DAI Surong, LIU Fangxin, CHEN Xinwei, LI Tao. An ultra-low loss quantization method for deep neural network compression. Ruan Jian Xue Bao/Journal of Software, 2021,32(8):2391-2407 (in Chinese).

History
  • Received: 2020-07-21
  • Revised: 2020-09-07
  • Published online: 2021-02-07
  • Published: 2021-08-06