Ultra-low Loss Quantization Method for Deep Neural Network Compression

doi:10.13328/j.cnki.jos.006189


Volume 32, Issue 8, 2021, pp. 2391-2407

Author:
GONG Cheng (1, 2), LU Ye (1, 2), DAI Su-Rong (1, 2), LIU Fang-Xin (1, 2), CHEN Xin-Wei (3), LI Tao (1, 2, 4)

Affiliation:
1. College of Computer Science, Nankai University, Tianjin 300350, China
2. Tianjin Key Laboratory of Network and Data Security Technology (Nankai University), Tianjin 300350, China
3. Industrial Robot Application of Fujian University Engineering Research Center (Minjiang University), Fujian 350121, China
4. State Key Laboratory of Computer Architecture (Institute of Computing Technology, Chinese Academy of Sciences), Beijing 100190, China

CLC Number: TP181
Fund Project: National Key Research and Development Program of China (2018YFB2100300); National Natural Science Foundation of China (62002175, 61872200); Natural Science Foundation of Tianjin Municipality (19JCZDJC31600, 19JCQNJC00600); Open Fund of State Key Laboratory of Computer Architecture (Institute of Computing Technology, Chinese Academy of Sciences) (CARCHB202016, CARCH201905); Innovation Fund of Chinese Universities Industry-University-Research (2020HYA01003); Open Fund of Industrial Robot Application of Fujian University Engineering Research Center (Minjiang University) (MJUKF-IRA1902)


Abstract:

Deep neural network (DNN) quantization is an efficient model compression method in which parameters and intermediate results are represented with low bit widths. The bit width of the data directly affects memory footprint, computing power, and energy consumption. Previous research on model quantization lacks effective quantitative analysis, which makes the quantization loss of those methods unpredictable. This study proposes an ultra-low loss quantization (μL2Q) method for DNN compression. It reveals the internal relationship between quantization bit width and quantization loss, which effectively guides the selection of the bit width and reduces quantization loss. First, the original data is mapped to data with a standard normal distribution. Then, the optimal parameter configuration is sought to reduce the quantization loss under the target bit width. Finally, μL2Q is encapsulated and integrated into two popular deep learning training frameworks, Caffe and Keras, to support the design and training of end-to-end model compression. Experimental results show that, compared with three clusters of state-of-the-art quantization solutions, μL2Q maintains accuracy and delivers accuracy improvements of 1.94%, 3.73%, and 8.24%, respectively, on typical neural networks at the same quantization bit width. In addition, salient object detection experiments verify that μL2Q is also suitable for more complex computer vision tasks.

Key words: neural network compression; neural network quantization; weight distribution; uniform quantization; extremum of quantization loss
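
The two steps named in the abstract (map the data to a standard normal distribution, then look for the parameter setting that lowers the quantization loss at a given bit width) can be illustrated with a minimal Python sketch. It rests on assumptions that the abstract does not confirm: the loss is taken to be the mean squared error, the only tuned parameter is a single uniform step size, and that step is found by a coarse grid search. The function names (`quantize_uniform`, `search_step`, `quantize_weights`) are hypothetical, and this is not the authors' Caffe or Keras implementation.

```python
import math
import random
import statistics


def quantize_uniform(values, bits, step):
    """Uniformly quantize values to 2**bits levels placed symmetrically around zero."""
    half = (2 ** bits) // 2
    out = []
    for v in values:
        # Bin index, clipped to the range representable with `bits` bits.
        idx = max(-half, min(half - 1, math.floor(v / step)))
        # Reconstruct each value at the centre of its bin.
        out.append((idx + 0.5) * step)
    return out


def mse(a, b):
    """Mean squared error, used here as the (assumed) quantization loss."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def search_step(standardized, bits, candidates):
    """Pick the candidate step size with the lowest loss (plain grid search)."""
    return min(
        candidates,
        key=lambda s: mse(standardized, quantize_uniform(standardized, bits, s)),
    )


def quantize_weights(weights, bits):
    """Standardize, quantize uniformly, then map back to the original scale.

    Assumes the weights are not all identical (non-zero standard deviation).
    """
    mu = statistics.fmean(weights)
    sigma = statistics.pstdev(weights)
    # Step 1: shift and scale so the data has zero mean and unit variance.
    standardized = [(w - mu) / sigma for w in weights]
    # Step 2: search for the step size that minimizes the loss at this bit width.
    candidates = [0.05 * k for k in range(1, 61)]  # 0.05 .. 3.0, an arbitrary grid
    step = search_step(standardized, bits, candidates)
    quantized = quantize_uniform(standardized, bits, step)
    # Undo the standardization so the result lives on the original scale.
    return [mu + sigma * q for q in quantized], step


if __name__ == "__main__":
    random.seed(0)
    # Synthetic, roughly Gaussian "weights" used only for illustration.
    weights = [random.gauss(0.02, 0.1) for _ in range(4000)]
    for bits in (1, 2, 3, 4):
        quantized, step = quantize_weights(weights, bits)
        print(f"bits={bits} step={step:.2f} loss={mse(weights, quantized):.6f}")
```

Running the script prints the selected step and the resulting loss for each bit width, which gives a rough feel for how the loss shrinks as the bit width grows on Gaussian-like data. Standardizing first means the step search runs on a fixed reference distribution, so in principle the best step per bit width could be computed once and reused across layers; whether μL2Q does exactly this is not stated in the abstract.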


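Citation:

Gong C, Lu Y, Dai SR, Liu FX, Chen XW, Li T. Ultra-low loss quantization method for deep neural network compression. Ruan Jian Xue Bao/Journal of Software, 2021,32(8):2391-2407 (in Chinese with English abstract).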


History

Received: July 21, 2020
Revised: September 7, 2020
Online: February 7, 2021
Published: August 6, 2021
