Ultra-low Loss Quantization Method for Deep Neural Network Compression
Author: GONG Cheng, LU Ye, DAI Su-Rong, LIU Fang-Xin, CHEN Xin-Wei, LI Tao

CLC Number: TP181

Fund Project: National Key Research and Development Program of China (2018YFB2100300); National Natural Science Foundation of China (62002175, 61872200); Natural Science Foundation of Tianjin Municipality (19JCZDJC31600, 19JCQNJC00600); Open Fund of State Key Laboratory of Computer Architecture (Institute of Computing Technology, Chinese Academy of Sciences) (CARCHB202016, CARCH201905); Innovation Fund of Chinese Universities Industry-University-Research (2020HYA01003); Open Fund of Industrial Robot Application of Fujian University Engineering Research Center (Minjiang University) (MJUKF-IRA1902)

Abstract:

Deep neural network (DNN) quantization is an efficient model compression method in which parameters and intermediate results are expressed with low bit widths. The bit width of the data directly affects the memory footprint, computing throughput, and energy consumption of the model. Previous research on model quantization lacks effective quantitative analysis, which makes the quantization loss of these methods unpredictable. This study proposes the ultra-low loss quantization (μL2Q) method for DNN compression, which reveals the inherent relationship between quantization bit width and quantization loss, effectively guiding the selection of the quantization bit width and reducing the quantization loss. First, the original data are mapped to data with a standard normal distribution, and then the optimal parameter configuration is sought to minimize the quantization loss under the target bit width. Finally, μL2Q is encapsulated and integrated into two popular deep learning training frameworks, Caffe and Keras, to support the design and training of end-to-end compressed models. The experimental results show that, compared with three clusters of state-of-the-art quantization solutions, μL2Q still guarantees accuracy, delivering improvements of 1.94%, 3.73%, and 8.24% on typical neural networks at the same quantization bit width. In addition, salient object detection experiments verify that μL2Q is also competent for more complex computer vision tasks.
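The quantization procedure summarized in the abstract can be illustrated with a short sketch. The Python snippet below is a minimal illustration, not the authors' released Caffe/Keras implementation: it assumes the mapping to a standard normal distribution is a mean/std normalization, models the quantizer as a symmetric uniform one, and finds the step size with the smallest L2 quantization loss for the target bit width by a simple numerical search (the search range and function name are assumptions for illustration only).

```python
import numpy as np

def ul2q_quantize(weights, bits):
    """Illustrative uL2Q-style quantization of a weight array (sketch only).

    1. Map the weights toward a standard normal distribution (mean/std normalization).
    2. Search for the uniform step size that minimizes the L2 quantization loss
       at the target bit width.
    3. Quantize to 2**bits uniformly spaced levels and undo the normalization.
    """
    mu, sigma = weights.mean(), weights.std() + 1e-8   # epsilon avoids division by zero
    w = (weights - mu) / sigma                          # step 1: normalization

    levels = 2 ** bits

    def quant(x, lam):
        # Symmetric uniform quantizer with step size lam and 2**bits levels.
        q = np.clip(np.round(x / lam - 0.5), -levels // 2, levels // 2 - 1) + 0.5
        return q * lam

    # Step 2: 1-D search for the step size with the smallest L2 loss.
    candidates = np.linspace(0.1, 3.0, 300)
    losses = [np.mean((w - quant(w, lam)) ** 2) for lam in candidates]
    lam_star = candidates[int(np.argmin(losses))]

    # Step 3: quantize and map back to the original scale.
    return quant(w, lam_star) * sigma + mu
```

For example, `ul2q_quantize(np.random.randn(1000), bits=2)` maps each weight onto one of four levels; because the step size is chosen to minimize the L2 loss on the normalized data, the expected quantization error is controlled by the chosen bit width rather than left unpredictable.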

Get Citation

GONG Cheng, LU Ye, DAI Su-Rong, LIU Fang-Xin, CHEN Xin-Wei, LI Tao. Ultra-low loss quantization method for deep neural network compression. Journal of Software, 2021, 32(8): 2391-2407.

History
  • Received: July 21, 2020
  • Revised: September 07, 2020
  • Online: February 07, 2021
  • Published: August 06, 2021