
Mixed-precision Deep Neural Network Quantization With Multiple Compression Rates

Citation Author(s):
Xuanda Wang, Wen Fei, Wenrui Dai, Chenglin Li, Junni Zou, and Hongkai Xiong
Submitted by:
Xuanda Wang
Last updated:
28 February 2023 - 10:01pm
Document Type:
Presentation Slides
Document Year:
2023
Event:
Presenters:
Xuanda Wang
Paper Code:
DCC-217

Quantizing a single deep neural network to multiple compression rates (precisions) has recently been considered for flexible deployment in real-world scenarios. However, existing methods for network quantization under multiple compression rates rely on fixed-precision bit-width allocation or heuristically search for a mixed-precision strategy, and cannot adequately balance efficiency and performance. In this paper, we propose a novel progressive joint training scheme that combines progressive bit-width allocation and joint training to simultaneously optimize mixed-precision quantized networks under multiple compression rates. Specifically, we develop progressive bit-width allocation with a switchable quantization step size that enables mixed-precision quantization based on the analytic sensitivity of network layers under multiple compression rates. Furthermore, we jointly train the quantized networks under different compression rates via knowledge distillation to exploit the correlations induced by their shared network structure. Experimental results show that the proposed scheme outperforms existing fixed-precision schemes for various networks on CIFAR-10 and ImageNet.
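
The sketch below is a minimal illustration of the two ingredients mentioned in the abstract, not the authors' implementation: a quantizer that keeps a separate learnable step size per bit-width (so the same shared weights can be quantized at several precisions), and a joint training step in which the highest-precision branch distills into the lower-precision branches. All names (SwitchableQuantizer, QuantLinear, joint_training_step, the bit-width list) and the use of a straight-through estimator in PyTorch are assumptions made for the example.

# Hedged sketch (not the authors' code): switchable-step-size quantization and
# joint training with knowledge distillation across precisions. Class, function,
# and parameter names are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SwitchableQuantizer(nn.Module):
    """Uniform quantizer with a separate learnable step size per bit-width."""

    def __init__(self, bit_widths=(2, 4, 8)):
        super().__init__()
        self.bit_widths = tuple(bit_widths)
        # One learnable step size (scale) per supported precision.
        self.steps = nn.ParameterDict(
            {str(b): nn.Parameter(torch.tensor(0.1)) for b in bit_widths}
        )
        self.active_bits = bit_widths[-1]

    def forward(self, x):
        step = self.steps[str(self.active_bits)]
        qmax = 2 ** (self.active_bits - 1) - 1
        # Straight-through estimator: round in the forward pass only.
        q = torch.clamp(torch.round(x / step), -qmax - 1, qmax)
        x_q = q * step
        return x + (x_q - x).detach()


class QuantLinear(nn.Module):
    """Linear layer whose shared weights can be quantized at any supported precision."""

    def __init__(self, in_f, out_f, bit_widths=(2, 4, 8)):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_f, in_f) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_f))
        self.quant = SwitchableQuantizer(bit_widths)

    def forward(self, x, bits):
        self.quant.active_bits = bits
        return F.linear(x, self.quant(self.weight), self.bias)


def joint_training_step(layer, x, y, bit_widths=(8, 4, 2), T=2.0):
    """One joint step: the highest-precision output acts as the distillation teacher."""
    logits = {b: layer(x, b) for b in bit_widths}
    loss = F.cross_entropy(logits[bit_widths[0]], y)           # teacher task loss
    teacher = F.softmax(logits[bit_widths[0]].detach() / T, dim=-1)
    for b in bit_widths[1:]:                                    # lower-precision students
        loss = loss + F.kl_div(
            F.log_softmax(logits[b] / T, dim=-1), teacher, reduction="batchmean"
        ) * (T * T)
        loss = loss + F.cross_entropy(logits[b], y)
    return loss


if __name__ == "__main__":
    layer = QuantLinear(16, 10)
    opt = torch.optim.SGD(layer.parameters(), lr=0.01)
    x, y = torch.randn(8, 16), torch.randint(0, 10, (8,))
    loss = joint_training_step(layer, x, y)
    loss.backward()
    opt.step()
    print(float(loss))

In this toy setup the shared weights and the per-precision step sizes are updated together, which mirrors the idea of optimizing quantized networks under several compression rates at once; how the paper allocates bit-widths progressively across layers is not reproduced here.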
