Mixed-precision Deep Neural Network Quantization With Multiple Compression Rates
- Submitted by:
- Xuanda Wang
- Last updated:
- 28 February 2023 - 10:01pm
- Document Type:
- Presentation Slides
- Document Year:
- 2023
- Presenters:
- Xuanda Wang
- Paper Code:
- DCC-217
Quantization of one deep neural network to multiple compression rates (precisions) has recently been considered for flexible deployment in real-world scenarios. However, existing methods for network quantization under multiple compression rates rely on fixed-precision bit-width allocation or heuristically search for a mixed-precision strategy, and cannot balance efficiency and performance well. In this paper, we propose a novel progressive joint training scheme that combines progressive bit-width allocation and joint training to simultaneously optimize mixed-precision quantized networks under multiple compression rates. Specifically, we develop progressive bit-width allocation with a switchable quantization step size to enable mixed-precision quantization based on the analytic sensitivity of network layers under multiple compression rates. Furthermore, we jointly train the quantized networks under different compression rates via knowledge distillation to exploit the correlations arising from their shared network structure. Experimental results show that the proposed scheme achieves better performance than existing fixed-precision schemes for various networks on CIFAR-10 and ImageNet.
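To make the description above more concrete, the following is a minimal, hypothetical PyTorch-style sketch of two ingredients named in the abstract: a switchable quantization step size (one learnable step per precision, assumed here to be LSQ-style) and joint training across bit-widths via knowledge distillation from the highest-precision forward pass. The names (SwitchableQuantizer, QuantLinear), the toy model, and the training recipe are illustrative assumptions, not the authors' implementation; in particular, the progressive sensitivity-based bit-width allocation is not shown.

```python
# Illustrative sketch only; hypothetical names, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SwitchableQuantizer(nn.Module):
    """Uniform quantizer with a separate learnable step size per bit-width."""

    def __init__(self, bit_widths=(2, 4, 8)):
        super().__init__()
        self.bit_widths = bit_widths
        # One learnable step size per supported precision (LSQ-style; assumed).
        self.step = nn.ParameterDict(
            {str(b): nn.Parameter(torch.tensor(0.1)) for b in bit_widths}
        )
        self.active_bits = bit_widths[-1]

    def forward(self, x):
        s = self.step[str(self.active_bits)]
        qmax = 2 ** (self.active_bits - 1) - 1
        x_s = x / s
        # Straight-through estimator for rounding; clamp keeps its own gradient.
        x_q = torch.clamp(x_s + (torch.round(x_s) - x_s).detach(), -qmax - 1, qmax)
        return x_q * s


class QuantLinear(nn.Module):
    """Linear layer whose weights are quantized at a switchable precision."""

    def __init__(self, in_features, out_features, bit_widths=(2, 4, 8)):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))
        self.quantizer = SwitchableQuantizer(bit_widths)

    def forward(self, x):
        return F.linear(x, self.quantizer(self.weight), self.bias)


def distillation_loss(student_logits, teacher_logits, t=4.0):
    """Soft-label KD loss between high- and low-precision forward passes."""
    p_t = F.softmax(teacher_logits / t, dim=1)
    log_p_s = F.log_softmax(student_logits / t, dim=1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * (t * t)


# Toy joint-training step on random CIFAR-10-shaped data: the highest
# precision acts as teacher for the lower precisions that share its weights.
layer = QuantLinear(3 * 32 * 32, 10)
model = nn.Sequential(nn.Flatten(), layer)
opt = torch.optim.SGD(model.parameters(), lr=0.01)

images = torch.randn(8, 3, 32, 32)
labels = torch.randint(0, 10, (8,))

opt.zero_grad()
layer.quantizer.active_bits = 8
teacher_logits = model(images)
loss = F.cross_entropy(teacher_logits, labels)
for bits in (4, 2):
    layer.quantizer.active_bits = bits
    student_logits = model(images)
    loss = loss + distillation_loss(student_logits, teacher_logits.detach())
loss.backward()
opt.step()
```

Because all precisions share one set of weights and are updated in the same backward pass, this sketch reflects the joint-training aspect described in the abstract, where correlations across compression rates come from the shared network structure.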