Mixed-precision Deep Neural Network Quantization With Multiple Compression Rates
- Submitted by:
- Xuanda Wang
- Last updated:
- 28 February 2023 - 10:01pm
- Document Type:
- Presentation Slides
- Document Year:
- 2023
- Presenters:
- Xuanda Wang
- Paper Code:
- DCC-217
Quantization of one deep neural network to multiple compression rates (precisions) has recently been considered for flexible deployment in real-world scenarios. However, existing methods for network quantization under multiple compression rates rely on fixed-precision bit-width allocation or heuristically search for a mixed-precision strategy, and cannot balance efficiency and performance well. In this paper, we propose a novel progressive joint training scheme that combines progressive bit-width allocation and joint training to simultaneously optimize mixed-precision quantized networks under multiple compression rates. Specifically, we develop progressive bit-width allocation with a switchable quantization step size to enable mixed-precision quantization based on the analytic sensitivity of network layers under multiple compression rates. Furthermore, we jointly train the quantized networks under different compression rates via knowledge distillation to exploit the correlations arising from their shared network structure. Experimental results show that the proposed scheme achieves better performance than existing fixed-precision schemes for various networks on CIFAR-10 and ImageNet.
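To make the description above more concrete, the following is a minimal, hypothetical PyTorch-style sketch of two ingredients named in the abstract: a switchable quantization step size (one learnable step per precision, assumed here to be LSQ-style) and joint training across bit-widths via knowledge distillation from the highest-precision forward pass. The names (SwitchableQuantizer, QuantLinear), the toy model, and the training recipe are illustrative assumptions, not the authors' implementation; in particular, the progressive sensitivity-based bit-width allocation is not shown.

```python
# Illustrative sketch only; hypothetical names, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SwitchableQuantizer(nn.Module):
    """Uniform quantizer with a separate learnable step size per bit-width."""

    def __init__(self, bit_widths=(2, 4, 8)):
        super().__init__()
        self.bit_widths = bit_widths
        # One learnable step size per supported precision (LSQ-style; assumed).
        self.step = nn.ParameterDict(
            {str(b): nn.Parameter(torch.tensor(0.1)) for b in bit_widths}
        )
        self.active_bits = bit_widths[-1]

    def forward(self, x):
        s = self.step[str(self.active_bits)]
        qmax = 2 ** (self.active_bits - 1) - 1
        x_s = x / s
        # Straight-through estimator for rounding; clamp keeps its own gradient.
        x_q = torch.clamp(x_s + (torch.round(x_s) - x_s).detach(), -qmax - 1, qmax)
        return x_q * s


class QuantLinear(nn.Module):
    """Linear layer whose weights are quantized at a switchable precision."""

    def __init__(self, in_features, out_features, bit_widths=(2, 4, 8)):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))
        self.quantizer = SwitchableQuantizer(bit_widths)

    def forward(self, x):
        return F.linear(x, self.quantizer(self.weight), self.bias)


def distillation_loss(student_logits, teacher_logits, t=4.0):
    """Soft-label KD loss between high- and low-precision forward passes."""
    p_t = F.softmax(teacher_logits / t, dim=1)
    log_p_s = F.log_softmax(student_logits / t, dim=1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * (t * t)


# Toy joint-training step on random CIFAR-10-shaped data: the highest
# precision acts as teacher for the lower precisions that share its weights.
layer = QuantLinear(3 * 32 * 32, 10)
model = nn.Sequential(nn.Flatten(), layer)
opt = torch.optim.SGD(model.parameters(), lr=0.01)

images = torch.randn(8, 3, 32, 32)
labels = torch.randint(0, 10, (8,))

opt.zero_grad()
layer.quantizer.active_bits = 8
teacher_logits = model(images)
loss = F.cross_entropy(teacher_logits, labels)
for bits in (4, 2):
    layer.quantizer.active_bits = bits
    student_logits = model(images)
    loss = loss + distillation_loss(student_logits, teacher_logits.detach())
loss.backward()
opt.step()
```

Because all precisions share one set of weights and are updated in the same backward pass, this sketch reflects the joint-training aspect described in the abstract, where correlations across compression rates come from the shared network structure.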