
Rate-distortion Optimized Coding for Efficient CNN Compression

Citation Author(s):
Wang Zhe, Jie Lin, Mohamed Sabry Aly, Sean Young, Vijay Chandrasekhar, Bernd Girod
Submitted by:
Zhe Wang
Last updated:
3 March 2021 - 11:30pm
Document Type:
Presentation Slides
Document Year:
2021
Presenters:
Wang Zhe

In this paper, we present a coding framework for deep convolutional neural network compression. Our approach builds on classical coding theory and formulates the compression of deep convolutional neural networks as a rate-distortion optimization problem. We incorporate three coding ingredients into the framework, namely bit allocation, dead-zone quantization, and Tunstall coding, to improve the rate-distortion frontier without introducing noticeable system-level overhead. Experimental results show that our approach achieves state-of-the-art results on various deep convolutional neural networks and delivers considerable speedup on two deep learning accelerators. Specifically, our approach achieves a 20× compression ratio on ResNet-18, ResNet-34, and ResNet-50, and a 10× compression ratio on the already compact MobileNet-v2, without hurting accuracy. We then examine the system-level impact of our approach when deploying the compressed models on hardware platforms. Hardware simulation results show that our approach obtains up to 4.3× and 2.8× inference speedup on the state-of-the-art deep learning accelerators TPU and Eyeriss, respectively.
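
For illustration, the sketch below applies a generic dead-zone quantizer to a CNN weight tensor, one of the three coding ingredients named above. It is a minimal sketch, assuming a uniform step size delta and a dead-zone half-width dz (in the paper these would come from the per-layer bit-allocation step); the function names and parameter values are illustrative and not taken from the paper.

    # Minimal dead-zone quantization sketch (illustrative, not the paper's code).
    import numpy as np

    def dead_zone_quantize(weights, delta, dz):
        """Quantize weights with a widened zero bin ("dead zone")."""
        sign = np.sign(weights)
        mag = np.abs(weights)
        # Magnitudes inside the dead zone map to zero, which increases
        # sparsity and lowers the coding rate; the rest are uniformly binned.
        q = np.where(mag < dz, 0, np.floor((mag - dz) / delta) + 1)
        return sign * q

    def dead_zone_dequantize(q, delta, dz):
        """Reconstruct weights at the midpoint of each quantization bin."""
        sign = np.sign(q)
        mag = np.abs(q)
        return sign * np.where(mag == 0, 0.0, dz + (mag - 0.5) * delta)

    # Example: quantize a random weight tensor and measure rate/distortion proxies.
    w = np.random.randn(64, 3, 3, 3).astype(np.float32) * 0.05
    q = dead_zone_quantize(w, delta=0.01, dz=0.02)
    w_hat = dead_zone_dequantize(q, delta=0.01, dz=0.02)
    print("sparsity:", np.mean(q == 0), "mse:", np.mean((w - w_hat) ** 2))

In a full pipeline of this kind, the quantized symbols would then be entropy coded (here, with Tunstall coding), and delta/dz would be chosen per layer to trade off the resulting bitrate against the accuracy loss.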
