Rate-distortion Optimized Coding for Efficient CNN Compression

Citation Author(s):
Wang Zhe, Jie Lin, Mohamed Sabry Aly, Sean Young, Vijay Chandrasekhar, Bernd Girod
Submitted by:
Zhe Wang
Last updated:
3 March 2021 - 11:30pm
Document Type:
Presentation Slides
Document Year:
2021
Presenters Name:
Wang Zhe

Abstract:

In this paper, we present a coding framework for deep convolutional neural network compression. Our approach builds on classical coding theory and formulates the compression of deep convolutional neural networks as a rate-distortion optimization problem. We incorporate three coding ingredients into the framework — bit allocation, dead-zone quantization, and Tunstall coding — to improve the rate-distortion frontier without introducing noticeable system-level overhead. Experimental results show that our approach achieves state-of-the-art results on various deep convolutional neural networks and obtains considerable speedup on two deep learning accelerators. Specifically, our approach achieves a 20× compression ratio on ResNet-18, ResNet-34, and ResNet-50, and a 10× compression ratio on the already compact MobileNet-v2, without hurting accuracy. We then examine the system-level impact of our approach when deploying the compressed models to hardware platforms. Hardware simulation results show that our approach obtains up to 4.3× and 2.8× inference speedup on the state-of-the-art deep learning accelerators TPU and Eyeriss, respectively.
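As a rough illustration of one of the ingredients above, the sketch below shows a generic dead-zone quantizer applied to a weight tensor: magnitudes inside a widened zero bin (the "dead zone") are mapped to zero, which increases sparsity and lowers the entropy of the quantized symbols. The function names, the parameters `step` and `dead_zone`, and the midpoint reconstruction rule are illustrative assumptions, not the paper's exact formulation (the paper additionally optimizes per-layer bit allocation and entropy-codes the symbols with Tunstall coding).

```python
import numpy as np

def dead_zone_quantize(w, step, dead_zone):
    """Quantize weights with a uniform quantizer whose zero bin is
    widened to [-dead_zone, dead_zone].  (Illustrative sketch, not the
    paper's exact quantizer.)"""
    mag = np.abs(w)
    # Indices 1, 2, ... for magnitudes beyond the dead zone, in steps of `step`.
    q = np.sign(w) * np.floor((mag - dead_zone) / step + 1)
    q[mag <= dead_zone] = 0  # everything inside the dead zone becomes 0
    return q.astype(int)

def dead_zone_dequantize(q, step, dead_zone):
    """Reconstruct each nonzero symbol at the midpoint of its bin."""
    return np.sign(q) * (dead_zone + (np.abs(q) - 0.5) * step) * (q != 0)

# Example: small weights are zeroed out, large ones are coarsely indexed.
w = np.array([0.05, -0.3, 1.2])
q = dead_zone_quantize(w, step=0.5, dead_zone=0.1)   # -> [0, -1, 3]
w_hat = dead_zone_dequantize(q, step=0.5, dead_zone=0.1)
```

Sweeping `step` and `dead_zone` per layer traces out a rate-distortion curve; the bit-allocation step in the paper chooses these operating points jointly across layers.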

Dataset Files

DCC_2021_CNN_Compression_upload_slides.pdf
