AN EFFICIENT METHOD FOR GENERIC DSP IMPLEMENTATION OF DILATED CONVOLUTION

Dilated convolution is a well-known technique used in neural network algorithms for AI/ML applications to enlarge the receptive field under analysis. A dilated convolution layer inherently captures wider context in an image and longer-term temporal characteristics in an audio signal. In this paper we propose a scheme that allows an efficient, generic implementation of 2D dilated convolution with stride on typical DSPs, whose instruction sets are well tuned for standard 1D and 2D filtering and convolution operations. The paper analyzes the basic structure of the dilated convolution computation and restructures it, using a decomposition method similar to polyphase decomposition, so that it maps to standard convolution operations that such DSPs execute efficiently. The method also extends naturally to include stride as a feature of dilated convolution. Using this scheme, we report results on Tensilica’s HiFi5 platform showing a reduction in computational cycles on the order of 30X and a reduction in memory on the order of 150X against a standard implementation of dilation. We have also made the code available as part of Cadence’s NN Library GitHub code base for the HiFi5 processor.
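The decomposition idea can be illustrated with a minimal 1D sketch in plain Python. It is not the HiFi5 implementation and does not reproduce the paper's 2D scheme or its handling of stride. It only shows, under simplifying assumptions (1D input, "valid" output range, correlation form as commonly used in neural networks), how splitting the input into d interleaved phases turns a dilation-d convolution into d standard convolutions. All function names are hypothetical and chosen for this illustration.

```python
def conv1d_valid(x, w):
    """Standard (dilation 1) valid correlation: y[i] = sum_j w[j] * x[i + j]."""
    k = len(w)
    return [sum(w[j] * x[i + j] for j in range(k))
            for i in range(len(x) - k + 1)]


def dilated_conv1d_direct(x, w, d):
    """Reference dilated correlation: y[n] = sum_j w[j] * x[n + j * d]."""
    k = len(w)
    out_len = len(x) - (k - 1) * d
    return [sum(w[j] * x[n + j * d] for j in range(k))
            for n in range(out_len)]


def dilated_conv1d_polyphase(x, w, d):
    """Same result via a polyphase-style split (illustrative assumption).

    Phase p holds the samples x[p], x[p + d], x[p + 2d], ...
    For an output index n = m * d + p, the taps x[n + j * d] are exactly
    phase[m + j], so each phase needs only a standard convolution.
    """
    k = len(w)
    out_len = max(0, len(x) - (k - 1) * d)
    y = [0] * out_len
    for p in range(d):
        phase = x[p::d]                      # de-interleave the input
        y_phase = conv1d_valid(phase, w)     # standard, non-dilated kernel
        for m, value in enumerate(y_phase):  # re-interleave the output
            y[m * d + p] = value
    return y


if __name__ == "__main__":
    signal = list(range(1, 11))
    kernel = [1, 0, -1]
    print(dilated_conv1d_direct(signal, kernel, 3))
    print(dilated_conv1d_polyphase(signal, kernel, 3))
```

In this sketch the inner kernel never sees the dilation factor. The dilation is absorbed entirely by the de-interleave and re-interleave steps, which is what lets a routine tuned for standard convolution be reused unchanged.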