DNN-based Multi-Channel Speech Coding Employing Sound Localization

In this paper, a novel multi-channel speech coding method is proposed that is based on deep neural networks (DNNs) and exploits sound localization in the form of time difference of arrival (TDOA) estimates. At the encoder, only the speech signals of two reference channels and the estimated TDOAs are coded. At the decoder, a trained DNN that models the relationship between the amplitude spectra of the two reference channels and those of the other channels recovers the amplitude spectra of the other channel signals from the two decoded reference signals. Specifically, the speech signals of the two reference channels are encoded by an existing codec. The proposed method has two advantages. First, it effectively reduces the bit rate while keeping speech distortion acceptable. Second, it maintains backward compatibility with the existing codec. Experimental results show that the proposed method achieves higher coding performance at lower bit rates than some existing multi-channel speech coding approaches.
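To make the TDOA side information concrete, the following is a minimal sketch of how a delay between one channel and a reference channel could be estimated. The abstract does not state which TDOA estimator is used; plain time-domain cross-correlation over a bounded lag range is assumed here as a simple stand-in, and the names `estimate_tdoa`, `delay`, `max_lag`, the 16 kHz sampling rate, and the synthetic noise signal are all hypothetical.

```python
import random


def estimate_tdoa(sig, ref, fs, max_lag):
    """Return the delay of `sig` relative to `ref` in seconds.

    Assumption: plain cross-correlation, not necessarily the estimator
    used in the paper. Searches integer lags in [-max_lag, max_lag] and
    picks the one maximizing sum(sig[n] * ref[n - lag]).
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Only sum over indices where both sig[n] and ref[n - lag] exist.
        start = max(0, lag)
        stop = min(len(sig), len(ref) + lag)
        score = sum(sig[n] * ref[n - lag] for n in range(start, stop))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag / fs


def delay(x, d):
    """Delay x by d >= 0 samples, zero-padding the front and keeping the length."""
    return [0.0] * d + x[:len(x) - d]


# Synthetic example: a noise "reference" channel and a copy delayed by 7 samples.
random.seed(0)
fs = 16000  # hypothetical sampling rate in Hz
ref = [random.gauss(0.0, 1.0) for _ in range(2000)]
mic = delay(ref, 7)

tdoa = estimate_tdoa(mic, ref, fs, max_lag=20)
print(f"estimated TDOA: {tdoa * 1e3:.4f} ms ({round(tdoa * fs)} samples)")
```

In a scheme like the one described, one such delay value per non-reference channel would be far cheaper to transmit than the channel's waveform, which is where the bit-rate saving comes from.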

DCC2022-106.pptx

