DNN-based Multi-Channel Speech Coding Employing Sound Localization
- Submitted by: Changchun Bao
- Last updated: 20 February 2022 - 9:11pm
- Document Type: Presentation Slides
- Document Year: 2022
- Presenters: Shuhao Deng
- Paper Code: 106
This paper proposes a novel multi-channel speech coding method that combines a deep neural network (DNN) with sound localization in the form of time difference of arrival (TDOA) estimates. At the encoder, only the speech signals of two reference channels and the estimated TDOAs are coded. The decoder embeds a trained DNN that models the relationship between the amplitude spectra of the two reference channels and those of the other channels, and uses it to recover the amplitude spectra of the other channels from the two decoded reference signals. Specifically, the two reference channels are encoded with an existing codec. The method has two advantages. First, it reduces the bit rate while keeping speech distortion acceptable. Second, it remains backward compatible with the existing codec. Experimental results show that the proposed method achieves higher coding performance at lower bit rates than some existing multi-channel speech coding approaches.
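The abstract does not say how the TDOAs are estimated. As an illustration only, the sketch below estimates the delay between a reference channel and another channel with GCC-PHAT, a common cross-correlation technique that is assumed here and not confirmed as the authors' choice. The function name `estimate_tdoa`, its parameters, the 16 kHz sampling rate, and the synthetic test signal are all hypothetical.

```python
import numpy as np


def estimate_tdoa(ref, other, fs, max_delay_s=None):
    """Estimate how much `other` lags `ref`, in seconds, using GCC-PHAT.

    Assumption: GCC-PHAT is used for illustration; the source does not
    specify the TDOA estimator.
    """
    n = len(ref) + len(other)  # zero-pad to avoid circular wrap-around
    ref_spec = np.fft.rfft(ref, n=n)
    other_spec = np.fft.rfft(other, n=n)

    # Cross-power spectrum, whitened so only phase (delay) information remains.
    cross = other_spec * np.conj(ref_spec)
    cross /= np.abs(cross) + 1e-12
    cc = np.fft.irfft(cross, n=n)

    # Restrict the search to plausible lags and reorder to [-max, ..., +max].
    max_shift = n // 2
    if max_delay_s is not None:
        max_shift = max(1, min(int(fs * max_delay_s), max_shift))
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))

    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs


# Synthetic check: delay white noise by a known number of samples.
fs = 16000                      # hypothetical sampling rate in Hz
true_delay = 12                 # hypothetical delay in samples
rng = np.random.default_rng(0)
ref = rng.standard_normal(4096)
other = np.concatenate((np.zeros(true_delay), ref[:-true_delay]))

tdoa = estimate_tdoa(ref, other, fs, max_delay_s=0.005)
print(f"estimated TDOA: {tdoa * 1e3:.3f} ms")
```

In a scheme like the one described, a delay of this kind would be computed per non-reference channel and transmitted alongside the two coded reference channels, which is far cheaper than coding each additional waveform.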