Speech enhancement with neural homomorphic synthesis

Citation Author(s):
Wenbin Jiang, Zhijun Liu, Kai Yu, Fei Wen
Submitted by:
Wenbin Jiang
Last updated:
7 May 2022 - 12:04am
Document Type:
Poster
Document Year:
2022
Event:
Presenters:
Wenbin Jiang
Paper Code:
AUD-11.3

Most deep learning-based speech enhancement methods operate directly on time-frequency representations or learned features, without making use of a model of speech production. This work proposes a new speech enhancement method based on neural homomorphic synthesis. First, the speech signal is decomposed into excitation and vocal tract components via complex cepstrum analysis. Then, two complex-valued neural networks estimate the target complex spectra of the two components. Finally, the time-domain speech signal is synthesized from the estimated excitation and vocal tract. Furthermore, we investigated numerous loss functions and found that the multi-resolution STFT loss, commonly used in TTS vocoders, also benefits speech enhancement. Experimental results demonstrate that the proposed method outperforms existing state-of-the-art complex-valued neural network-based methods in terms of both PESQ and eSTOI.
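To make the decomposition step concrete, the following is a minimal sketch of homomorphic (complex cepstrum) analysis on a single windowed speech frame. The FFT size and lifter cutoff are illustrative choices, not the authors' settings, and the function name is hypothetical:

```python
import numpy as np

def cepstral_decompose(x, n_fft=512, lifter_cutoff=30):
    """Split a windowed speech frame into vocal-tract and excitation
    complex spectra via complex cepstrum liftering (a sketch)."""
    spec = np.fft.fft(x, n_fft)
    # Complex cepstrum: inverse FFT of log magnitude + unwrapped phase.
    log_spec = np.log(np.abs(spec) + 1e-8) + 1j * np.unwrap(np.angle(spec))
    ccep = np.fft.ifft(log_spec).real
    # A low-quefrency lifter keeps the slowly varying vocal-tract
    # envelope; the symmetric negative quefrencies are kept as well.
    lifter = np.zeros(n_fft)
    lifter[:lifter_cutoff] = 1.0
    lifter[-(lifter_cutoff - 1):] = 1.0
    vocal_tract = np.exp(np.fft.fft(ccep * lifter, n_fft))
    # Homomorphic relation: spec = vocal_tract * excitation.
    excitation = spec / vocal_tract
    return vocal_tract, excitation

# Example: decompose one 32 ms frame at 16 kHz.
frame = np.hanning(512) * np.random.randn(512)  # stand-in for real speech
vt, exc = cepstral_decompose(frame)
```

Likewise, here is a minimal sketch of a multi-resolution STFT loss in the form popularized by TTS vocoders such as Parallel WaveGAN, combining a spectral-convergence term and a log-magnitude L1 term; the three resolutions below are common defaults and an assumption, not necessarily the paper's configuration:

```python
import torch
import torch.nn.functional as F

def _stft_mag(x, n_fft, hop, win):
    # Magnitude spectrogram at one resolution, clamped away from zero
    # so the log below is well defined.
    window = torch.hann_window(win, device=x.device)
    spec = torch.stft(x, n_fft, hop_length=hop, win_length=win,
                      window=window, return_complex=True)
    return spec.abs().clamp(min=1e-7)

def mr_stft_loss(estimate, target,
                 resolutions=((512, 128, 512),
                              (1024, 256, 1024),
                              (2048, 512, 2048))):
    """Average of spectral-convergence and log-magnitude L1 terms over
    several (n_fft, hop, window-length) STFT resolutions (a sketch)."""
    loss = 0.0
    for n_fft, hop, win in resolutions:
        est = _stft_mag(estimate, n_fft, hop, win)
        tgt = _stft_mag(target, n_fft, hop, win)
        sc = torch.norm(tgt - est) / torch.norm(tgt)     # spectral convergence
        mag = F.l1_loss(torch.log(est), torch.log(tgt))  # log-magnitude L1
        loss = loss + sc + mag
    return loss / len(resolutions)
```

In a pipeline like the one described above, such a loss would presumably be applied between the synthesized and clean time-domain waveforms.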
