A REGRESSION APPROACH TO BINAURAL SPEECH SEGREGATION VIA DEEP NEURAL NETWORK

This paper proposes a novel regression approach to binaural speech segregation based on a deep neural network (DNN). The conventional approach uses a DNN to estimate the ideal binary mask (IBM), with the interaural time difference (ITD) and interaural level difference (ILD) as the auditory features. In contrast, our approach directly predicts the log-power spectra (LPS) features of the target speech with a regression DNN model whose input concatenates the monaural LPS features and the binaural features. For the binaural features, we design sub-band ILDs based on LPS features, which are verified to be more effective than full-band ILDs. Our experiments show that the proposed approach significantly outperforms IBM-based speech segregation in terms of objective measures of both speech quality and speech intelligibility in noisy and reverberant environments.
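
As a rough illustration of the input described above, the following minimal sketch builds a per-frame feature vector by concatenating monaural LPS features with sub-band ILDs. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the frame length and hop, the choice of the left channel as the monaural input, the number of sub-bands, and the definition of a sub-band ILD as the mean left-minus-right LPS difference within each band. The regression DNN itself is not shown.

```python
import numpy as np

def log_power_spectra(signal, frame_len=512, hop=256, eps=1e-10):
    """Return LPS features of shape (n_frames, frame_len // 2 + 1)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log(power + eps)  # eps avoids log(0) on silent bins

def subband_ild(lps_left, lps_right, n_bands=8):
    """Assumed definition: mean LPS difference (left - right) per sub-band."""
    diff = lps_left - lps_right
    bands = np.array_split(diff, n_bands, axis=1)  # bands may differ in width
    return np.stack([band.mean(axis=1) for band in bands], axis=1)

def build_input_features(left, right, n_bands=8):
    """Concatenate monaural LPS (left ear, by assumption) with sub-band ILDs."""
    lps_left = log_power_spectra(left)
    lps_right = log_power_spectra(right)
    ild = subband_ild(lps_left, lps_right, n_bands)
    return np.concatenate([lps_left, ild], axis=1)
```

Under these assumptions, each row of the returned matrix would be one input frame for the regression model, with the LPS of the target speech as the training target.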
