Documents
Presentation Slides
A REGRESSION APPROACH TO BINAURAL SPEECH SEGREGATION VIA DEEP NEURAL NETWORK
- Citation Author(s):
- Submitted by:
- Nana Fan
- Last updated:
- 14 October 2016 - 11:07pm
- Document Type:
- Presentation Slides
- Document Year:
- 2016
- Event:
- Presenters:
- Nana Fan
- Paper Code:
- O4-2
- Categories:
- Keywords:
- Log in to post comments
This paper proposes a novel regression approach to binaural speech segregation based on deep neural network (DNN). In contrast to the conventional ideal binary mask (IBM) method using DNN with the interaural time difference (ITD) and interaural level difference (ILD) as the auditory features, the log-power spectra (LPS) features of target speech are directly predicted via a regression DNN model by concatenating the monaural LPS features and the binaural features as the input. As for the binaural features, the sub-band ILDs based on LPS features are designed which are verified to be more effective than the full-band ILDs. Our experiments show that our proposed approach can significantly outperform IBM-based speech segregation in terms of both objective measures of speech quality and speech intelligibility for noisy and reverberant environments.