A REGRESSION APPROACH TO BINAURAL SPEECH SEGREGATION VIA DEEP NEURAL NETWORK

Citation Author(s):
Nana Fan, Jun Du, Lirong Dai
Submitted by:
Nana Fan
Last updated:
14 October 2016 - 11:07pm
Document Type:
Presentation Slides
Document Year:
2016
Event:
Presenters:
Nana Fan
Paper Code:
O4-2
 

This paper proposes a novel regression approach to binaural speech segregation based on a deep neural network (DNN). The conventional approach uses a DNN to estimate the ideal binary mask (IBM), with the interaural time difference (ITD) and interaural level difference (ILD) as auditory features. In contrast, the proposed regression DNN directly predicts the log-power spectra (LPS) features of the target speech, taking the concatenation of the monaural LPS features and the binaural features as input. For the binaural features, we design sub-band ILDs based on the LPS features, which are verified to be more effective than full-band ILDs. Our experiments show that the proposed approach can significantly outperform IBM-based speech segregation on objective measures of both speech quality and speech intelligibility in noisy and reverberant environments.
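
As a rough illustration of the input described in the abstract, the sketch below builds one feature matrix by concatenating monaural LPS features with sub-band ILDs. The abstract does not give the exact definitions, so everything here is an assumption made for illustration: the frame length and hop, the use of the left channel as the monaural signal, the split into 8 equal-width sub-bands, and the ILD being computed as a per-band energy ratio in dB. The function names are hypothetical and are not taken from the paper or the slides.

```python
import numpy as np


def stft_power(x, frame_len=512, hop=256):
    """Power spectrogram of a 1-D signal, shape (num_frames, frame_len // 2 + 1)."""
    window = np.hanning(frame_len)
    num_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(num_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2


def log_power_spectra(x, eps=1e-10):
    """LPS features: natural log of the power spectrogram (eps avoids log(0))."""
    return np.log(stft_power(x) + eps)


def subband_ild(left, right, num_bands=8, eps=1e-10):
    """Assumed sub-band ILD: left/right energy ratio in dB per frequency band."""
    bands_l = np.array_split(stft_power(left), num_bands, axis=1)
    bands_r = np.array_split(stft_power(right), num_bands, axis=1)
    ild = [
        10.0 * np.log10((bl.sum(axis=1) + eps) / (br.sum(axis=1) + eps))
        for bl, br in zip(bands_l, bands_r)
    ]
    return np.stack(ild, axis=1)  # shape (num_frames, num_bands)


def build_input_features(left, right, num_bands=8):
    """Concatenate monaural LPS (left channel, an assumption) with sub-band ILDs."""
    lps = log_power_spectra(left)
    ild = subband_ild(left, right, num_bands=num_bands)
    return np.concatenate([lps, ild], axis=1)
```

Each row of the resulting matrix would be one frame of input to a regression model whose training target is the LPS of the clean target speech. A full-band ILD corresponds to `num_bands=1` in this sketch, which collapses the level difference to a single value per frame.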
