CBLDNN-BASED SPEAKER-INDEPENDENT SPEECH SEPARATION VIA GENERATIVE ADVERSARIAL TRAINING

Citation Author(s): Chenxing Li, Lei Zhu, Shuang Xu, Peng Gao, Bo Xu
Submitted by: Chenxing Li
Last updated: 22 April 2018 - 9:43pm
Document Type: Poster
Document Year: 2018
Event: ICASSP 2018
Presenters: Chenxing Li
Paper Code: AASP-P11.7

In this paper, we propose a speaker-independent multi-speaker monaural speech separation system (CBLDNN-GAT) based on a convolutional, bidirectional long short-term memory, deep feed-forward neural network (CBLDNN) with generative adversarial training (GAT). Our system aims to obtain better speech quality rather than only minimizing the mean square error (MSE). In the initial phase, we use log-mel filterbank and pitch features to warm up the CBLDNN in a multi-task manner, so that information useful for separating speech and improving speech quality is integrated into the model. GAT is applied throughout training, pushing the separated speech toward being indistinguishable from real speech. We evaluate CBLDNN-GAT on the WSJ0-2mix dataset. The experimental results show that the proposed model achieves an 11.0 dB signal-to-distortion ratio (SDR) improvement, a new state-of-the-art result.
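The abstract names three concrete components: a CBLDNN mask-estimation network (convolutional layers feeding a bidirectional LSTM and a feed-forward output stage), an MSE reconstruction objective, and an adversarial term supplied by a discriminator that scores separated speech against real speech. The PyTorch sketch below is only an illustration of that structure under assumed settings: the layer widths, the 129-bin feature dimension, the BCE-based GAN objective, and the loss weight lam are guesses rather than the paper's configuration, and the paper's speaker-permutation handling and multi-task (log-mel plus pitch) warm-up are omitted.

import torch
import torch.nn as nn

class CBLDNN(nn.Module):
    """Sketch of a CBLDNN separator: conv front-end, BLSTM, DNN mask head.
    Layer sizes are illustrative assumptions, not the paper's exact setup."""
    def __init__(self, n_feats=129, n_speakers=2, hidden=256):
        super().__init__()
        # Convolutional front-end over (time, frequency) feature maps.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
        )
        # Bidirectional LSTM over the time axis.
        self.blstm = nn.LSTM(16 * n_feats, hidden, num_layers=2,
                             batch_first=True, bidirectional=True)
        # Feed-forward head: one sigmoid mask per speaker and frequency bin.
        self.dnn = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_speakers * n_feats), nn.Sigmoid(),
        )
        self.n_speakers, self.n_feats = n_speakers, n_feats

    def forward(self, mix):               # mix: (batch, time, n_feats)
        x = self.conv(mix.unsqueeze(1))   # -> (batch, 16, time, n_feats)
        b, c, t, f = x.shape
        x = x.permute(0, 2, 1, 3).reshape(b, t, c * f)
        x, _ = self.blstm(x)
        masks = self.dnn(x).view(b, t, self.n_speakers, self.n_feats)
        # Masked mixture spectrogram, one estimate per speaker.
        return masks * mix.unsqueeze(2)

class Discriminator(nn.Module):
    """Frame-level real/separated classifier, averaged over time (assumed)."""
    def __init__(self, n_feats=129, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_feats, hidden), nn.LeakyReLU(0.2),
            nn.Linear(hidden, 1),
        )
    def forward(self, spec):              # spec: (batch, time, n_feats)
        return self.net(spec).mean(dim=1) # -> (batch, 1) logits

def gat_step(G, D, mix, refs, opt_g, opt_d, lam=0.5):
    """One adversarial training step: reconstruction MSE plus a GAN term
    that pushes separated speech toward real speech. Permutation handling
    and the multi-task warm-up from the paper are omitted in this sketch."""
    bce = nn.BCEWithLogitsLoss()
    b, t, s, f = refs.shape               # refs: (batch, time, spk, feats)
    est = G(mix)                          # -> (batch, time, spk, feats)
    est_flat = est.permute(0, 2, 1, 3).reshape(b * s, t, f)
    ref_flat = refs.permute(0, 2, 1, 3).reshape(b * s, t, f)
    # Update D: real references labeled 1, separated estimates labeled 0.
    opt_d.zero_grad()
    d_loss = (bce(D(ref_flat), torch.ones(b * s, 1)) +
              bce(D(est_flat.detach()), torch.zeros(b * s, 1)))
    d_loss.backward()
    opt_d.step()
    # Update G: spectral MSE plus the adversarial (fool-the-D) term.
    opt_g.zero_grad()
    g_loss = (nn.functional.mse_loss(est, refs) +
              lam * bce(D(est_flat), torch.ones(b * s, 1)))
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()

The point this sketch tries to make concrete is the one the abstract emphasizes: the generator's loss mixes spectral reconstruction with an adversarial term, so training optimizes for speech that sounds real rather than for MSE alone.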
