CBLDNN-BASED SPEAKER-INDEPENDENT SPEECH SEPARATION VIA GENERATIVE ADVERSARIAL TRAINING
- Submitted by: Chenxing Li
- Last updated: 22 April 2018 - 9:43pm
- Document Type: Poster
- Document Year: 2018
- Event: ICASSP 2018
- Presenters: Chenxing Li
- Paper Code: AASP-P11.7
In this paper, we propose a speaker-independent multi-speaker monaural speech separation system (CBLDNN-GAT) based on a convolutional, bidirectional long short-term memory, deep feed-forward neural network (CBLDNN) with generative adversarial training (GAT). Our system aims at better overall speech quality rather than only minimizing the mean square error (MSE). In the initial phase, we use log-mel filterbank and pitch features to warm up the CBLDNN in a multi-task manner, so that information useful for separating speech and improving its quality is integrated into the model. We then apply GAT throughout training, which drives the separated speech to be indistinguishable from real speech. We evaluate CBLDNN-GAT on the WSJ0-2mix dataset. The experimental results show that the proposed model achieves an 11.0 dB signal-to-distortion ratio (SDR) improvement, a new state-of-the-art result.
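For readers who want a concrete picture of the recipe the abstract describes, below is a minimal, hypothetical PyTorch sketch of a CBLDNN-style separator trained with an adversarial loss alongside MSE. Everything here (the layer sizes, feature dimensions, discriminator design, and the 0.1 loss weight) is an illustrative assumption rather than the paper's exact configuration.

```python
# Hypothetical sketch: CBLDNN-style separator with generative adversarial
# training. All layer sizes, feature dimensions, and loss weights below are
# illustrative assumptions, not the paper's configuration.
import torch
import torch.nn as nn

class Separator(nn.Module):
    """CBLDNN: convolutional front end -> BLSTM -> feed-forward mask head."""
    def __init__(self, n_feats=80, hidden=256, n_spk=2):
        super().__init__()
        self.conv = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU())
        self.blstm = nn.LSTM(32 * n_feats, hidden,
                             batch_first=True, bidirectional=True)
        self.dnn = nn.Sequential(nn.Linear(2 * hidden, n_spk * n_feats),
                                 nn.Sigmoid())   # one mask per speaker
        self.n_spk = n_spk

    def forward(self, mix):                      # mix: (batch, time, n_feats)
        b, t, f = mix.shape
        x = self.conv(mix.unsqueeze(1))          # (batch, 32, time, n_feats)
        x = x.permute(0, 2, 1, 3).reshape(b, t, -1)
        x, _ = self.blstm(x)                     # (batch, time, 2*hidden)
        masks = self.dnn(x).view(b, t, self.n_spk, f)
        return masks * mix.unsqueeze(2)          # separated features

class Discriminator(nn.Module):
    """Scores features as real (reference) speech vs. separator output."""
    def __init__(self, n_feats=80, hidden=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_feats, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, feats):
        return self.net(feats).mean().unsqueeze(0)  # one logit

G, D = Separator(), Discriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()
real, fake = torch.ones(1), torch.zeros(1)

def train_step(mix, refs):        # refs: (batch, time, n_spk, n_feats)
    # 1) Update D: real references vs. detached separated speech.
    sep = G(mix).detach()
    d_loss = bce(D(refs), real) + bce(D(sep), fake)
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # 2) Update G: MSE to the references plus an adversarial term that
    #    pushes the separated speech toward the "real" decision region.
    sep = G(mix)
    g_loss = nn.functional.mse_loss(sep, refs) + 0.1 * bce(D(sep), real)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()

mix, refs = torch.randn(4, 50, 80), torch.randn(4, 50, 2, 80)
print(train_step(mix, refs))      # (d_loss, g_loss)
```

The detached separator output in the discriminator update and the extra adversarial term in the generator loss are the standard GAN training pattern that GAT builds on. Note two simplifications: the sketch omits the paper's multi-task warm-up on log-mel filterbank and pitch features, and a real speaker-independent system also needs a permutation-invariant assignment between outputs and reference speakers.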