CBLDNN-BASED SPEAKER-INDEPENDENT SPEECH SEPARATION VIA GENERATIVE ADVERSARIAL TRAINING
- Submitted by: Chenxing Li
- Last updated: 22 April 2018 - 9:43pm
- Document Type: Poster
- Document Year: 2018
- Event: ICASSP 2018
- Presenters: Chenxing Li
- Paper Code: AASP-P11.7
In this paper, we propose a speaker-independent multi-speaker monaural speech separation system (CBLDNN-GAT) based on a convolutional, bidirectional long short-term memory, deep feed-forward neural network (CBLDNN) with generative adversarial training (GAT). Our system aims at better overall speech quality rather than only minimizing the mean square error (MSE). In the initial phase, we use log-mel filterbank and pitch features to warm up the CBLDNN in a multi-task manner, so that information useful for separating speech and improving its quality is integrated into the model. We then apply GAT throughout training, which drives the separated speech to be indistinguishable from real speech. We evaluate CBLDNN-GAT on the WSJ0-2mix dataset. The experimental results show that the proposed model achieves an 11.0 dB signal-to-distortion ratio (SDR) improvement, a new state-of-the-art result.
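For readers who want a concrete picture of the recipe the abstract describes, below is a minimal, hypothetical PyTorch sketch of a CBLDNN-style separator trained with an adversarial loss alongside MSE. Everything here (the layer sizes, feature dimensions, discriminator design, and the 0.1 loss weight) is an illustrative assumption rather than the paper's exact configuration.

```python
# Hypothetical sketch: CBLDNN-style separator with generative adversarial
# training. All layer sizes, feature dimensions, and loss weights below are
# illustrative assumptions, not the paper's configuration.
import torch
import torch.nn as nn

class Separator(nn.Module):
    """CBLDNN: convolutional front end -> BLSTM -> feed-forward mask head."""
    def __init__(self, n_feats=80, hidden=256, n_spk=2):
        super().__init__()
        self.conv = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU())
        self.blstm = nn.LSTM(32 * n_feats, hidden,
                             batch_first=True, bidirectional=True)
        self.dnn = nn.Sequential(nn.Linear(2 * hidden, n_spk * n_feats),
                                 nn.Sigmoid())   # one mask per speaker
        self.n_spk = n_spk

    def forward(self, mix):                      # mix: (batch, time, n_feats)
        b, t, f = mix.shape
        x = self.conv(mix.unsqueeze(1))          # (batch, 32, time, n_feats)
        x = x.permute(0, 2, 1, 3).reshape(b, t, -1)
        x, _ = self.blstm(x)                     # (batch, time, 2*hidden)
        masks = self.dnn(x).view(b, t, self.n_spk, f)
        return masks * mix.unsqueeze(2)          # separated features

class Discriminator(nn.Module):
    """Scores features as real (reference) speech vs. separator output."""
    def __init__(self, n_feats=80, hidden=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_feats, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, feats):
        return self.net(feats).mean().unsqueeze(0)  # one logit

G, D = Separator(), Discriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()
real, fake = torch.ones(1), torch.zeros(1)

def train_step(mix, refs):        # refs: (batch, time, n_spk, n_feats)
    # 1) Update D: real references vs. detached separated speech.
    sep = G(mix).detach()
    d_loss = bce(D(refs), real) + bce(D(sep), fake)
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # 2) Update G: MSE to the references plus an adversarial term that
    #    pushes the separated speech toward the "real" decision region.
    sep = G(mix)
    g_loss = nn.functional.mse_loss(sep, refs) + 0.1 * bce(D(sep), real)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()

mix, refs = torch.randn(4, 50, 80), torch.randn(4, 50, 2, 80)
print(train_step(mix, refs))      # (d_loss, g_loss)
```

The detached separator output in the discriminator update and the extra adversarial term in the generator loss are the standard GAN training pattern that GAT builds on. Note two simplifications: the sketch omits the paper's multi-task warm-up on log-mel filterbank and pitch features, and a real speaker-independent system also needs a permutation-invariant assignment between outputs and reference speakers.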