

SINGLE CHANNEL SPEECH SEPARATION WITH CONSTRAINED UTTERANCE LEVEL PERMUTATION INVARIANT TRAINING USING GRID LSTM

Abstract: 

Utterance-level permutation invariant training (uPIT) is a state-of-the-art deep learning technique for speaker-independent multi-talker separation. uPIT solves the label ambiguity problem by minimizing the mean square error (MSE) over all permutations between outputs and targets. However, uPIT may be sub-optimal at the segmental level because the optimization is not calculated over individual frames. In this paper, we propose a constrained uPIT (cuPIT) that addresses this problem by computing a weighted MSE loss using dynamic information (i.e., delta and acceleration). The weighted loss ensures the temporal continuity of output frames belonging to the same speaker. Inspired by the heuristics (i.e., vocal tract continuity) used in computational auditory scene analysis, we then extend the model by adding a Grid LSTM layer, which we name cuPIT-Grid LSTM, to automatically learn both temporal and spectral patterns over the input magnitude spectrum simultaneously. The experimental results show relative improvements of 9.6% and 8.5% on the WSJ0-2mix dataset under closed and open conditions, respectively, compared with the uPIT baseline.
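
The abstract describes the two training objectives only in words. The NumPy sketch below illustrates them under stated assumptions and is not the authors' implementation. Spectra are assumed to have shape (speakers, frames, frequency bins). uPIT selects the single speaker permutation with the lowest utterance-level MSE. The cuPIT constraint is modeled here, as one possible reading, as a weighted sum of MSEs on static, delta and acceleration features, with hypothetical weights `alpha` and `beta` and HTK-style regression deltas; the paper's exact weighting may differ. The function names `deltas` and `pit_loss` are illustrative only, and the Grid LSTM layer is not shown.

```python
import itertools

import numpy as np


def deltas(x: np.ndarray, n: int = 2) -> np.ndarray:
    """HTK-style regression deltas along the time axis (axis -2) of x.

    x has shape (..., T, F); edge frames are replicated before differencing.
    """
    pad = [(0, 0)] * x.ndim
    pad[-2] = (n, n)
    xp = np.pad(x, pad, mode="edge")
    T = x.shape[-2]
    num = np.zeros(x.shape, dtype=float)
    for k in range(1, n + 1):
        # c[t + k] - c[t - k] for every frame t of the original signal
        num += k * (xp[..., n + k : n + k + T, :] - xp[..., n - k : n - k + T, :])
    return num / (2 * sum(k * k for k in range(1, n + 1)))


def pit_loss(est, ref, alpha=0.0, beta=0.0):
    """Utterance-level PIT loss over spectra of shape (S, T, F).

    alpha = beta = 0 gives the plain uPIT MSE. Positive alpha/beta add
    (hypothetical) weighted MSE terms on delta and acceleration features,
    one possible reading of the cuPIT constraint. Returns (loss, perm),
    where est[i] is paired with ref[perm[i]].
    """
    d_est, d_ref = deltas(est), deltas(ref)
    a_est, a_ref = deltas(d_est), deltas(d_ref)
    best_loss, best_perm = np.inf, None
    # One permutation is chosen for the whole utterance, not per frame.
    for perm in itertools.permutations(range(est.shape[0])):
        p = list(perm)
        loss = (
            np.mean((est - ref[p]) ** 2)
            + alpha * np.mean((d_est - d_ref[p]) ** 2)
            + beta * np.mean((a_est - a_ref[p]) ** 2)
        )
        if loss < best_loss:
            best_loss, best_perm = float(loss), perm
    return best_loss, best_perm
```

Because the delta and acceleration terms compare how each output changes over time, an output that jumps between speakers mid-utterance is penalized even when its static frames match some target. Enumerating all S! permutations is cheap for the two-speaker WSJ0-2mix setting; in a training framework the same computation would be written with differentiable tensor operations.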


Paper Details

Authors:
CHENGLIN XU, WEI RAO, XIONG XIAO, ENG SIONG CHNG, HAIZHOU LI
Submitted On:
20 April 2018 - 12:38am
Type:
Presentation Slides
Presenter's Name:
CHENGLIN XU
Paper Code:
AASP-L1.2
Document Year:
2018
Cite

Document Files

ICASSP2018_1844.pdf



[1] CHENGLIN XU, WEI RAO, XIONG XIAO, ENG SIONG CHNG, HAIZHOU LI, "SINGLE CHANNEL SPEECH SEPARATION WITH CONSTRAINED UTTERANCE LEVEL PERMUTATION INVARIANT TRAINING USING GRID LSTM", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/3068. Accessed: Aug. 18, 2018.
@article{3068-18,
url = {http://sigport.org/3068},
author = {CHENGLIN XU and WEI RAO and XIONG XIAO and ENG SIONG CHNG and HAIZHOU LI},
publisher = {IEEE SigPort},
title = {SINGLE CHANNEL SPEECH SEPARATION WITH CONSTRAINED UTTERANCE LEVEL PERMUTATION INVARIANT TRAINING USING GRID LSTM},
year = {2018}
}
TY  - EJOUR
T1  - SINGLE CHANNEL SPEECH SEPARATION WITH CONSTRAINED UTTERANCE LEVEL PERMUTATION INVARIANT TRAINING USING GRID LSTM
AU  - CHENGLIN XU
AU  - WEI RAO
AU  - XIONG XIAO
AU  - ENG SIONG CHNG
AU  - HAIZHOU LI
PY  - 2018
PB  - IEEE SigPort
UR  - http://sigport.org/3068
ER  - 
CHENGLIN XU, WEI RAO, XIONG XIAO, ENG SIONG CHNG, HAIZHOU LI. (2018). SINGLE CHANNEL SPEECH SEPARATION WITH CONSTRAINED UTTERANCE LEVEL PERMUTATION INVARIANT TRAINING USING GRID LSTM. IEEE SigPort. http://sigport.org/3068
CHENGLIN XU, WEI RAO, XIONG XIAO, ENG SIONG CHNG, HAIZHOU LI, 2018. SINGLE CHANNEL SPEECH SEPARATION WITH CONSTRAINED UTTERANCE LEVEL PERMUTATION INVARIANT TRAINING USING GRID LSTM. Available at: http://sigport.org/3068.
CHENGLIN XU, WEI RAO, XIONG XIAO, ENG SIONG CHNG, HAIZHOU LI. (2018). "SINGLE CHANNEL SPEECH SEPARATION WITH CONSTRAINED UTTERANCE LEVEL PERMUTATION INVARIANT TRAINING USING GRID LSTM." Web.
1. CHENGLIN XU, WEI RAO, XIONG XIAO, ENG SIONG CHNG, HAIZHOU LI. SINGLE CHANNEL SPEECH SEPARATION WITH CONSTRAINED UTTERANCE LEVEL PERMUTATION INVARIANT TRAINING USING GRID LSTM [Internet]. IEEE SigPort; 2018. Available from: http://sigport.org/3068