Sequence Modeling in Unsupervised Single-channel Overlapped Speech Recognition

Submitted by:
Zhehuai Chen
Last updated:
12 April 2018 - 12:40pm
Document Type:
Presentation Slides
Document Year:
2018
Presenters:
Zhehuai Chen
Paper Code:
SP-L3.1

Unsupervised single-channel overlapped speech recognition is one of the hardest problems in automatic speech recognition (ASR). The problem can be decomposed into three sub-problems: frame-wise interpretation, sequence-level speaker tracing, and speech recognition. Previous acoustic models, however, formulate the correlation between sequential labels only implicitly, which limits their modeling power. In this work, we model the sequential label correlation explicitly during training, conditioning the prediction on both the feature sequence and the output of the previous frame. Moreover, we propose to integrate linguistic information into the assignment decision of permutation invariant training (PIT): a senone-level neural network language model (NNLM), trained on clean-speech alignments, is incorporated while the objective function remains cross-entropy. The proposed methods can be combined with an improved version of PIT and with sequence discriminative training, which brings a further relative WER improvement of over 10% on the artificially overlapped Switchboard and hub5e-swb datasets.
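The PIT criterion the abstract builds on can be sketched as follows: for each permutation of the reference label streams, compute the utterance-level cross-entropy, then train on the minimum over permutations. This is a minimal NumPy illustration of that idea (the function name, array shapes, and two-speaker setup are assumptions for exposition, not the authors' implementation):

```python
import numpy as np
from itertools import permutations

def pit_cross_entropy(logpost, labels):
    """Utterance-level permutation-invariant cross-entropy.

    logpost: (S, T, C) log-posteriors for S model output streams,
             T frames, C senone classes.
    labels:  (S, T) integer senone labels for S reference streams.
    Returns the minimum, over all assignments of output streams to
    reference streams, of the summed frame-wise cross-entropy.
    """
    S, T, _ = logpost.shape
    best = np.inf
    for perm in permutations(range(S)):
        # Assign output stream s to reference stream perm[s].
        ce = 0.0
        for s, r in enumerate(perm):
            ce -= logpost[s, np.arange(T), labels[r]].sum()
        best = min(best, ce)
    return best
```

The proposed extension would additionally score each candidate assignment with a senone-level NNLM, so that the chosen permutation is plausible linguistically as well as acoustically, while the training loss itself stays cross-entropy.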
