Sequence Modeling in Unsupervised Single-channel Overlapped Speech Recognition

Submitted by:
Zhehuai Chen
Last updated:
12 April 2018 - 12:40pm
Document Type:
Presentation Slides
Document Year:
2018
Presenters:
Zhehuai Chen
Paper Code:
SP-L3.1

Unsupervised single-channel overlapped speech recognition is one of the hardest problems in automatic speech recognition (ASR). The problem can be decomposed into three sub-problems: frame-wise interpretation, sequence-level speaker tracing, and speech recognition. Previous acoustic models, however, formulate the correlation between sequential labels only implicitly, which limits their modeling power. In this work, we model the sequential label correlation explicitly during training, conditioning the prediction on both the feature sequence and the output of the previous frame. Moreover, we propose to integrate linguistic information into the assignment decision of permutation invariant training (PIT): a senone-level neural network language model (NNLM), trained on clean-speech alignments, is incorporated while the objective function remains cross-entropy. The proposed methods can be combined with an improved version of PIT and with sequence discriminative training, which brings a further relative WER improvement of over 10% on the artificially overlapped Switchboard and hub5e-swb datasets.
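The PIT criterion the abstract builds on can be sketched as follows: for each permutation of the reference label streams, compute the utterance-level cross-entropy, then train on the minimum over permutations. This is a minimal NumPy illustration of that idea (the function name, array shapes, and two-speaker setup are assumptions for exposition, not the authors' implementation):

```python
import numpy as np
from itertools import permutations

def pit_cross_entropy(logpost, labels):
    """Utterance-level permutation-invariant cross-entropy.

    logpost: (S, T, C) log-posteriors for S model output streams,
             T frames, C senone classes.
    labels:  (S, T) integer senone labels for S reference streams.
    Returns the minimum, over all assignments of output streams to
    reference streams, of the summed frame-wise cross-entropy.
    """
    S, T, _ = logpost.shape
    best = np.inf
    for perm in permutations(range(S)):
        # Assign output stream s to reference stream perm[s].
        ce = 0.0
        for s, r in enumerate(perm):
            ce -= logpost[s, np.arange(T), labels[r]].sum()
        best = min(best, ce)
    return best
```

The proposed extension would additionally score each candidate assignment with a senone-level NNLM, so that the chosen permutation is plausible linguistically as well as acoustically, while the training loss itself stays cross-entropy.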
