Sequence Modeling in Unsupervised Single-channel Overlapped Speech Recognition
- Submitted by:
- Zhehuai Chen
- Last updated:
- 12 April 2018 - 12:40pm
- Document Type:
- Presentation Slides
- Document Year:
- 2018
- Presenters:
- Zhehuai Chen
- Paper Code:
- SP-L3.1
Unsupervised single-channel overlapped speech recognition is one
of the hardest problems in automatic speech recognition (ASR). The
problem can be decomposed into three sub-problems: frame-wise
interpretation, sequence-level speaker tracing, and speech recognition.
Previous acoustic models, however, formulate the correlation between sequential labels only implicitly, which limits their modeling power.
In this work, we model the sequential label
correlation explicitly during training: the model is conditioned on both
the feature sequence and the output of the previous frame. Moreover,
we propose to integrate linguistic information into the assignment decision of permutation invariant training (PIT). Namely, a
senone-level neural network language model (NNLM) trained on
clean-speech alignments is integrated, while the objective function
remains cross-entropy. The proposed methods can be combined with
an improved version of PIT and with sequence discriminative training,
which brings a further relative WER improvement of over 10%
on the artificially overlapped Switchboard and hub5e-swb datasets.
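The core of PIT referenced above is the assignment decision: score every permutation of model output streams against the reference label streams and train on the one with minimum loss. The sketch below illustrates the standard frame-wise PIT cross-entropy in plain Python (the paper's improved PIT and its NNLM-based assignment are not implemented here; the data layout is an assumption for illustration).

```python
import itertools
import math

def pit_cross_entropy(log_probs, labels):
    """Minimal frame-wise PIT loss sketch (not the paper's improved version).

    log_probs: S output streams; each stream is a list of T frames,
               each frame a list of C log-posteriors over senones.
    labels:    S reference label sequences, each a length-T list of ints.
    Returns (best_loss, best_perm): the minimum average cross-entropy
    over all stream-to-reference assignments, and that assignment.
    """
    S = len(log_probs)
    T = len(log_probs[0])
    best_loss, best_perm = math.inf, None
    for perm in itertools.permutations(range(S)):
        # Cross-entropy when output stream s is scored against labels[perm[s]].
        loss = -sum(
            log_probs[s][t][labels[perm[s]][t]]
            for s in range(S)
            for t in range(T)
        ) / (S * T)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm
```

Because the permutation count grows factorially in the number of streams, PIT is typically applied with a small, fixed number of speakers (two in the overlapped Switchboard setup).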