Statistical Normalisation of Phase-based Feature Representation for Robust Speech Recognition

In earlier work, we proposed a source-filter decomposition of speech
through phase-based processing. The decomposition leads to novel
speech features that are extracted from the filter component of the
phase spectrum. This paper analyses this spectrum and the proposed
representation by evaluating statistical properties at various points
along the parametrisation pipeline. We show that the speech phase
spectrum has a bell-shaped distribution, in contrast to the uniform
distribution that is usually assumed. It is demonstrated that the
uniform density (which implies that the corresponding sequence is
least-informative) is an artefact of phase wrapping and not an
original characteristic of this spectrum. In addition, we extend the
idea of statistical normalisation, usually applied to magnitude-based
features, to the phase domain. Based on the statistical structure of
the phase-based features, which is shown to be super-Gaussian in the
clean condition, three normalisation schemes are applied to improve
robustness: Gaussianisation, Laplacianisation and table-based
histogram equalisation. Speech recognition experiments on Aurora-2
show that applying an optimal normalisation scheme at the right stage
of the feature extraction process can produce average relative word
error rate (WER) reductions of up to 18.6% across the 0-20 dB SNR
conditions.
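
As an illustration of the Gaussianisation idea mentioned above, the following is a minimal sketch of one common rank-based approach: each sample of a feature trajectory is mapped through its empirical CDF and then through the inverse CDF of a standard normal. It is not taken from the paper; the function name `gaussianise`, the mid-rank CDF estimate and the synthetic Laplacian test data are all assumptions made for illustration.

```python
import random
from statistics import NormalDist, fmean, pstdev


def gaussianise(values):
    """Map a 1-D feature trajectory onto a standard normal via its empirical CDF.

    Hypothetical helper for illustration; not the paper's implementation.
    """
    n = len(values)
    std_normal = NormalDist()  # zero mean, unit variance
    # Indices of the samples sorted by value (rank 0 = smallest sample).
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    for rank, idx in enumerate(order):
        # (rank + 0.5) / n keeps the CDF estimate strictly inside (0, 1),
        # so the inverse normal CDF is always defined.
        out[idx] = std_normal.inv_cdf((rank + 0.5) / n)
    return out


# Synthetic super-Gaussian (Laplacian) samples standing in for one
# feature dimension; real phase-based features are assumed, not used.
rng = random.Random(0)
feature = [rng.expovariate(1.0) * rng.choice((-1.0, 1.0)) for _ in range(2000)]
normalised = gaussianise(feature)

print(f"mean={fmean(normalised):.3f}, std={pstdev(normalised):.3f}")
```

The mapping is monotonic, so the ordering of samples within the trajectory is preserved while the marginal distribution is reshaped. Laplacianisation would follow the same pattern with the inverse CDF of a Laplacian as the target.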

ICASSP2017_0.pdf
