Robust Speech Recognition (SPE-ROBU)

ON DNN POSTERIOR PROBABILITY COMBINATION IN MULTI-STREAM SPEECH RECOGNITION FOR REVERBERANT ENVIRONMENTS

A multi-stream framework with deep neural network (DNN) classifiers has been applied in this paper to improve automatic speech recognition (ASR) performance in environments with different reverberation characteristics. We propose a room parameter estimation model to determine the stream weights for DNN posterior probability combination with the aim of obtaining reliable log-likelihoods for decoding. The model is implemented by training a multi-layer

poster_icassp17_xiongetal.pdf (640)

Categories: Robust Speech Recognition (SPE-ROBU)

6 Views

Statistical Normalisation of Phase-based Feature Representation for Robust Speech Recognition

In earlier work we proposed a source-filter decomposition of speech through phase-based processing. The decomposition leads to novel speech features that are extracted from the filter component of the phase spectrum. This paper analyses this spectrum and the proposed representation by evaluating statistical properties at various points along the parametrisation pipeline. We show that the speech phase spectrum has a bell-shaped distribution, in contrast to the uniform assumption that is usually made. It is demonstrated that

ICASSP2017_0.pdf (546)

Categories: Robust Speech Recognition (SPE-ROBU)

8 Views

A Speaker-Dependent Deep Learning Approach to Joint Speech Separation and Acoustic Modeling for Multi-Talker Automatic Speech Recognition

We propose a novel speaker-dependent (SD) approach to joint training of deep neural networks (DNNs) with an explicit speech separation structure for multi-talker speech recognition in a single-channel setting. First, a multi-condition training strategy is designed for an SD-DNN recognizer in multi-talker scenarios, which can significantly reduce the decoding runtime and improve the recognition accuracy over approaches that use speaker-independent DNN models with a complicated joint decoding framework.

Yanhui_ISCSLP2016_oral.pdf (654)

Categories: Robust Speech Recognition (SPE-ROBU)

16 Views

Vector Taylor Series Expansion with Auditory Masking for Noise Robust Speech Recognition

In this paper, we address the problem of speech recognition in the presence of additive noise. We investigate the applicability and efficacy of auditory masking in devising a robust front end for noisy features. This is achieved by introducing a masking factor into the Vector Taylor Series (VTS) equations. The resultant first-order VTS approximation is used to compensate the parameters of a clean speech model and a Minimum Mean Square Error (MMSE) estimate is used to estimate the clean speech

Paper17_BD.pdf (302)

Categories: Robust Speech Recognition (SPE-ROBU)

3 Views

Vector Taylor Series Expansion with Auditory Masking for Noise Robust Speech Recognition

Paper17_BD.pdf (683)

Categories: Robust Speech Recognition (SPE-ROBU)

4 Views

Employing Median Filtering to Enhance the Complex-valued Acoustic Spectrograms in Modulation Domain for Noise-robust Speech Recognition

ISCSLP_2016.pdf (669)

Categories: Robust Speech Recognition (SPE-ROBU)

3 Views

Two-Stage Noise Aware Training Using Asymmetric Deep Denoising Autoencoder

Ever since deep neural network (DNN)-based acoustic models appeared, the performance of automatic speech recognition has improved greatly. Owing to this achievement, various studies on DNN-based techniques for noise robustness are also in progress. Among these approaches, the noise-aware training (NAT) technique, which aims to improve the inherent robustness of DNNs using noise estimates, has shown remarkable performance. However, despite this performance, we cannot be certain whether NAT is an optimal method for fully utilizing the inherent robustness of DNNs.

ICASSP2016_포스터_이강현_그래프2.pdf (69)

Categories: Robust Speech Recognition (SPE-ROBU)
Acoustic Modeling for Automatic Speech Recognition (SPE-RECO)

28 Views

SPEECH EMOTION RECOGNITION USING TRANSFER NON-NEGATIVE MATRIX FACTORIZATION

In practical situations, emotional speech utterances are often collected with different devices and under different conditions, which affects recognition performance. To address this issue, this paper presents a novel transfer non-negative matrix factorization (TNMF) method for cross-corpus speech emotion recognition. First, the NMF algorithm is adopted to learn a latent common feature space for the source and target datasets.

SRC_TNMF_PengSong.pdf (734)

Categories: Robust Speech Recognition (SPE-ROBU)

28 Views
