Robust Speech Recognition (SPE-ROBU)

An Analysis of Speech Enhancement and Recognition Losses in Limited Resources Multi-Talker Single Channel Audio-Visual ASR


In this paper, we analyze how audio-visual speech enhancement can help the ASR task in a cocktail party scenario. To this end, we consider two simple end-to-end LSTM-based models that perform single-channel audio-visual speech enhancement and phone recognition, respectively. We then study how the two models interact and how training them jointly affects the final result. We analyze different training strategies, which reveal some interesting and unexpected behaviors.

slides_paper#3109.pdf

slides_paper#3109.pdf (458)

Categories: Robust Speech Recognition (SPE-ROBU)
Speech Enhancement (SPE-ENHA)

53 Views

Small energy masking for improved neural network training for end-to-end speech recognition

In this paper, we present a Small Energy Masking (SEM) algorithm, which masks inputs whose values fall below a certain threshold. More specifically, a time-frequency bin is masked if the filterbank energy in this bin is less than a certain energy threshold. The ratio of this energy threshold to the peak filterbank energy of each utterance, expressed in decibels, is drawn at random from a uniform distribution. The unmasked feature elements are scaled so that the total sum of the feature values remains the same through this masking procedure.
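The masking procedure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `small_energy_mask`, the sampling range of -80 dB to 0 dB, setting masked bins to zero, and treating the inputs as power-domain energies (hence the factor 10 in the decibel conversion) are all assumptions made for the example.

```python
import random


def small_energy_mask(energies, low_db=-80.0, high_db=0.0, rng=None):
    """Mask low-energy time-frequency bins of one utterance.

    energies: list of frames, each a list of non-negative filterbank energies.
    low_db, high_db: assumed range for the threshold-to-peak ratio in dB.
    Returns the masked, rescaled features and the sampled ratio.
    """
    rng = rng or random.Random()

    # Draw the threshold-to-peak ratio (in dB) from a uniform distribution.
    ratio_db = rng.uniform(low_db, high_db)

    # Convert the ratio to a linear threshold relative to the utterance peak.
    peak = max(max(frame) for frame in energies)
    threshold = peak * 10.0 ** (ratio_db / 10.0)

    # Zero out every bin whose energy is below the threshold (assumption:
    # "masking" means replacing the value with zero).
    masked = [[e if e >= threshold else 0.0 for e in frame] for frame in energies]

    # Rescale the surviving bins so the total sum of the features is unchanged.
    total = sum(sum(frame) for frame in energies)
    kept = sum(sum(frame) for frame in masked)
    scale = total / kept if kept > 0 else 1.0
    return [[e * scale for e in frame] for frame in masked], ratio_db
```

Because the sampled ratio never exceeds 0 dB, the peak bin always survives, so the rescaling step is well defined for any utterance with non-zero energy. A real feature pipeline would more likely apply the same steps with an array library.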

20200508_icassp_small_energy_masking_paper_3965_presentation.pdf

20200508_icassp_small_energy_masking_paper_3965_presentation.pdf (474)

Categories: Acoustic Modeling for Automatic Speech Recognition (SPE-RECO)
Robust Speech Recognition (SPE-ROBU)

33 Views

Learning Discriminative Features in Sequence Training without Requiring Framewise Labelled Data

Slides.pdf

Slides.pdf (642)

Categories: Robust Speech Recognition (SPE-ROBU)

34 Views

Conditional Teacher-Student Learning


Teacher-student (T/S) learning has been shown to be effective for a variety of problems such as domain adaptation and model compression. One shortcoming of T/S learning is that the teacher model, which is not always perfect, sporadically produces wrong guidance in the form of posterior probabilities, which misleads the student model toward suboptimal performance.
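The shortcoming can be made concrete with a small sketch of the standard T/S objective, in which the student is trained to minimize the cross-entropy between the teacher's posteriors and its own. The function names and the toy numbers below are hypothetical, and the sketch shows only plain T/S learning, not the conditional method proposed in the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def ts_loss(teacher_posteriors, student_logits):
    """Cross-entropy between teacher posteriors and student posteriors."""
    student = softmax(student_logits)
    return -sum(p * math.log(q) for p, q in zip(teacher_posteriors, student))


# Toy frame whose true class is 0, but the (imperfect) teacher prefers class 1.
wrong_teacher = [0.1, 0.8, 0.1]
follows_teacher = ts_loss(wrong_teacher, [0.0, 3.0, 0.0])  # student agrees with teacher
follows_truth = ts_loss(wrong_teacher, [3.0, 0.0, 0.0])    # student predicts the true class
```

Here `follows_teacher` is smaller than `follows_truth`, so the objective rewards the student for reproducing the teacher's error on this frame.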

cts_poster.pptx

cts_poster.pptx (498)

Categories: Speech Processing
Acoustic Modeling for Automatic Speech Recognition (SPE-RECO)
Robust Speech Recognition (SPE-ROBU)
Machine Learning for Signal Processing
Audio and Acoustic Signal Processing

50 Views

On reducing the effect of speaker overlap for CHiME-5


The CHiME-5 speech separation and recognition challenge was recently shown to pose a difficult task for current automatic speech recognition systems.
Speaker overlap was one of the main difficulties of the challenge. The presence of noise, reverberation, and moving speakers has made traditional source separation methods ineffective at improving recognition accuracy.
In this paper, we explore several enhancement strategies aimed at reducing the effect of speaker overlap for CHiME-5 without performing source separation.

poster_icassp2019.pdf

poster presentation (519)

Categories: Robust Speech Recognition (SPE-ROBU)

31 Views

Analyzing Uncertainties in Speech Recognition Using Dropout


The performance of Automatic Speech Recognition (ASR) systems is often measured using Word Error Rate (WER), which requires time-consuming and expensive manually transcribed data. In this paper, we use state-of-the-art ASR systems based on Deep Neural Networks (DNN) and propose a novel framework which uses "Dropout" at test time to model uncertainty in prediction hypotheses. We systematically exploit this uncertainty to estimate WER without the need for explicit transcriptions.
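One simple way to picture the idea is to decode the same utterance several times with dropout left active and measure how much the resulting hypotheses disagree with the regular (dropout-free) output. The sketch below does this with a word-level edit distance. It is only an illustration under stated assumptions: the function names are hypothetical, the hypotheses are supplied as plain strings rather than produced by a real decoder, and the disagreement score is a stand-in for uncertainty, not the WER estimator used in the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (here: lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = curr
    return prev[-1]


def dropout_disagreement(base_hyp, dropout_hyps):
    """Mean word-level edit rate of dropout hypotheses against the base hypothesis."""
    base = base_hyp.split()
    if not base or not dropout_hyps:
        return 0.0
    rates = [edit_distance(base, h.split()) / len(base) for h in dropout_hyps]
    return sum(rates) / len(rates)


# Hypothetical outputs: one regular decode and three decodes with dropout enabled.
score = dropout_disagreement(
    "turn on the kitchen light",
    [
        "turn on the kitchen light",
        "turn on the kitchen lights",
        "turn on a kitchen light",
    ],
)
```

A higher score means the hypotheses change more under dropout, which suggests the recognizer is less certain about that utterance.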

Poster_avyas_ICASSP_2019.pdf

Poster_avyas_ICASSP_2019.pdf (425)

Categories: Robust Speech Recognition (SPE-ROBU)

55 Views

MULTI-GEOMETRY SPATIAL ACOUSTIC MODELING FOR DISTANT SPEECH RECOGNITION


The use of spatial information with multiple microphones can improve far-field automatic speech recognition (ASR) accuracy. However, conventional microphone array techniques degrade speech enhancement performance when there is an array geometry mismatch between design and test conditions. Moreover, such speech enhancement techniques do not always yield ASR accuracy improvement due to the difference between speech enhancement and ASR optimization objectives.

kumatani_poster_icassp2019b.pdf

poster file (435)

template.pdf

manuscript file (432)

Categories: Spatial and Multichannel Audio
Acoustic Modeling for Automatic Speech Recognition (SPE-RECO)
Robust Speech Recognition (SPE-ROBU)

16 Views

FREQUENCY DOMAIN MULTI-CHANNEL ACOUSTIC MODELING FOR DISTANT SPEECH RECOGNITION


Conventional far-field automatic speech recognition (ASR) systems typically employ microphone array techniques for speech enhancement in order to improve robustness against noise or reverberation. However, such speech enhancement techniques do not always yield ASR accuracy improvement because the optimization criterion for speech enhancement is not directly relevant to the ASR objective. In this work, we develop new acoustic modeling techniques that optimize spatial filtering and long short-term memory (LSTM) layers from multi-channel (MC) input based on an ASR criterion directly.
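As a rough picture of what spatial filtering on multi-channel input means in the frequency domain, the sketch below applies a filter-and-sum operation: for each frequency bin, the microphone signals are combined with one complex weight per channel. The function name and the hand-chosen weights are assumptions for illustration. In the work described above the filters are learned jointly with the LSTM layers under an ASR criterion, which this sketch does not attempt to show.

```python
def filter_and_sum(weights, channels):
    """Combine multi-channel STFT values into one channel per frequency bin.

    weights[f][m]:  complex filter weight for frequency bin f, microphone m.
    channels[f][m]: complex STFT value for frequency bin f, microphone m.
    Returns y[f] = sum over m of conj(weights[f][m]) * channels[f][m].
    """
    return [
        sum(w.conjugate() * x for w, x in zip(w_f, x_f))
        for w_f, x_f in zip(weights, channels)
    ]


# One frequency bin, two microphones: the second microphone sees the same
# source with a 90-degree phase shift, and the weights undo that shift.
observed = [[1 + 0j, 0 + 1j]]
steering = [[0.5 + 0j, 0 + 0.5j]]
output = filter_and_sum(steering, observed)
```

With these toy values the two channels add coherently, so `output[0]` recovers the unit-amplitude source.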

kumatani_poster_icassp2019a.pdf

poster file (707)

template.pdf

manuscript file (472)

Categories: Spatial and Multichannel Audio
Robust Speech Recognition (SPE-ROBU)
Acoustic Modeling for Automatic Speech Recognition (SPE-RECO)

24 Views

REINFORCEMENT LEARNING BASED SPEECH ENHANCEMENT FOR ROBUST SPEECH RECOGNITION


REINFORCEMENT LEARNING BASED SPEECH ENHANCEMENT FOR ROBUST SPEECH RECOGNITION.pdf

REINFORCEMENT LEARNING BASED SPEECH ENHANCEMENT FOR ROBUST SPEECH RECOGNITION.pdf (396)

Categories: Robust Speech Recognition (SPE-ROBU)

63 Views

Subband Temporal Envelope Features and Data Augmentation for End-to-end Recognition of Distant Conversational Speech

This paper investigates the use of subband temporal envelope (STE) features and speed perturbation based data augmentation in end-to-end recognition of distant conversational speech in everyday home environments. STE features track energy peaks in perceptual frequency bands which reflect the resonant properties of the vocal tract. Data augmentation is performed by adding more training data obtained after modifying the speed of the original training data.
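The data augmentation step can be sketched as resampling each training waveform at a modified speed and adding the results to the training set. The example below is a minimal version using linear interpolation. The function names and the factors 0.9, 1.0 and 1.1 are assumptions for illustration, and a production system would normally use a proper resampler from an audio toolkit.

```python
def speed_perturb(samples, factor):
    """Return the waveform played back `factor` times faster.

    samples: list of float audio samples.
    factor:  speed factor (> 1 shortens the signal, < 1 lengthens it).
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_out = int(len(samples) / factor)
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * factor                # fractional read position in the input
        lo = min(int(pos), last)
        hi = min(lo + 1, last)
        frac = pos - lo
        # Linear interpolation between the two neighbouring input samples.
        out.append((1.0 - frac) * samples[lo] + frac * samples[hi])
    return out


def augment(samples, factors=(0.9, 1.0, 1.1)):
    """Return one perturbed copy of the waveform per (assumed) speed factor."""
    return [(f, speed_perturb(samples, f)) for f in factors]
```

Because the signal is simply resampled, both its duration and its pitch change with the speed factor.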

Poster_icassp2019_CTDO.pdf

Poster_icassp2019_CTDO.pdf (395)

Categories: Robust Speech Recognition (SPE-ROBU)

49 Views
