In this paper, we ask whether vocal source features (pitch, shimmer, jitter, etc.) can improve the performance of automatic sung


Hypothesis-level combination between multiple models can often yield gains in speech recognition. However, all models in the ensemble are usually restricted to use the same audio segmentation times. This paper proposes to generalise hypothesis-level combination, allowing the use of different audio segmentation times between the models, by splitting and re-joining the hypothesised N-best lists in time. A hypothesis tree method is also proposed to distribute hypothesis posteriors among the constituent words, to facilitate such splitting when per-word scores are not available.
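The splitting step above can be illustrated with a minimal sketch. The representation of a hypothesis as timed, scored words and the uniform spreading of a hypothesis-level posterior are assumptions for illustration; the paper's actual hypothesis-tree method is more refined.

```python
# Hedged sketch: splitting a timed N-best hypothesis at a boundary so that
# ensembles using different audio segmentations can be combined.
# Word layout (word, start, end, posterior) is a hypothetical format.

def distribute_posterior(words, hyp_posterior):
    """When only a hypothesis-level posterior is available, spread it
    uniformly over the constituent words (a crude stand-in for the
    paper's hypothesis-tree method)."""
    per_word = hyp_posterior / len(words) if words else 0.0
    return [(w, s, e, per_word) for (w, s, e, _) in words]

def split_at(words, t):
    """Split a timed word sequence at time t; a word straddling t
    is kept on the right-hand side."""
    left = [w for w in words if w[2] <= t]
    right = [w for w in words if w[2] > t]
    return left, right
```

Splitting every model's N-best list at a common set of boundaries yields segments that can be re-joined and combined hypothesis by hypothesis, even when the original segmentations differ.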


The purpose of this study is to detect the mismatch between text script and voice-over. For this, we present a novel utterance verification (UV) method, which calculates the degree of correspondence between a voice-over and the phoneme sequence of a script. We found that the phoneme recognition probabilities of exaggerated voice-overs decrease compared to ordinary utterances, but their rankings do not demonstrate any significant change.
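Since the abstract reports that raw phoneme probabilities drop for exaggerated voice-overs while their rankings stay stable, a rank-based correspondence score is the natural robust choice. The sketch below assumes per-frame phoneme posteriors already aligned one-to-one with the script's phoneme sequence; the function name and data layout are illustrative, not the paper's.

```python
# Hedged sketch of a rank-based utterance-verification score: for each
# aligned frame, how highly does the script's expected phoneme rank
# among all phoneme posteriors? Lower mean rank = better correspondence.

def rank_score(frame_posteriors, script_phonemes):
    """frame_posteriors: one dict (phoneme -> probability) per frame,
    assumed aligned 1:1 with script_phonemes."""
    ranks = []
    for post, ph in zip(frame_posteriors, script_phonemes):
        ordered = sorted(post, key=post.get, reverse=True)
        ranks.append(ordered.index(ph) + 1)  # 1 = top-ranked phoneme
    return sum(ranks) / len(ranks)
```

A mismatch between script and voice-over would push the script phonemes down the ranking, raising the mean rank even when absolute probabilities are depressed by exaggerated delivery.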


Recently, there has been growth in the number of providers of speech transcription services, enabling others to leverage technology they would not normally be able to use. As a result, speech-enabled solutions have become commonplace. Their success critically relies on the quality, accuracy, and reliability of the underlying speech transcription systems. Those black-box systems, however, offer limited means for quality control, as only word sequences are typically available.


One out of four children in India is leaving grade eight without basic reading skills. Measuring reading levels in a country as vast as India poses significant hurdles. Recent advances in machine learning open up the possibility of automating this task. However, datasets of children's speech are not only rare but primarily in English. To address this assessment problem and advance deep learning research in regional Indian languages, we present the ASER dataset of children aged 6-14.


We present a novel lipreading system that improves on the task of speaker-independent word recognition by decoupling motion and content dynamics. We achieve this with a deep learning architecture that uses two distinct pipelines to process motion and content and subsequently merges them, yielding an end-to-end trainable system that fuses independently learned representations. We obtain an average relative word accuracy improvement of ≈6.8% on unseen speakers and of ≈3.3% on known speakers, with respect to a baseline that uses a standard architecture.
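The fusion of independently learned representations can be sketched as follows. The two encoders are reduced here to fixed random projections with a nonlinearity; all shapes, names, and the concatenation-based fusion are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

# Hedged sketch of late fusion: two separate pipelines (stand-in
# "encoders") process the motion and content streams independently,
# and their outputs are concatenated before a downstream classifier.

rng = np.random.default_rng(0)
W_motion = rng.standard_normal((16, 8))   # motion-pipeline projection
W_content = rng.standard_normal((16, 8))  # content-pipeline projection

def fuse(motion_feats, content_feats):
    """motion_feats, content_feats: arrays of shape (batch, 16)."""
    h_m = np.tanh(motion_feats @ W_motion)    # (batch, 8)
    h_c = np.tanh(content_feats @ W_content)  # (batch, 8)
    return np.concatenate([h_m, h_c], axis=-1)  # fused (batch, 16)
```

Keeping the pipelines separate until the fusion point is what lets each representation be learned independently before they are merged for classification.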

