- Read more about A Causal Deep Learning Framework for Classifying Phonemes in Cochlear Implants
- Read more about The Use of Voice Source Features for Sung Speech Recognition
In this paper, we ask whether vocal source features (pitch, shimmer, jitter, etc.) can improve the performance of automatic sung
- Read more about Contrastive Unsupervised Learning for Speech Emotion Recognition
- Read more about Focus on the present: a regularization method for the ASR source-target attention layer
Hypothesis-level combination between multiple models can often yield gains in speech recognition. However, all models in the ensemble are usually restricted to use the same audio segmentation times. This paper proposes to generalise hypothesis-level combination, allowing the use of different audio segmentation times between the models, by splitting and re-joining the hypothesised N-best lists in time. A hypothesis tree method is also proposed to distribute hypothesis posteriors among the constituent words, to facilitate such splitting when per-word scores are not available.
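The abstract does not spell out how the hypothesis tree distributes posteriors, so the following is only a minimal sketch of one plausible reading: a prefix tree over the N-best list, in which each word receives the total posterior of all hypotheses that share the prefix ending at that word. The function names `prefix_posteriors` and `word_scores` and the toy N-best list are assumptions made for illustration and do not come from the paper.

```python
from collections import defaultdict


def prefix_posteriors(nbest):
    """Map each word-sequence prefix to the summed posterior of hypotheses sharing it.

    nbest: list of (words, posterior) pairs. Posteriors are renormalised to sum to 1.
    This prefix-sum rule is an assumption, not the paper's stated method.
    """
    total = sum(p for _, p in nbest)
    mass = defaultdict(float)
    for words, p in nbest:
        # Every prefix of a hypothesis is a node in the (implicit) prefix tree.
        for i in range(1, len(words) + 1):
            mass[tuple(words[:i])] += p / total
    return dict(mass)


def word_scores(words, mass):
    """Per-word scores for one hypothesis: the mass of the tree node reached by each word."""
    return [mass[tuple(words[:i])] for i in range(1, len(words) + 1)]


# Hypothetical 3-best list with hypothesis-level posteriors only.
nbest = [
    (["the", "cat", "sat"], 0.5),
    (["the", "cat", "sang"], 0.3),
    (["a", "cat", "sat"], 0.2),
]
mass = prefix_posteriors(nbest)
scores = word_scores(["the", "cat", "sat"], mass)
```

With per-word scores of this kind, a hypothesis can be cut at a word boundary and each piece still carries a score, which is the property the splitting and re-joining step needs.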
- Read more about Detecting Mismatch between Text Script and Voice-over Using Utterance Verification Based on Phoneme Recognition Ranking
The purpose of this study is to detect the mismatch between text script and voice-over. For this, we present a novel utterance verification (UV) method, which calculates the degree of correspondence between a voice-over and the phoneme sequence of a script. We found that the phoneme recognition probabilities of exaggerated voice-overs decrease compared to ordinary utterances, but their rankings do not demonstrate any significant change.
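The observation about probabilities versus rankings can be illustrated with a small sketch. It assumes frame-level phoneme posteriors are available as a matrix and that the script has already been aligned to one expected phoneme index per frame; the function names, the reciprocal-rank score, and the toy numbers are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def phoneme_ranks(posteriors, expected):
    """Rank (1 = most probable) of each expected phoneme within its frame's posterior."""
    posteriors = np.asarray(posteriors, dtype=float)
    expected = np.asarray(expected, dtype=int)
    target = posteriors[np.arange(len(expected)), expected]
    # Rank = 1 + number of phonemes scored strictly higher than the expected one.
    return (posteriors > target[:, None]).sum(axis=1) + 1


def rank_score(posteriors, expected):
    """Mean reciprocal rank in (0, 1]; higher suggests the voice-over matches the script."""
    return float(np.mean(1.0 / phoneme_ranks(posteriors, expected)))


# Hypothetical posteriors over 3 phonemes for 2 frames.
ordinary = [[0.8, 0.1, 0.1], [0.1, 0.7, 0.2]]
exaggerated = [[0.4, 0.3, 0.3], [0.3, 0.4, 0.3]]  # flatter, but same top phoneme
script = [0, 1]       # expected phoneme per frame
wrong_script = [2, 0]  # a script that does not match the audio

ordinary_score = rank_score(ordinary, script)
exaggerated_score = rank_score(exaggerated, script)
mismatch_score = rank_score(ordinary, wrong_script)
```

In this toy case the expected phonemes' probabilities drop from 0.8 and 0.7 to 0.4 for the exaggerated reading, yet both readings keep rank 1 in every frame, so a rank-based score stays unchanged while the mismatched script scores lower.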
- Read more about Confidence Estimation for Black Box Automatic Speech Recognition Systems Using Lattice Recurrent Neural Networks
Recently, there has been growth in providers of speech transcription services enabling others to leverage technology they would not normally be able to use. As a result, speech-enabled solutions have become commonplace. Their success critically relies on the quality, accuracy, and reliability of the underlying speech transcription systems. Those black box systems, however, offer limited means for quality control as only word sequences are typically available.
One out of four children in India leaves grade eight without basic reading skills. Measuring reading levels in a country as vast as India poses significant hurdles. Recent advances in machine learning open up the possibility of automating this task. However, datasets of children’s speech are not only rare but also primarily in English. To solve this assessment problem and advance deep learning research in regional Indian languages, we present the ASER dataset of speech from children aged 6 to 14.
- Read more about Exploring Pre-training with Alignments for RNN Transducer based End-to-End Speech Recognition