General Topics in Speech Recognition (SPE-GASR)

Detecting Mismatch between Text Script and Voice-over Using Utterance Verification Based on Phoneme Recognition Ranking


The purpose of this study is to detect the mismatch between text script and voice-over. For this, we present a novel utterance verification (UV) method, which calculates the degree of correspondence between a voice-over and the phoneme sequence of a script. We found that the phoneme recognition probabilities of exaggerated voice-overs decrease compared to ordinary utterances, but their rankings do not demonstrate any significant change.
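The abstract's observation, that probabilities drop for exaggerated speech while rankings stay stable, can be illustrated with a small rank-based score. This is only a minimal sketch and not the authors' actual scoring: it assumes the script has already been aligned so that each segment has one expected phoneme and a posterior distribution over phonemes, and the function names and the `max_mean_rank` threshold are hypothetical.

```python
def phoneme_rank(posteriors, target):
    """1-based rank of `target` when phonemes are sorted by descending probability."""
    target_p = posteriors[target]
    return 1 + sum(1 for p in posteriors.values() if p > target_p)


def mean_rank_score(segment_posteriors, expected_phonemes):
    """Average rank of the expected phoneme over all aligned segments (lower is better)."""
    if not expected_phonemes or len(segment_posteriors) != len(expected_phonemes):
        raise ValueError("need one posterior distribution per expected phoneme")
    ranks = [phoneme_rank(post, ph)
             for post, ph in zip(segment_posteriors, expected_phonemes)]
    return sum(ranks) / len(ranks)


def is_match(segment_posteriors, expected_phonemes, max_mean_rank=2.0):
    """Accept the voice-over if the mean rank stays under a (hypothetical) threshold."""
    return mean_rank_score(segment_posteriors, expected_phonemes) <= max_mean_rank


# Illustrative values only: the exaggerated utterance has a flatter distribution,
# so the probability of "a" drops, but "a" is still the top-ranked phoneme.
normal = {"a": 0.80, "b": 0.15, "c": 0.05}
exaggerated = {"a": 0.45, "b": 0.35, "c": 0.20}
```

With these toy values, a probability threshold of, say, 0.5 would reject the exaggerated segment, whereas the rank of the expected phoneme is 1 in both cases, which is the property a rank-based score relies on.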

Paper Details

Authors:
Yoonjae Jeong, Hoon-Young Cho
Submitted On:
21 May 2020 - 7:57am

Document Files

ICASSP2020_YJEONG_SLIDES.pdf

(5)

[1] Yoonjae Jeong, Hoon-Young Cho, "Detecting Mismatch between Text Script and Voice-over Using Utterance Verification Based on Phoneme Recognition Ranking", IEEE SigPort, 2020. [Online]. Available: http://sigport.org/5426. Accessed: Jun. 07, 2020.

Synchronous Transformers for End-to-End Speech Recognition

Paper Details

Authors:
Zhengkun Tian, Jiangyan Yi, Ye Bai, Jianhua Tao, Shuai Zhang, Zhengqi Wen
Submitted On:
17 May 2020 - 3:20am

Document Files

Sync-Transformer-icassp2020.pdf

(15)

[1] Zhengkun Tian, Jiangyan Yi, Ye Bai, Jianhua Tao, Shuai Zhang, Zhengqi Wen, "Synchronous Transformers for End-to-End Speech Recognition", IEEE SigPort, 2020. [Online]. Available: http://sigport.org/5382. Accessed: Jun. 07, 2020.

Confidence Estimation for Black Box Automatic Speech Recognition Systems Using Lattice Recurrent Neural Networks


Recently, there has been growth in providers of speech transcription services enabling others to leverage technology they would not normally be able to use. As a result, speech-enabled solutions have become commonplace. Their success critically relies on the quality, accuracy, and reliability of the underlying speech transcription systems. Those black box systems, however, offer limited means for quality control as only word sequences are typically available.

Paper Details

Authors:
A. Kastanos, A. Ragni, M.J.F. Gales
Submitted On:
14 May 2020 - 4:29pm

Document Files

Black-Box-ASR-ICASSP-2020.pdf

(8)

[1] A. Kastanos, A. Ragni, M.J.F. Gales, "Confidence Estimation for Black Box Automatic Speech Recognition Systems Using Lattice Recurrent Neural Networks", IEEE SigPort, 2020. [Online]. Available: http://sigport.org/5320. Accessed: Jun. 07, 2020.

A DATASET FOR MEASURING READING LEVELS IN INDIA AT SCALE


One out of four children in India leaves grade eight without basic reading skills. Measuring reading levels in a vast country like India poses significant hurdles. Recent advances in machine learning open up the possibility of automating this task. However, datasets of children’s speech are not only rare but also primarily in English. To solve this assessment problem and advance deep learning research in regional Indian languages, we present the ASER dataset of children aged 6 to 14.

Paper Details

Authors:
Jayant Gupchup, Nishant Baghel
Submitted On:
14 May 2020 - 3:28am

Document Files

ICASSP 2020 PPT.pdf

(7)

[1] Jayant Gupchup, Nishant Baghel, "A DATASET FOR MEASURING READING LEVELS IN INDIA AT SCALE", IEEE SigPort, 2020. [Online]. Available: http://sigport.org/5249. Accessed: Jun. 07, 2020.

Exploring Pre-training with Alignments for RNN Transducer based End-to-End Speech Recognition

Paper Details

Authors:
Hu Hu, Rui Zhao, Jinyu Li, Liang Lu, Yifan Gong
Submitted On:
13 May 2020 - 8:47pm

Document Files

rnnt_icassp2020_.pptx

(9)

[1] Hu Hu, Rui Zhao, Jinyu Li, Liang Lu, Yifan Gong, "Exploring Pre-training with Alignments for RNN Transducer based End-to-End Speech Recognition", IEEE SigPort, 2020. [Online]. Available: http://sigport.org/5173. Accessed: Jun. 07, 2020.

Motion Dynamics Improve Speaker-Independent Lipreading


We present a novel lipreading system that improves on the task of speaker-independent word recognition by decoupling motion and content dynamics. We achieve this with a deep learning architecture that processes motion and content in two distinct pipelines and subsequently merges them, yielding an end-to-end trainable system that fuses independently learned representations. We obtain an average relative word accuracy improvement of ≈6.8% on unseen speakers and of ≈3.3% on known speakers with respect to a baseline that uses a standard architecture.

Paper Details

Authors:
Matteo Riva, Michael Wand, Jürgen Schmidhuber
Submitted On:
19 April 2020 - 6:19pm

Document Files

Presentation PDF slides

(44)

[1] Matteo Riva, Michael Wand, Jürgen Schmidhuber, "Motion Dynamics Improve Speaker-Independent Lipreading", IEEE SigPort, 2020. [Online]. Available: http://sigport.org/5108. Accessed: Jun. 07, 2020.

Adaptation of an EMG-Based Speech Recognizer via Meta-Learning


In nonacoustic speech recognition based on electromyography, i.e., on electrical muscle activity captured by noninvasive surface electrodes, differences between recording sessions are known to degrade system accuracy. Efficient adaptation of an existing system to an unseen recording session is therefore imperative for practical usage scenarios. We report on a meta-learning approach that pretrains a deep neural network frontend for a myoelectric speech recognizer so that it can be easily adapted to a new session.

Paper Details

Authors:
Krsto Proroković, Michael Wand, Tanja Schultz, Jürgen Schmidhuber
Submitted On:
6 December 2019 - 2:25pm

Document Files

Adaptation of an EMG-Based Speech Recognizer via Meta-Learning.pdf

(105)

[1] Krsto Proroković, Michael Wand, Tanja Schultz, Jürgen Schmidhuber, "Adaptation of an EMG-Based Speech Recognizer via Meta-Learning", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4954. Accessed: Jun. 07, 2020.

END-TO-END FEEDBACK LOSS IN SPEECH CHAIN FRAMEWORK VIA STRAIGHT-THROUGH ESTIMATOR


The speech chain mechanism integrates automatic speech recognition (ASR) and text-to-speech synthesis (TTS) modules into a single cycle during training. In our previous work, we applied the speech chain mechanism as a semi-supervised learning method. It lets ASR and TTS assist each other when they receive unpaired data: each module infers the missing half of the pair, and the model is optimized with a reconstruction loss.

Paper Details

Authors:
Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
Submitted On:
14 May 2019 - 8:26pm

Document Files

ICASSP19_Poster_V1.pdf

(100)

[1] Andros Tjandra, Sakriani Sakti, Satoshi Nakamura, "END-TO-END FEEDBACK LOSS IN SPEECH CHAIN FRAMEWORK VIA STRAIGHT-THROUGH ESTIMATOR", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4519. Accessed: Jun. 07, 2020.

Subword Regularization and Beam Search Decoding for End-to-End Automatic Speech Recognition


In this paper, we experiment with the recently introduced subword regularization technique (Kudo, 2018) in the context of end-to-end automatic speech recognition (ASR). We present results from both attention-based and CTC-based ASR systems on two common benchmark datasets, the 80-hour Wall Street Journal corpus and the 1,000-hour LibriSpeech corpus. We also introduce a novel subword beam search decoding algorithm that significantly improves the final performance of the CTC-based systems.
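The core idea of subword regularization, sampling a different segmentation of the same word each time it is seen in training, can be sketched as follows. This is a toy illustration under stated assumptions rather than the paper's setup: the vocabulary and its unigram probabilities in `TOY_VOCAB` are invented, the helper names are hypothetical, and all segmentations are enumerated exhaustively, which is only feasible for short words.

```python
import math
import random

# Hypothetical toy unigram vocabulary: subword -> probability (illustrative values only).
TOY_VOCAB = {
    "h": 0.05, "e": 0.05, "l": 0.05, "o": 0.05,
    "he": 0.10, "ll": 0.10, "lo": 0.10, "hel": 0.15, "hello": 0.35,
}


def segmentations(word, vocab):
    """Return every way to split `word` into pieces that all appear in `vocab`."""
    if not word:
        return [[]]
    results = []
    for end in range(1, len(word) + 1):
        piece = word[:end]
        if piece in vocab:
            for rest in segmentations(word[end:], vocab):
                results.append([piece] + rest)
    return results


def sample_segmentation(word, vocab, alpha=0.5, rng=random):
    """Sample one segmentation with probability proportional to P(segmentation) ** alpha.

    P(segmentation) is the product of the unigram probabilities of its pieces.
    A smaller alpha flattens the distribution, so rarer segmentations appear more often.
    """
    candidates = segmentations(word, vocab)
    if not candidates:
        raise ValueError(f"no segmentation of {word!r} with this vocabulary")
    weights = [math.prod(vocab[p] for p in seg) ** alpha for seg in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]
```

Calling `sample_segmentation("hello", TOY_VOCAB)` repeatedly returns varying splits such as `["hello"]` or `["hel", "lo"]`, so the recognizer is exposed to several target sequences for the same transcript.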

Paper Details

Authors:
Jennifer Drexler, James Glass
Submitted On:
14 May 2019 - 9:04am

Document Files

ICASSP_poster_final.pdf

(121)

[1] Jennifer Drexler, James Glass, "Subword Regularization and Beam Search Decoding for End-to-End Automatic Speech Recognition", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4509. Accessed: Jun. 07, 2020.

ACOUSTICALLY GROUNDED WORD EMBEDDINGS FOR IMPROVED ACOUSTICS-TO-WORD SPEECH RECOGNITION

Paper Details

Authors:
Karen Livescu, Michael Picheny
Submitted On:
14 May 2019 - 7:08am

Document Files

icassp_official_final.pdf

(120)

[1] Karen Livescu, Michael Picheny, "ACOUSTICALLY GROUNDED WORD EMBEDDINGS FOR IMPROVED ACOUSTICS-TO-WORD SPEECH RECOGNITION", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4506. Accessed: Jun. 07, 2020.
