
Resource constrained speech recognition (SPE-RCSR)

Libri-Light: A Benchmark for ASR with Limited or No Supervision (ICASSP 2020 Slides)

Paper Details

Authors:
Morgan Rivière, Weiyi Zheng, Evgeny Kharitonov, Qiantong Xu, Pierre-Emmanuel Mazaré, Julien Karadayi, Vitaliy Liptchinsky, Ronan Collobert, Christian Fuegen, Tatiana Likhomanenko, Gabriel Synnaeve, Armand Joulin, Abdelrahman Mohamed, Emmanuel Dupoux
Submitted On:
6 June 2020 - 10:30pm

Document Files

Libri-Light - A Benchmark for ASR with Limited or No Supervision -- ICASSP 2020.pdf

Cite:
[1] Morgan Rivière, Weiyi Zheng, Evgeny Kharitonov, Qiantong Xu, Pierre-Emmanuel Mazaré, Julien Karadayi, Vitaliy Liptchinsky, Ronan Collobert, Christian Fuegen, Tatiana Likhomanenko, Gabriel Synnaeve, Armand Joulin, Abdelrahman Mohamed, Emmanuel Dupoux, "Libri-Light: A Benchmark for ASR with Limited or No Supervision (ICASSP 2020 Slides)", IEEE SigPort, 2020. [Online]. Available: http://sigport.org/5458. Accessed: Aug. 12, 2020.

SPEECH RECOGNITION MODEL COMPRESSION


Deep neural network based speech recognition systems are widely used in most speech processing applications. To achieve better robustness and accuracy, these networks are constructed with millions of parameters, making them storage- and compute-intensive. In this paper, we propose Bin & Quant (B&Q), a compression technique with which we reduced the Deep Speech 2 speech recognition model size by 7 times with negligible loss in accuracy.
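The exact B&Q procedure is defined in the paper; as a rough sketch of the general bin-and-quantize idea, the snippet below clusters a weight matrix into a small shared codebook with 1-D k-means and stores compact bin indices instead of 32-bit floats. The bin count, initialisation, and index width here are illustrative assumptions, not the authors' settings.

    import numpy as np

    def bin_and_quantize(weights, n_bins=32, n_iter=20):
        # 1-D k-means over the weight values: each weight is replaced by
        # the centroid of its bin, so the layer can be stored as a tiny
        # float codebook plus one small integer index per weight.
        flat = weights.ravel()
        codebook = np.linspace(flat.min(), flat.max(), n_bins)
        for _ in range(n_iter):
            idx = np.abs(flat[:, None] - codebook[None, :]).argmin(axis=1)
            for b in range(n_bins):
                members = flat[idx == b]
                if members.size:
                    codebook[b] = members.mean()
        idx = np.abs(flat[:, None] - codebook[None, :]).argmin(axis=1)
        return codebook.astype(np.float32), idx.reshape(weights.shape).astype(np.uint8)

    W = np.random.randn(512, 512).astype(np.float32)
    codebook, idx = bin_and_quantize(W)
    W_hat = codebook[idx]   # dequantised weights used at inference
    # 32 bins need ceil(log2(32)) = 5 bits per weight instead of 32,
    # i.e. roughly 6.4x smaller before any further entropy coding.

With 5-bit indices the storage saving alone is already in the neighbourhood of the 7x reported in the abstract, though the authors' actual scheme may of course differ.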

Paper Details

Authors:
Ahmed Tewfik, Raj Pawate
Submitted On:
25 May 2020 - 2:17pm

Document Files

ICASSP 2020 slides.pptx

Cite:
[1] Ahmed Tewfik, Raj Pawate, "SPEECH RECOGNITION MODEL COMPRESSION", IEEE SigPort, 2020. [Online]. Available: http://sigport.org/5434. Accessed: Aug. 12, 2020.

CROSS-LINGUAL TRANSFER LEARNING FOR ZERO-RESOURCE DOMAIN ADAPTATION


We propose a method for zero-resource domain adaptation of DNN acoustic models, for use in low-resource situations where the only in-language training data available may be poorly matched to the intended target domain. Our method uses a multi-lingual model in which several DNN layers are shared between languages. This architecture enables domain adaptation transforms learned for one well-resourced language to be applied to an entirely different low-resource language.
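A minimal sketch of this arrangement follows. The layer sizes, phone-set sizes, and the LIN-style linear input transform are illustrative assumptions; the paper's exact transform and sharing scheme may differ.

    import torch
    import torch.nn as nn

    class MultilingualAM(nn.Module):
        # Shared hidden layers with one softmax head per language, plus a
        # linear input transform that carries the domain adaptation.
        def __init__(self, feat_dim=40, hidden=512, phone_sets=None):
            super().__init__()
            phone_sets = phone_sets or {"wellres": 42, "lowres": 48}
            self.adapt = nn.Linear(feat_dim, feat_dim)
            nn.init.eye_(self.adapt.weight)   # start as the identity
            nn.init.zeros_(self.adapt.bias)
            self.shared = nn.Sequential(
                nn.Linear(feat_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU())
            self.heads = nn.ModuleDict(
                {lang: nn.Linear(hidden, n) for lang, n in phone_sets.items()})

        def forward(self, x, lang, adapted=True):
            if adapted:
                x = self.adapt(x)   # domain transform below the shared stack
            return self.heads[lang](self.shared(x))

    model = MultilingualAM()
    # 1) Train shared layers and heads multilingually, adapt frozen at identity.
    # 2) Learn self.adapt on target-domain data of the well-resourced language,
    #    with all other parameters frozen.
    # 3) Decode the low-resource language reusing that same transform:
    logits = model(torch.randn(8, 40), lang="lowres", adapted=True)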

Paper Details

Authors:
Alberto Abad, Peter Bell, Andrea Carmantini, Steve Renals
Submitted On:
22 May 2020 - 8:32am

Document Files

ICASSP20_slides.pdf

Cite:
[1] Alberto Abad, Peter Bell, Andrea Carmantini, Steve Renals, "CROSS-LINGUAL TRANSFER LEARNING FOR ZERO-RESOURCE DOMAIN ADAPTATION", IEEE SigPort, 2020. [Online]. Available: http://sigport.org/5432. Accessed: Aug. 12, 2020.

Motion Dynamics Improve Speaker-Independent Lipreading


We present a novel lipreading system that improves on the task of speaker-independent word recognition by decoupling motion and content dynamics. We achieve this by implementing a deep learning architecture that uses two distinct pipelines to process motion and content and subsequently merges them, implementing an end-to-end trainable system that performs fusion of independently learned representations. We obtain an average relative word accuracy improvement of ≈6.8% on unseen speakers and ≈3.3% on known speakers, relative to a baseline that uses a standard architecture.
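A minimal sketch of such a two-stream design, with frame differences standing in for the motion input and late fusion by concatenation; the front-end sizes, the difference-based motion signal, and the fusion operator are assumptions for illustration, not the paper's exact design.

    import torch
    import torch.nn as nn

    class TwoStreamLipreader(nn.Module):
        # One pipeline sees raw mouth-region frames (content), the other
        # sees temporal frame differences (motion); the two representations
        # are learned independently and fused before word classification.
        def __init__(self, n_words=500, feat=256):
            super().__init__()
            def stream():
                return nn.Sequential(
                    nn.Conv3d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
                    nn.AdaptiveAvgPool3d(1), nn.Flatten(),
                    nn.Linear(32, feat), nn.ReLU())
            self.content = stream()
            self.motion = stream()
            self.classifier = nn.Linear(2 * feat, n_words)

        def forward(self, frames):                        # (B, 1, T, H, W)
            diffs = frames[:, :, 1:] - frames[:, :, :-1]  # motion signal
            h = torch.cat([self.content(frames), self.motion(diffs)], dim=-1)
            return self.classifier(h)

    clip = torch.randn(4, 1, 29, 64, 64)   # batch of 29-frame mouth crops
    logits = TwoStreamLipreader()(clip)    # (4, 500) word scores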

Paper Details

Authors:
Matteo Riva, Michael Wand, Jürgen Schmidhuber
Submitted On:
19 April 2020 - 6:19pm

Document Files

Presentation PDF slides

Cite:
[1] Matteo Riva, Michael Wand, Jürgen Schmidhuber, "Motion Dynamics Improve Speaker-Independent Lipreading", IEEE SigPort, 2020. [Online]. Available: http://sigport.org/5108. Accessed: Aug. 12, 2020.

Knowledge Distillation Using Output Errors for Self-Attention ASR Models


Most automatic speech recognition (ASR) neural network models are not suitable for mobile devices due to their large model sizes, so the model size must be reduced to fit within the limited hardware resources. In this study, we investigate sequence-level knowledge distillation techniques for self-attention ASR models as a means of model compression.
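In its basic form, sequence-level distillation trains the student on the teacher's decoded hypotheses rather than matching per-frame posteriors; the paper's specific use of output errors goes beyond this. A minimal sketch of the basic sequence-level loss, with shapes and padding convention assumed for illustration:

    import torch
    import torch.nn.functional as F

    def sequence_kd_loss(student_logits, teacher_hyp, pad_id=0):
        # Cross-entropy of the student against the teacher's beam-search
        # output (pseudo-label sequence) instead of the ground truth.
        # student_logits: (B, T, V); teacher_hyp: (B, T) token ids.
        return F.cross_entropy(
            student_logits.transpose(1, 2),  # (B, V, T) for token-wise CE
            teacher_hyp,
            ignore_index=pad_id)

    # toy shapes: batch of 2, 5 decoding steps, vocabulary of 100
    student_logits = torch.randn(2, 5, 100, requires_grad=True)
    teacher_hyp = torch.randint(1, 100, (2, 5))  # teacher's decoded tokens
    loss = sequence_kd_loss(student_logits, teacher_hyp)
    loss.backward()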

Paper Details

Authors:
Hwidong Na, Hoshik Lee, Jihyun Lee, Tae Gyoon Kang, Min-Joong Lee, Young Sang Choi
Submitted On:
8 May 2019 - 10:02pm

Document Files

icassp-2019-poster_v1.1.pptx

Cite:
[1] Hwidong Na, Hoshik Lee, Jihyun Lee, Tae Gyoon Kang, Min-Joong Lee, Young Sang Choi, "Knowledge Distillation Using Output Errors for Self-Attention ASR Models", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4140. Accessed: Aug. 12, 2020.

RECOGNIZING ZERO-RESOURCED LANGUAGES BASED ON MISMATCHED MACHINE TRANSCRIPTIONS


Mismatched-crowdsourcing-based probabilistic human transcription has recently been proposed for training and adapting acoustic models for zero-resourced languages, where no native transcriptions are available. This paper describes a machine-transcription-based phone recognition system for recognizing zero-resourced languages and compares it with baseline systems using MAP adaptation and semi-supervised self-training.
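The self-training baseline follows the usual loop: decode untranscribed data with a seed model, keep confident hypotheses as pseudo-labels, and retrain. A toy illustration of that loop, with a linear classifier standing in for the acoustic model; the 0.9 confidence threshold and the Gaussian toy data are assumptions, not the paper's setup.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    # Toy stand-in: 2-D "acoustic" features for two phone-like classes.
    X_lab = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(3, 1, (20, 2))])
    y_lab = np.array([0] * 20 + [1] * 20)
    X_unl = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(3, 1, (200, 2))])

    model = LogisticRegression().fit(X_lab, y_lab)   # seed model
    for _ in range(3):
        conf = model.predict_proba(X_unl).max(axis=1)
        keep = conf > 0.9                            # confidence filter
        X_aug = np.vstack([X_lab, X_unl[keep]])
        y_aug = np.concatenate([y_lab, model.predict(X_unl[keep])])
        model = LogisticRegression().fit(X_aug, y_aug)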

Paper Details

Authors:
Mark Hasegawa-Johnson, Nancy F. Chen
Submitted On:
12 April 2018 - 7:52pm

Document Files

Poster.pdf

Cite:
[1] Mark Hasegawa-Johnson, Nancy F. Chen, "RECOGNIZING ZERO-RESOURCED LANGUAGES BASED ON MISMATCHED MACHINE TRANSCRIPTIONS", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2520. Accessed: Aug. 12, 2020.

Knowledge Distillation for Small-footprint Highway Networks


Deep learning has significantly advanced the state of the art of speech recognition in the past few years. However, compared to conventional Gaussian mixture acoustic models, neural network models are usually much larger and are therefore harder to deploy on embedded devices. Previously, we investigated a compact highway deep neural network (HDNN) for acoustic modelling, which is a type of depth-gated feedforward neural network. We have shown that HDNN-based acoustic models can achieve comparable recognition…
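For reference, a single depth-gated (highway) layer computes y = T(x) * H(x) + (1 - T(x)) * x, where T is a learned transform gate. A generic sketch follows; the negative gate-bias initialisation is a common convention, not necessarily the paper's parameterisation.

    import torch
    import torch.nn as nn

    class HighwayLayer(nn.Module):
        # One depth-gated layer: the gate T(x) mixes the nonlinear
        # transform H(x) with the untouched input.
        def __init__(self, dim):
            super().__init__()
            self.H = nn.Linear(dim, dim)
            self.T = nn.Linear(dim, dim)
            # bias the gate negative so the layer starts near the identity
            nn.init.constant_(self.T.bias, -2.0)

        def forward(self, x):
            t = torch.sigmoid(self.T(x))
            return t * torch.relu(self.H(x)) + (1.0 - t) * x

    x = torch.randn(8, 256)
    y = HighwayLayer(256)(x)   # same shape as the input

Because the gate lets the input pass through unchanged, deep stacks of such layers remain trainable with comparatively few parameters per layer, which is what makes the architecture attractive for small-footprint models.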

Paper Details

Authors:
Liang Lu, Michelle Guo, Steve Renals
Submitted On:
3 March 2017 - 5:15pm

Document Files

Slides for ICASSP 2017

Cite:
[1] Liang Lu, Michelle Guo, Steve Renals, "Knowledge Distillation for Small-footprint Highway Networks", IEEE SigPort, 2017. [Online]. Available: http://sigport.org/1619. Accessed: Aug. 12, 2020.