
Acoustic Modeling for Automatic Speech Recognition (SPE-RECO)

Self-Training for End-to-End Speech Recognition - ICASSP 2020 Slides

Paper Details

Authors:
Ann Lee, Awni Hannun
Submitted On:
6 June 2020 - 10:19pm

Document Files

Self-Training for End-to-End Speech Recognition - ICASSP 2020.pdf


[1] Ann Lee, Awni Hannun, "Self-Training for End-to-End Speech Recognition - ICASSP 2020 Slides", IEEE SigPort, 2020. [Online]. Available: http://sigport.org/5457. Accessed: Jul. 04, 2020.

CROSS LINGUAL TRANSFER LEARNING FOR ZERO-RESOURCE DOMAIN ADAPTATION


We propose a method for zero-resource domain adaptation of DNN acoustic models, for use in low-resource situations where the only in-language training data available may be poorly matched to the intended target domain. Our method uses a multilingual model in which several DNN layers are shared between languages. This architecture enables domain adaptation transforms learned for one well-resourced language to be applied to an entirely different low-resource language.
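As a rough illustration of the architecture described above (a hypothetical sketch, not code from the paper), the snippet below models layers as plain callables: several hidden layers are shared across languages, each language keeps its own output head, and a domain-adaptation transform learned on a well-resourced language can be slotted in front of the shared stack when decoding a different low-resource language. All function names here are assumptions for illustration.

```python
def forward(features, shared_layers, lang_head, adapt=None):
    """Forward pass through a sketch of the multilingual stack.

    shared_layers: callables shared by all languages.
    lang_head:     language-specific output layer.
    adapt:         optional domain-adaptation transform, possibly
                   learned on a different, well-resourced language.
    """
    x = features
    if adapt is not None:
        x = adapt(x)               # borrowed adaptation transform
    for layer in shared_layers:    # language-independent layers
        x = layer(x)
    return lang_head(x)            # language-specific output


# Toy usage: doubling "hidden layer", summing "head", +1 "adaptation".
shared = [lambda v: [2.0 * a for a in v]]
head = lambda v: sum(v)
adapt = lambda v: [a + 1.0 for a in v]
unadapted = forward([1.0, 2.0], shared, head)
adapted = forward([1.0, 2.0], shared, head, adapt=adapt)
```

Because the adaptation transform feeds into layers shared by all languages, a transform estimated for one language stays meaningful when applied to another.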

Paper Details

Authors:
Alberto Abad, Peter Bell, Andrea Carmantini, Steve Renals
Submitted On:
22 May 2020 - 8:32am

Document Files

ICASSP20_slides.pdf


[1] Alberto Abad, Peter Bell, Andrea Carmantini, Steve Renals, "CROSS LINGUAL TRANSFER LEARNING FOR ZERO-RESOURCE DOMAIN ADAPTATION", IEEE SigPort, 2020. [Online]. Available: http://sigport.org/5432. Accessed: Jul. 04, 2020.

DEEP NEURAL NETWORKS BASED AUTOMATIC SPEECH RECOGNITION FOR FOUR ETHIOPIAN LANGUAGES


In this work, we present speech recognition systems for four Ethiopian languages: Amharic, Tigrigna, Oromo and Wolaytta. We have used comparable training corpora of about 20 to 29 hours of speech and about 1 hour of evaluation speech for each of the languages. For Amharic and Tigrigna, lexical and language models of different vocabulary sizes have been developed. For Oromo and Wolaytta, the training lexicons have been used for decoding.

Paper Details

Authors:
Solomon Teferra Abate, Martha Yifiru Tachbelie and Tanja Schultz
Submitted On:
20 May 2020 - 9:24am

Document Files

MarthaSolomonTanja.pdf


[1] Solomon Teferra Abate, Martha Yifiru Tachbelie and Tanja Schultz, "DEEP NEURAL NETWORKS BASED AUTOMATIC SPEECH RECOGNITION FOR FOUR ETHIOPIAN LANGUAGES", IEEE SigPort, 2020. [Online]. Available: http://sigport.org/5395. Accessed: Jul. 04, 2020.

Mockingjay: Unsupervised Speech Representation Learning with Deep Bidirectional Transformer Encoders


We present Mockingjay, a new speech representation learning approach in which bidirectional Transformer encoders are pre-trained on a large amount of unlabeled speech. Previous speech representation methods learn by conditioning on past frames and predicting information about future frames, whereas Mockingjay is designed to predict the current frame by jointly conditioning on both past and future contexts.
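A minimal sketch of the masked-frame idea (hypothetical, not the authors' code; the 15% masking rate, zero-fill, and L1 loss are illustrative assumptions): a random fraction of acoustic frames is zeroed out, and a reconstruction loss is computed only at the masked positions, which forces the encoder to exploit both past and future context.

```python
import random

def mask_frames(frames, mask_prob=0.15, seed=0):
    """Zero out a random subset of frames (list-of-lists features).

    Returns (masked_frames, mask); mask[i] is True where the
    reconstruction loss should be computed.
    """
    rng = random.Random(seed)
    mask = [rng.random() < mask_prob for _ in frames]
    masked = [[0.0] * len(f) if m else list(f) for f, m in zip(frames, mask)]
    return masked, mask

def l1_loss_on_masked(pred, target, mask):
    """Mean absolute error, counted only at masked positions."""
    terms = [abs(p - t)
             for pr, tr, m in zip(pred, target, mask) if m
             for p, t in zip(pr, tr)]
    return sum(terms) / len(terms) if terms else 0.0


# Toy usage: 4 frames of 2-dim features, aggressive 50% masking.
frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
masked, mask = mask_frames(frames, mask_prob=0.5, seed=0)
loss = l1_loss_on_masked(masked, frames, mask)
```

In training, a Transformer would map `masked` back to `frames`; the sketch stops at the input corruption and loss bookkeeping.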

Paper Details

Authors:
Andy T. Liu, Shu-wen Yang, Po-Han Chi, Po-chun Hsu, Hung-yi Lee
Submitted On:
15 May 2020 - 10:18pm

Document Files

Presentation Slides


[1] Andy T. Liu, Shu-wen Yang, Po-Han Chi, Po-chun Hsu, Hung-yi Lee, "Mockingjay: Unsupervised Speech Representation Learning with Deep Bidirectional Transformer Encoders", IEEE SigPort, 2020. [Online]. Available: http://sigport.org/5364. Accessed: Jul. 04, 2020.

TOWARDS FAST AND ACCURATE STREAMING END-TO-END ASR

Paper Details

Authors:
Bo Li, Shuo-Yiin Chang, Tara Sainath, Ruoming Pang, Yanzhang He, Trevor Strohman, Yonghui Wu
Submitted On:
14 May 2020 - 1:40am

Document Files

TU1_L1_3-rnnt_ep.pdf


[1] Bo Li, Shuo-Yiin Chang, Tara Sainath, Ruoming Pang, Yanzhang He, Trevor Strohman, Yonghui Wu, "TOWARDS FAST AND ACCURATE STREAMING END-TO-END ASR", IEEE SigPort, 2020. [Online]. Available: http://sigport.org/5223. Accessed: Jul. 04, 2020.

Robust End-To-End Keyword Spotting And Voice Command Recognition For Mobile Game


We present an effective method for small-footprint keyword spotting (KWS) and a voice-command-based user interface for mobile games. For the KWS task, our goal is to design and implement a computationally very light deep neural network model on mobile devices while at the same time improving accuracy in various noisy environments. We propose a simple yet effective convolutional neural network (CNN), deployed with Google's TensorFlow Lite on Android and Apple's Core ML on iOS.

Paper Details

Authors:
Hu Xu*, Youshin Lim*, Shounan An, Hyemin Cho, Yoonseok Hong, Insoo Oh
Submitted On:
13 May 2020 - 11:20pm

Document Files

[S&T]ROBUST END-TO-END KEYWORD SPOTTING AND VOICE COMMAND RECOGNITION FOR MOBILE GAME_icassp20'.pdf


[1] Hu Xu*, Youshin Lim*, Shounan An, Hyemin Cho, Yoonseok Hong, Insoo Oh, "Robust End-To-End Keyword Spotting And Voice Command Recognition For Mobile Game", IEEE SigPort, 2020. [Online]. Available: http://sigport.org/5213. Accessed: Jul. 04, 2020.

A Streaming On-Device End-to-End Model Surpassing Server-Side Conventional Model Quality and Latency

Paper Details

Authors:
Arun Narayanan, Ruoming Pang, Antoine Bruguier, Shuo-yiin Chang, Wei Li, Raziel Alvarez, Zhifeng Chen, Chung-Cheng Chiu, David Garcia, Alex Gruenstein, Ke Hu, Minho Jin, Anjuli Kannan, Qiao Liang, Ian McGraw, Cal Peyser, Rohit Prabhavalkar, Golan Pundak,
Submitted On:
13 May 2020 - 6:16pm

Document Files

ICASSP - E2E Paper.pdf


[1] Arun Narayanan, Ruoming Pang, Antoine Bruguier, Shuo-yiin Chang, Wei Li, Raziel Alvarez, Zhifeng Chen, Chung-Cheng Chiu, David Garcia, Alex Gruenstein, Ke Hu, Minho Jin, Anjuli Kannan, Qiao Liang, Ian McGraw, Cal Peyser, Rohit Prabhavalkar, Golan Pundak, "A Streaming On-Device End-to-End Model Surpassing Server-Side Conventional Model Quality and Latency", IEEE SigPort, 2020. [Online]. Available: http://sigport.org/5163. Accessed: Jul. 04, 2020.

An Attention-Based Joint Acoustic and Text On-Device End-to-End Model

Paper Details

Authors:
Ruoming Pang, Ron J. Weiss, Chung-cheng Chiu, Trevor Strohman
Submitted On:
13 May 2020 - 6:13pm

Document Files

ICASSP - JATD Paper.pdf


[1] Ruoming Pang, Ron J. Weiss, Chung-cheng Chiu, Trevor Strohman, "An Attention-Based Joint Acoustic and Text On-Device End-to-End Model", IEEE SigPort, 2020. [Online]. Available: http://sigport.org/5160. Accessed: Jul. 04, 2020.

Speaker-aware Training of Attention-based End-to-End Speech Recognition using Neural Speaker Embeddings


In speaker-aware training, a speaker embedding is appended to the DNN input features. This allows the DNN to effectively learn representations that are robust to speaker variability.
We apply speaker-aware training to attention-based end-to-end speech recognition and show that it can improve over a purely end-to-end baseline. We also propose speaker-aware training as a viable method to leverage untranscribed, speaker-annotated data.
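The input construction described above can be sketched in a few lines (a hypothetical helper, not the authors' code): the same per-speaker embedding, e.g. from a neural speaker-embedding extractor, is concatenated onto every acoustic frame before it enters the encoder.

```python
def append_speaker_embedding(frames, speaker_embedding):
    """Concatenate one fixed speaker embedding onto each frame.

    frames:            list of per-frame feature vectors (lists).
    speaker_embedding: one vector for the whole utterance.
    """
    return [list(f) + list(speaker_embedding) for f in frames]


# Toy usage: 2-dim frames, 1-dim speaker embedding -> 3-dim inputs.
augmented = append_speaker_embedding([[1.0, 2.0], [3.0, 4.0]], [9.0])
```

The encoder's input dimensionality grows by the embedding size, but the rest of the end-to-end model is unchanged, which is what makes the approach easy to bolt onto an existing recipe.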

Paper Details

Authors:
Aku Rouhe, Tuomas Kaseva, Mikko Kurimo
Submitted On:
13 May 2020 - 4:49pm

Document Files

icassp2020-slides.pdf


[1] Aku Rouhe, Tuomas Kaseva, Mikko Kurimo, "Speaker-aware Training of Attention-based End-to-End Speech Recognition using Neural Speaker Embeddings", IEEE SigPort, 2020. [Online]. Available: http://sigport.org/5139. Accessed: Jul. 04, 2020.

Small energy masking for improved neural network training for end-to-end speech recognition


In this paper, we present a Small Energy Masking (SEM) algorithm, which masks inputs having values below a certain threshold. More specifically, a time-frequency bin is masked if the filterbank energy in this bin is less than a certain energy threshold. A uniform distribution is employed to randomly generate the ratio of this energy threshold to the peak filterbank energy of each utterance in decibels. The unmasked feature elements are scaled so that the total sum of the feature values remains the same through this masking procedure.
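A sketch of the SEM procedure as described above (the dB range for the random threshold-to-peak ratio and the list-of-lists feature layout are illustrative assumptions, not values from the paper): sample a per-utterance threshold relative to the peak energy, zero bins below it, and rescale the survivors to preserve the total feature sum.

```python
import random

def small_energy_masking(features, db_range=(-80.0, 0.0), seed=0):
    """Sketch of Small Energy Masking on a (frames x bins) matrix
    of filterbank energies in dB (nonnegative values assumed here).

    The threshold-to-peak ratio in dB is drawn uniformly from
    db_range, which is an assumed range for illustration.
    """
    rng = random.Random(seed)
    peak = max(max(row) for row in features)       # peak energy (dB)
    threshold = peak + rng.uniform(*db_range)      # random threshold
    total = sum(v for row in features for v in row)
    # Mask every bin whose energy falls below the threshold.
    masked = [[v if v >= threshold else 0.0 for v in row]
              for row in features]
    kept = sum(v for row in masked for v in row)
    # Rescale surviving bins so the total feature sum is unchanged.
    scale = total / kept if kept else 0.0
    return [[v * scale for v in row] for row in masked]


# Toy usage: fixing db_range pins the threshold 5 dB below the peak.
feats = [[10.0, 1.0], [8.0, 0.5]]
out = small_energy_masking(feats, db_range=(-5.0, -5.0), seed=0)
```

Because the threshold is resampled per utterance, the model sees a different masking severity on every pass, which is the regularization effect the method relies on.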

Paper Details

Authors:
Chanwoo Kim, Kwangyoun Kim, Sathish Reddy Indurthi
Submitted On:
5 May 2020 - 5:27pm

Document Files

20200508_icassp_small_energy_masking_paper_3965_presentation.pdf


[1] Chanwoo Kim, Kwangyoun Kim, Sathish Reddy Indurthi, "Small energy masking for improved neural network training for end-to-end speech recognition", IEEE SigPort, 2020. [Online]. Available: http://sigport.org/5125. Accessed: Jul. 04, 2020.
