
General Topics in Speech Recognition (SPE-GASR)

Sequence-to-Sequence ASR Optimization via Reinforcement Learning


Despite the success of sequence-to-sequence approaches in automatic speech recognition (ASR), these models still suffer from several problems, mainly due to the mismatch between training and inference conditions. In the sequence-to-sequence architecture, the model is trained to predict the grapheme at the current time-step given the speech input and the ground-truth grapheme history of the previous time-steps. At inference time, however, no ground truth is available, so the model must condition on its own, possibly erroneous, earlier predictions; this train/test discrepancy is commonly known as exposure bias.
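The gap between the two decoding regimes can be illustrated with a toy decoder. Everything below (the tiny RNN, its random weights, the grapheme indices) is an illustrative assumption, not the paper's model:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, hidden = 30, 16

emb = rng.normal(size=(vocab, hidden))        # grapheme embeddings
U = rng.normal(size=(hidden, hidden)) * 0.1   # recurrent weights
V = rng.normal(size=(vocab, hidden)) * 0.1    # output projection

def decoder_step(prev_token, state):
    # One toy decoder step: new state and logits over graphemes.
    state = np.tanh(emb[prev_token] + U @ state)
    return state, V @ state

ground_truth = [3, 7, 1, 9]

# Training (teacher forcing): condition each step on the ground-truth history.
state = np.zeros(hidden)
for y_prev in [0] + ground_truth[:-1]:
    state, logits = decoder_step(y_prev, state)

# Inference (free running): condition on the model's own predictions,
# so an early mistake corrupts every later step; this is the mismatch
# that reinforcement-learning-based training aims to close.
state, prev, hyp = np.zeros(hidden), 0, []
for _ in range(len(ground_truth)):
    state, logits = decoder_step(prev, state)
    prev = int(np.argmax(logits))
    hyp.append(prev)
```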

Paper Details

Authors:
Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
Submitted On:
14 April 2018 - 10:37am

Document Files

Poster in PDF format


[1] Andros Tjandra, Sakriani Sakti, Satoshi Nakamura, "Sequence-to-Sequence ASR Optimization via Reinforcement Learning", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2834. Accessed: Sep. 24, 2018.

End-to-End Multimodal Speech Recognition


Transcription or sub-titling of open-domain videos is still a challenging domain for Automatic Speech Recognition (ASR) due to the data’s challenging acoustics, variable signal processing and the essentially unrestricted domain of the data. In previous work, we have shown that the visual channel – specifically object and scene features – can help to adapt the acoustic model (AM) and language model (LM) of a recognizer, and we are now expanding this work to end-to-end approaches.
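One simple way to feed such visual features into an end-to-end recognizer is to tile a per-clip embedding across the acoustic frames. The dimensions and the concatenation scheme below are illustrative assumptions, not the authors' exact setup:

```python
import numpy as np

def fuse_features(acoustic, visual_embedding):
    """Append one fixed per-video visual embedding (e.g. object/scene
    features) to every acoustic frame."""
    tiled = np.tile(visual_embedding, (acoustic.shape[0], 1))
    return np.concatenate([acoustic, tiled], axis=1)

frames = np.random.randn(200, 40)       # 200 frames of 40-dim filterbank features
clip_embedding = np.random.randn(128)   # one visual feature vector per video clip
fused = fuse_features(frames, clip_embedding)   # shape (200, 168)
```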

Paper Details

Authors:
Ramon Sanabria, Florian Metze
Submitted On:
12 April 2018 - 8:02pm

Document Files

icassp-poster-end.pdf


[1] Ramon Sanabria, Florian Metze, "End-to-End Multimodal Speech Recognition", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2524. Accessed: Sep. 24, 2018.

AN INVESTIGATION INTO INSTANTANEOUS FREQUENCY ESTIMATION METHODS FOR IMPROVED SPEECH RECOGNITION FEATURES


There have been several studies in the recent past pointing to the importance of the analytic phase of the speech signal in human perception, especially in noisy conditions. However, phase information is still not used in state-of-the-art speech recognition systems. In this paper, we illustrate the importance of the analytic phase of the speech signal for automatic speech recognition. As the computation of the analytic phase suffers from an inevitable phase wrapping problem, we extract features from its time derivative, referred to as the instantaneous frequency.
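Instantaneous frequency is the time derivative of the unwrapped analytic phase, which can be sketched with a Hilbert transform. This is a generic illustration on a pure tone, not the authors' feature pipeline:

```python
import numpy as np
from scipy.signal import hilbert

fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t)          # 440 Hz tone

analytic = hilbert(x)                    # analytic signal x + j*H{x}
phase = np.unwrap(np.angle(analytic))    # undo the phase wrapping
inst_freq = np.diff(phase) * fs / (2 * np.pi)   # instantaneous frequency in Hz
```

For a clean tone the estimate sits at 440 Hz away from the signal edges; real speech requires windowing and smoothing on top of this.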

Paper Details

Authors:
Saurabhchand Bhati
Submitted On:
11 November 2017 - 8:10am

Document Files

Paper #1418


[1] Saurabhchand Bhati, "AN INVESTIGATION INTO INSTANTANEOUS FREQUENCY ESTIMATION METHODS FOR IMPROVED SPEECH RECOGNITION FEATURES", IEEE SigPort, 2017. [Online]. Available: http://sigport.org/2305. Accessed: Sep. 24, 2018.

Comparison of DCT and Autoencoder-based Features for DNN-HMM Multimodal Silent Speech Recognition

Paper Details

Authors:
Yan Ji, Hongcui Wang, Bruce Denby
Submitted On:
15 October 2016 - 8:47am

Document Files

poster-llc.pdf


[1] Yan Ji, Hongcui Wang, Bruce Denby, "Comparison of DCT and Autoencoder-based Features for DNN-HMM Multimodal Silent Speech Recognition", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/1235. Accessed: Sep. 24, 2018.


FILTERBANK LEARNING USING CONVOLUTIONAL RESTRICTED BOLTZMANN MACHINE FOR SPEECH RECOGNITION


Figure: Examples of subband filters learned using ConvRBM: (a) filters in the time domain (impulse responses), (b) filters in the frequency domain (frequency responses).

A Convolutional Restricted Boltzmann Machine (ConvRBM) is presented in this paper as a model for the speech signal. We have developed ConvRBM with sampling from noisy rectified linear units (NReLUs). ConvRBM is trained in an unsupervised way to model speech signals of arbitrary length, and the weights of the model can represent an auditory-like filterbank.
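The NReLU sampling step is commonly taken (following Nair and Hinton's noisy rectified linear units) to rectify after adding Gaussian noise whose variance is the logistic sigmoid of the pre-activation; this sketch assumes that form rather than reproducing the paper's exact sampler:

```python
import numpy as np

def nrelu_sample(pre_activation, rng):
    # Noisy ReLU: add zero-mean Gaussian noise with variance
    # sigmoid(pre_activation), then rectify (Nair & Hinton, 2010).
    var = 1.0 / (1.0 + np.exp(-pre_activation))
    noise = rng.normal(0.0, np.sqrt(var))
    return np.maximum(0.0, pre_activation + noise)

rng = np.random.default_rng(0)
hidden = nrelu_sample(np.array([-5.0, 0.0, 5.0]), rng)  # all samples >= 0
```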


Paper Details

Authors:
Hardik B. Sailor, Hemant A. Patil
Submitted On:
31 March 2016 - 4:04am

Document Files

poster.pdf


[1] Hardik B. Sailor, Hemant A. Patil, "FILTERBANK LEARNING USING CONVOLUTIONAL RESTRICTED BOLTZMANN MACHINE FOR SPEECH RECOGNITION", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/1075. Accessed: Sep. 24, 2018.


Selection and Combination of Hypotheses for Dialectal Speech Recognition

Paper Details

Authors:
Victor Soto, Olivier Siohan, Mohamed Elfeky, Pedro Moreno
Submitted On:
21 March 2016 - 9:17pm

Document Files

poster_icassp16.pdf


[1] Victor Soto, Olivier Siohan, Mohamed Elfeky, Pedro Moreno, "Selection and Combination of Hypotheses for Dialectal Speech Recognition", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/945. Accessed: Sep. 24, 2018.

Divergence estimation based on deep neural networks and its use for language identification


In this paper, we propose a method to estimate the statistical divergence between probability distributions with a DNN-based discriminative approach, and apply it to language identification tasks. Since statistical divergence is generally defined as a functional of two probability density functions, these density functions are usually represented in a parametric form. If a mismatch exists between the assumed distribution and the true one, the obtained divergence becomes erroneous.
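The idea can be sketched with the standard density-ratio trick: a discriminator trained to separate samples of p from samples of q has a logit that approximates log p(x)/q(x), and the mean of that logit under p estimates KL(p||q) without assuming a parametric density. The toy logistic regression below is an illustrative sketch, not the paper's DNN:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
xp = rng.normal(0.0, 1.0, n)   # samples from p = N(0, 1)
xq = rng.normal(1.0, 1.0, n)   # samples from q = N(1, 1)

# Logistic discriminator with features [x, 1]; label 1 for p, 0 for q.
X = np.column_stack([np.concatenate([xp, xq]), np.ones(2 * n)])
y = np.concatenate([np.ones(n), np.zeros(n)])

w = np.zeros(2)
for _ in range(2000):                       # plain gradient descent
    p_hat = 1.0 / (1.0 + np.exp(-(X @ w)))
    w -= 0.5 * X.T @ (p_hat - y) / len(y)

# The classifier logit approximates log p(x)/q(x), so its mean over
# samples from p estimates KL(p || q); the true value here is 0.5.
kl_est = float(np.mean(np.column_stack([xp, np.ones(n)]) @ w))
```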

Paper Details

Authors:
Yosuke Kashiwagi, Congying Zhang, Daisuke Saito, Nobuaki Minematsu
Submitted On:
21 March 2016 - 8:31pm

Document Files

ICASSP_2016.pdf


[1] Yosuke Kashiwagi, Congying Zhang, Daisuke Saito, Nobuaki Minematsu, "Divergence estimation based on deep neural networks and its use for language identification", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/942. Accessed: Sep. 24, 2018.

ACCELERATING MULTI-USER LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION ON HETEROGENEOUS CPU-GPU PLATFORMS


In our previous work, we developed a GPU-accelerated speech recognition engine optimized for faster-than-real-time speech recognition on a heterogeneous CPU-GPU architecture. In this work, we focus on developing a scalable server-client architecture specifically optimized to decode multiple users simultaneously in real time.
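A server that decodes many users at once typically batches pending audio chunks from several client streams into a single GPU call. This toy scheduler (the queue layout and batch size are illustrative assumptions, not the paper's architecture) sketches the idea:

```python
import queue

def drain_batch(pending, max_batch=4):
    """Take up to max_batch client chunks so that one GPU kernel
    launch can score several users' audio together."""
    batch = []
    while len(batch) < max_batch:
        try:
            batch.append(pending.get_nowait())
        except queue.Empty:
            break
    return batch

pending = queue.Queue()
for client_id in range(6):
    pending.put((client_id, b"audio-chunk"))

first = drain_batch(pending)    # clients 0..3 batched together
second = drain_batch(pending)   # clients 4 and 5
```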

Paper Details

Authors:
Ian Lane
Submitted On:
20 March 2016 - 6:56pm

Document Files

2016_Kim_ICASSP-poster.pdf


[1] Ian Lane, "ACCELERATING MULTI-USER LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION ON HETEROGENEOUS CPU-GPU PLATFORMS", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/889. Accessed: Sep. 24, 2018.
