Large Vocabulary Continuous Recognition/Search (SPE-LVCR)

Applying Connectionist Temporal Classification Objective Function to Chinese Mandarin Speech Recognition


This paper establishes CTC-based systems for the Chinese Mandarin ASR task and explores three levels of output units: characters, context-independent phonemes, and context-dependent phonemes. To make training stable, we propose the Newbob-Trn strategy; in addition, a blank label prior cost is proposed to improve performance. We also establish the CTC-trained UniLSTM-RC model, which meets the real-time requirement of an online system while also bringing a performance gain on the Chinese Mandarin ASR task.
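As a rough illustration of how a CTC system turns frame-level outputs into a label sequence, the sketch below implements best-path (greedy) CTC decoding: take the highest-scoring label per frame, merge repeated labels, then drop blanks. The `blank_cost` parameter is a hypothetical stand-in for a penalty on the blank label; the abstract does not define the blank label prior cost, so this is only an assumption about the general idea, not the authors' formulation. The label indices and the convention that blank is index 0 are likewise assumptions.

```python
BLANK = 0  # index of the CTC blank label (assumed convention)


def ctc_greedy_decode(log_probs, blank_cost=0.0):
    """Best-path CTC decoding.

    log_probs:  list of frames, each a list of per-label log-probabilities.
    blank_cost: hypothetical penalty subtracted from the blank score in
                every frame before the arg-max (0.0 gives plain decoding).
    """
    path = []
    for frame in log_probs:
        scores = list(frame)
        scores[BLANK] -= blank_cost  # make blank less attractive
        path.append(max(range(len(scores)), key=scores.__getitem__))

    # CTC collapse rule: merge consecutive repeats, then remove blanks.
    decoded = []
    prev = None
    for label in path:
        if label != prev and label != BLANK:
            decoded.append(label)
        prev = label
    return decoded
```

With a positive `blank_cost`, frames where the blank only narrowly wins are reassigned to a non-blank label, which can change the decoded sequence.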

Paper Details

Authors:
Pengrui Wang, Jie Li, Bo Xu
Submitted On:
17 October 2016 - 11:07am

Document Files

Applying Connectionist Temporal Classification Objective Function to Chinese Mandarin Speech Recognition.pptx

(112 downloads)

[1] Pengrui Wang, Jie Li, Bo Xu, "Applying Connectionist Temporal Classification Objective Function to Chinese Mandarin Speech Recognition", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/1231. Accessed: Oct. 23, 2017.

End-to-end Keywords Spotting Based on Connectionist Temporal Classification for Mandarin


Traditional hybrid DNN-HMM based ASR systems for keyword spotting, which model HMM states, are not flexible to optimize for a specific language. In this paper, we construct an ASR system for keyword spotting in Mandarin based on an end-to-end acoustic model. The model is an LSTM-RNN trained with the connectionist temporal classification (CTC) objective. The input of the network is a sequence of features, and the output is the probabilities of the initials and finals of Mandarin syllables.
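To make the CTC objective concrete, the following minimal sketch computes the log-likelihood of a label sequence given per-frame output log-probabilities, using the standard CTC forward recursion. In a keyword-spotting setting the labels could be indices of initials and finals, but the label inventory, the blank index of 0, and the function names here are assumptions for illustration; the abstract does not describe the authors' implementation.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of log-values."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def ctc_log_likelihood(log_probs, labels, blank=0):
    """log P(labels | input) under CTC via the forward algorithm.

    log_probs: list of frames, each a list of per-label log-probabilities.
    labels:    target label indices, without blanks.
    """
    # Extended sequence with blanks interleaved: b l1 b l2 ... lN b
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    size = len(ext)

    # Initialisation: a path may start in the first blank or the first label.
    alpha = [-math.inf] * size
    alpha[0] = log_probs[0][ext[0]]
    if size > 1:
        alpha[1] = log_probs[0][ext[1]]

    for frame in log_probs[1:]:
        new = [-math.inf] * size
        for s in range(size):
            terms = [alpha[s]]                    # stay in the same position
            if s >= 1:
                terms.append(alpha[s - 1])        # advance by one position
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                terms.append(alpha[s - 2])        # skip the blank between two different labels
            new[s] = logsumexp(terms) + frame[ext[s]]
        alpha = new

    # A valid path ends in the last label or the trailing blank.
    tails = [alpha[size - 1]]
    if size > 1:
        tails.append(alpha[size - 2])
    return logsumexp(tails)
```

Training minimises the negative of this quantity; for spotting, the same score could be used to rank how well a keyword's label sequence matches a stretch of audio.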

Paper Details

Authors:
Ye Bai, Jiangyan Yi, Hao Ni, Zhengqi Wen, Bin Liu, Ya Li, Jianhua Tao
Submitted On:
14 October 2016 - 4:44am

Document Files

Ye Bai Poster.pdf

(155 downloads)

[1] Ye Bai, Jiangyan Yi, Hao Ni, Zhengqi Wen, Bin Liu, Ya Li, Jianhua Tao, "End-to-end Keywords Spotting Based on Connectionist Temporal Classification for Mandarin", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/1184. Accessed: Oct. 23, 2017.

Parallelizing WFST Speech Decoders (Poster)

Paper Details

Authors:
Charith Mendis, Jasha Droppo, Saeed Maleki, Madanlal Musuvathi, Todd Mytkowicz, Geoffrey Zweig
Submitted On:
6 April 2016 - 5:22pm

Document Files

icassp_poster.pptx

(242 downloads)

[1] Charith Mendis, Jasha Droppo, Saeed Maleki, Madanlal Musuvathi, Todd Mytkowicz, Geoffrey Zweig, "Parallelizing WFST Speech Decoders (Poster)", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/1086. Accessed: Oct. 23, 2017.

System Combination with Log-linear Models


Improved speech recognition performance can often be obtained by combining multiple systems
together. Joint decoding, where scores from multiple systems are combined during decoding rather
than combining hypotheses, is one efficient approach for system combination. In standard joint
decoding the frame log-likelihoods from each system are used as the scores. These scores are then
weighted and summed to yield the final score for a frame. The system combination weights for this
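The weighted sum of frame log-likelihoods used in standard joint decoding can be sketched as follows. This is only an illustration of the combination step described in the abstract, under the assumption that every system scores the same set of states for the same frames; the function name, data layout, and example weights are hypothetical, and the sketch does not cover how the weights are estimated.

```python
def combine_frame_scores(system_scores, weights):
    """Weighted sum of per-frame log-likelihoods from several systems.

    system_scores[k][t][s]: log-likelihood of state s at frame t from system k.
    weights[k]:             combination weight of system k.
    Returns combined[t][s], the joint score used during decoding.
    """
    num_frames = len(system_scores[0])
    num_states = len(system_scores[0][0])
    combined = []
    for t in range(num_frames):
        combined.append([
            sum(w * scores[t][s] for w, scores in zip(weights, system_scores))
            for s in range(num_states)
        ])
    return combined
```

For two systems with equal weights of 0.5, each combined score is simply the average of the two systems' log-likelihoods for that frame and state.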

Paper Details

Authors:
Jingzhou Yang, Chao Zhang, Anton Ragni, Mark Gales and Phil Woodland
Submitted On:
24 March 2016 - 8:04am

Document Files

poster.pdf

(178 downloads)

[1] Jingzhou Yang, Chao Zhang, Anton Ragni, Mark Gales and Phil Woodland, "System Combination with Log-linear Models", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/1022. Accessed: Oct. 23, 2017.

Investigating techniques for low resource conversational speech recognition

Paper Details

Authors:
Antoine Laurent, Thiago Fraga-Silva, Lori Lamel, Jean-Luc Gauvain
Submitted On:
22 March 2016 - 12:03pm

Document Files

swahili2.pdf

(158 downloads)

[1] Antoine Laurent, Thiago Fraga-Silva, Lori Lamel, Jean-Luc Gauvain, "Investigating techniques for low resource conversational speech recognition", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/968. Accessed: Oct. 23, 2017.

Xerox Conversational AI Agent (XCAI) for Enterprise Knowledgebase Q&A


In the past 5 years, significant advances in Large Vocabulary Speech Recognition (LVSR), Deep Learning (DL) and Spoken Language Understanding (SLU), along with the explosive growth of wireless network bandwidth, have given rise to three compelling Conversational AI agents that are available on Android, iOS and Microsoft smartphones. Conversational AI agents such as Google Now, Apple Siri and Microsoft Cortana are now the most preferred way to perform mobile web search and command and control of various smartphone apps.

Paper Details

Authors:
Vivek Tyagi, Arunasish Sen, Sriranjani R, Pragathi Praveena
Submitted On:
18 March 2016 - 1:50am

Document Files

Xerox XCAI ICASSP 2016.pdf

(181 downloads)

[1] Vivek Tyagi, Arunasish Sen, Sriranjani R, Pragathi Praveena, "Xerox Conversational AI Agent (XCAI) for Enterprise Knowledgebase Q&A", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/739. Accessed: Oct. 23, 2017.