
Spoken Language Processing

Robust Spoken Language Understanding with unsupervised ASR-error adaptation


Robustness to errors produced by automatic speech recognition (ASR) is essential for Spoken Language Understanding (SLU). Traditional robust SLU typically requires ASR hypotheses with semantic annotations for training. However, semantic annotation is very expensive, and the corresponding ASR system may change frequently. Here, we propose a novel unsupervised ASR-error adaptation method that obviates the need for annotated ASR hypotheses.
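The abstract does not spell out the adaptation mechanism, so the sketch below only illustrates one common unsupervised-adaptation pattern, not the authors' method: a BiLSTM slot tagger is trained on annotated manual transcripts while a gradient-reversed domain classifier encourages the encoder to represent unannotated ASR hypotheses the same way as transcripts. All module names, dimensions, and data are assumptions made for the example (PyTorch).

# Illustrative sketch only (NOT the paper's method): domain-adversarial adaptation of a
# BiLSTM slot tagger from annotated transcripts to unannotated ASR hypotheses.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; reverses (and scales) the gradient in the backward pass."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None

class AdaptiveTagger(nn.Module):
    def __init__(self, vocab_size, n_labels, emb=100, hid=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb)
        self.enc = nn.LSTM(emb, hid, batch_first=True, bidirectional=True)
        self.tagger = nn.Linear(2 * hid, n_labels)   # slot labels: supervised on manual transcripts
        self.domain = nn.Linear(2 * hid, 2)          # transcript vs. ASR hypothesis: no semantic labels needed

    def forward(self, tokens, lam=1.0):
        h, _ = self.enc(self.emb(tokens))            # (batch, time, 2*hid)
        tag_logits = self.tagger(h)                  # per-token slot scores
        dom_logits = self.domain(GradReverse.apply(h.mean(dim=1), lam))
        return tag_logits, dom_logits

model = AdaptiveTagger(vocab_size=5000, n_labels=20)
transcripts = torch.randint(0, 5000, (8, 12))        # annotated manual transcripts (token ids)
hypotheses = torch.randint(0, 5000, (8, 12))         # unannotated ASR hypotheses (token ids)
tag_logits, dom_real = model(transcripts)
_, dom_asr = model(hypotheses)
# Training would combine a tagging loss on the transcripts with a domain loss on both sources,
# pushing the encoder toward ASR-robust features without ASR-side semantic annotations.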

Paper Details

Authors:
Su Zhu, Ouyu Lan, Kai Yu
Submitted On:
19 April 2018 - 3:58pm

Document Files

zhu-icassp18-poster.pdf


[1] Su Zhu, Ouyu Lan, Kai Yu, "Robust Spoken Language Understanding with unsupervised ASR-error adaptation", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/3016. Accessed: Jul. 23, 2018.

DEEP MULTIMODAL LEARNING FOR EMOTION RECOGNITION IN SPOKEN LANGUAGE


In this paper, we present a novel deep multimodal framework to predict human emotions based on sentence-level spoken language. Our architecture has two distinctive characteristics. First, it extracts high-level features from both text and audio via a hybrid deep multimodal structure, which considers spatial information from text, temporal information from audio, and high-level associations from low-level handcrafted features.
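As a rough, hedged illustration of the kind of hybrid structure described above (not the authors' exact architecture), the sketch below fuses a 1-D CNN over word embeddings (spatial information from text), an LSTM over frame-level acoustic features (temporal information from audio), and a dense branch over low-level handcrafted features; the dimensions and the concatenation-based fusion are assumptions (PyTorch).

# Minimal multimodal fusion sketch; all sizes and the fusion strategy are assumed.
import torch
import torch.nn as nn

class MultimodalEmotion(nn.Module):
    def __init__(self, emb_dim=100, audio_dim=40, hand_dim=32, n_emotions=4):
        super().__init__()
        self.text_cnn = nn.Sequential(                               # spatial patterns over word embeddings
            nn.Conv1d(emb_dim, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveMaxPool1d(1))
        self.audio_rnn = nn.LSTM(audio_dim, 64, batch_first=True)    # temporal patterns over acoustic frames
        self.hand_fc = nn.Sequential(nn.Linear(hand_dim, 32), nn.ReLU())  # low-level handcrafted features
        self.classifier = nn.Linear(64 + 64 + 32, n_emotions)

    def forward(self, text_emb, audio_frames, handcrafted):
        t = self.text_cnn(text_emb.transpose(1, 2)).squeeze(-1)      # (batch, 64)
        _, (a, _) = self.audio_rnn(audio_frames)                     # last hidden state
        a = a.squeeze(0)                                             # (batch, 64)
        h = self.hand_fc(handcrafted)                                # (batch, 32)
        return self.classifier(torch.cat([t, a, h], dim=-1))         # sentence-level emotion logits

model = MultimodalEmotion()
logits = model(torch.randn(2, 20, 100),   # 20 word embeddings per sentence
               torch.randn(2, 120, 40),   # 120 acoustic frames (e.g. MFCCs)
               torch.randn(2, 32))        # utterance-level handcrafted features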

Paper Details

Authors:
Yue Gu, Shuhong Chen, Ivan Marsic
Submitted On:
13 April 2018 - 3:30pm

Document Files

ICASSP_2018_POSTER.pdf


[1] Yue Gu, Shuhong Chen, Ivan Marsic, "DEEP MULTIMODAL LEARNING FOR EMOTION RECOGNITION IN SPOKEN LANGUAGE", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2752. Accessed: Jul. 23, 2018.

FACTORIZED HIDDEN VARIABILITY LEARNING FOR ADAPTATION OF SHORT DURATION LANGUAGE IDENTIFICATION MODELS


Bidirectional long short-term memory (BLSTM) recurrent neural networks (RNNs) have recently outperformed other state-of-the-art approaches, such as i-vectors and deep neural networks (DNNs), in automatic language identification (LID), particularly when testing with very short utterances (∼3 s). Mismatched conditions between training and test data, e.g. speaker, channel, duration and environmental noise, are a major source of performance degradation for LID.
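For context, a minimal BLSTM language-identification classifier over frame-level acoustic features could look like the sketch below; it is an assumed illustrative baseline only, and the factorized hidden variability learning proposed in the paper is not shown (PyTorch).

# Sketch of a BLSTM LID classifier; dimensions and pooling are assumptions.
import torch
import torch.nn as nn

class BLSTMLid(nn.Module):
    def __init__(self, feat_dim=40, hid=128, n_langs=10):
        super().__init__()
        self.blstm = nn.LSTM(feat_dim, hid, num_layers=2,
                             batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hid, n_langs)

    def forward(self, frames):                   # frames: (batch, time, feat_dim)
        h, _ = self.blstm(frames)                # (batch, time, 2*hid)
        return self.out(h.mean(dim=1))           # average over time -> language logits

model = BLSTMLid()
utterance = torch.randn(4, 300, 40)              # roughly 3 s of frames at a 10 ms hop
print(model(utterance).shape)                    # torch.Size([4, 10])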

Paper Details

Authors:
Sarith Fernando, Vidhyasaharan Sethu, Eliathamby Ambikairajah
Submitted On:
12 April 2018 - 9:48pm

Document Files

POSTER.pdf


[1] Sarith Fernando, Vidhyasaharan Sethu, Eliathamby Ambikairajah, "FACTORIZED HIDDEN VARIABILITY LEARNING FOR ADAPTATION OF SHORT DURATION LANGUAGE IDENTIFICATION MODELS", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2551. Accessed: Jul. 23, 2018.

High Order Recurrent Neural Networks for Acoustic Modelling


Vanishing long-term gradients are a major issue in training standard recurrent neural networks (RNNs), which can be alleviated by long short-term memory (LSTM) models with memory cells. However, the extra parameters associated with the memory cells mean an LSTM layer has four times as many parameters as an RNN with the same hidden vector size. This paper addresses the vanishing gradient problem using a high order RNN (HORNN) which has additional connections from multiple previous time steps.
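To make the high order idea concrete, the toy cell below updates the hidden state from the current input and the two previous hidden states rather than from h_{t-1} alone; the actual connection offsets, projections, and any gating follow the paper, so treat this as an assumed illustration only (PyTorch).

# Toy high order RNN cell: h_t depends on h_{t-1} and h_{t-2} (order = 2).
import torch
import torch.nn as nn

class HORNNCell(nn.Module):
    def __init__(self, in_dim, hid_dim, order=2):
        super().__init__()
        self.W = nn.Linear(in_dim, hid_dim)
        self.U = nn.ModuleList([nn.Linear(hid_dim, hid_dim, bias=False) for _ in range(order)])
        self.order = order

    def forward(self, x_seq):                                        # x_seq: (batch, time, in_dim)
        B, T, _ = x_seq.shape
        hid = self.W.out_features
        hist = [x_seq.new_zeros(B, hid) for _ in range(self.order)]  # h_{t-1}, h_{t-2}, ...
        outputs = []
        for t in range(T):
            h = torch.tanh(self.W(x_seq[:, t]) +
                           sum(U(hp) for U, hp in zip(self.U, hist)))
            hist = [h] + hist[:-1]                                   # shift the history window
            outputs.append(h)
        return torch.stack(outputs, dim=1)                           # (batch, time, hid_dim)

cell = HORNNCell(in_dim=40, hid_dim=64, order=2)
print(cell(torch.randn(3, 50, 40)).shape)                            # torch.Size([3, 50, 64])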

Paper Details

Authors:
Chao Zhang, Phil Woodland
Submitted On:
12 April 2018 - 12:16pm

Document Files

cz277-ICASSP18-Poster-v3.pdf


[1] Chao Zhang, Phil Woodland, "High Order Recurrent Neural Networks for Acoustic Modelling", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2429. Accessed: Jul. 23, 2018.

Mongolian Prosodic Phrase Prediction using Suffix Segmentation


Accurate prosodic phrase prediction can improve the naturalness of speech synthesis. Prosodic phrase prediction can be regarded as a sequence labeling problem, which is typically solved with a Conditional Random Field (CRF). Mongolian is an agglutinative language in which a vast number of words can be formed by concatenating stems and suffixes. This characteristic makes it difficult to build a high-performance CRF-based Mongolian prosodic phrase prediction system. We introduce a new
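Since the abstract frames prosodic phrase prediction as CRF sequence labeling over an agglutinative vocabulary, a minimal sketch of that setup is shown below; the suffix-based feature set, the toy sentence and labels, and the use of the third-party sklearn-crfsuite package are illustrative assumptions, not the paper's system.

# Prosodic-phrase boundary labeling as linear-chain CRF sequence labeling (illustrative only).
import sklearn_crfsuite

def word_features(sent, i):
    w = sent[i]
    return {
        "word": w,
        "suffix2": w[-2:],                           # crude suffix cues standing in for
        "suffix3": w[-3:],                           # proper Mongolian suffix segmentation
        "prev": sent[i - 1] if i > 0 else "<BOS>",
        "next": sent[i + 1] if i < len(sent) - 1 else "<EOS>",
    }

# Toy data: B = prosodic phrase boundary after the word, O = no boundary.
sents = [["ta", "sain", "baina", "uu"]]
labels = [["O", "O", "B", "O"]]

X = [[word_features(s, i) for i in range(len(s))] for s in sents]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, labels)
print(crf.predict(X))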

Paper Details

Authors:
Rui Liu, Feilong Bao, Guanglai Gao, Weihua Wang
Submitted On:
17 November 2016 - 8:27pm

Document Files

222Mongolian Prosodic Phrase Prediction using Suffix Segmentation.pdf


[1] Rui Liu, Feilong Bao, Guanglai Gao, Weihua Wang, "Mongolian Prosodic Phrase Prediction using Suffix Segmentation", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/1269. Accessed: Jul. 23, 2018.

Investigating Gated Recurrent Neural Networks for Acoustic Modeling

Paper Details

Authors:
Jie Li, Shuang Xu, Bo Xu
Submitted On:
15 October 2016 - 12:02pm

Document Files

Investigating Gated Recurrent Neural Networks for Acoustic Modeling_presentation.pdf


[1] Jie Li, Shuang Xu, Bo Xu, "Investigating Gated Recurrent Neural Networks for Acoustic Modeling", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/1249. Accessed: Jul. 23, 2018.

Evaluation of a multimodal 3-d pronunciation tutor for learning Mandarin as a second language: an eye-tracking study

Paper Details

Authors:
Ying Zhou, Fei Chen, Hui Chen, Nan Yan
Submitted On:
16 October 2016 - 1:06am

Document Files

Eyetracking PPT.pdf


[1] Ying Zhou, Fei Chen, Hui Chen, Nan Yan, "Evaluation of a multimodal 3-d pronunciation tutor for learning Mandarin as a second language: an eye-tracking study", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/1248. Accessed: Jul. 23, 2018.

Improving Mandarin Tone Recognition Based on DNN by Combining Acoustic and Articulatory Features

Paper Details

Authors:
Yanlu Xie, Yingming Gao, Jinsong Zhang
Submitted On:
15 October 2016 - 11:24am

Document Files

toneRecognition.pptx


[1] Yanlu Xie, Yingming Gao, Jinsong Zhang, "Improving Mandarin Tone Recognition Based on DNN by Combining Acoustic and Articulatory Features", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/1245. Accessed: Jul. 23, 2018.

Evaluation of a multimodal 3-d pronunciation tutor for learning Mandarin as a second language: an eye-tracking study

Paper Details

Authors:
Fei Chen, Hui Chen, Lan Wang, Nan Yan
Submitted On:
16 October 2016 - 1:06am

Document Files

Eyetracking PPT.ppt


[1] Fei Chen, Hui Chen, Lan Wang, Nan Yan, "Evaluation of a multimodal 3-d pronunciation tutor for learning Mandarin as a second language: an eye-tracking study", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/1244. Accessed: Jul. 23, 2018.

Automatic Mandarin Prosody Boundary Detecting Based on Tone Nucleus Features and DNN Model

Paper Details

Authors:
Yanlu Xie, Wei Zhang, Jinsong Zhang
Submitted On:
15 October 2016 - 11:19am

Document Files

ISCSLP2016_prosodyDetection.pdf


[1] Yanlu Xie, Wei Zhang, Jinsong Zhang, "Automatic Mandarin Prosody Boundary Detecting Based on Tone Nucleus Features and DNN Model", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/1243. Accessed: Jul. 23, 2018.
