
Spoken Language Processing

AttS2S-VC: Sequence-to-Sequence Voice Conversion with Attention and Context Preservation Mechanisms


This paper describes a method based on sequence-to-sequence (Seq2Seq) learning with attention and context preservation mechanisms for voice conversion (VC) tasks. Seq2Seq models have achieved outstanding results on numerous sequence-modeling tasks, such as speech synthesis and recognition, machine translation, and image captioning.
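
As a hedged illustration of the attention component such a Seq2Seq model relies on, the sketch below implements additive (Bahdanau-style) attention in PyTorch: the decoder state scores each encoder frame, and a context vector is formed as the attention-weighted sum. The layer sizes, the additive scoring form, and all names here are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    """Additive attention: score each encoder frame against the decoder state."""
    def __init__(self, enc_dim, dec_dim, att_dim):
        super().__init__()
        self.W_enc = nn.Linear(enc_dim, att_dim, bias=False)
        self.W_dec = nn.Linear(dec_dim, att_dim, bias=False)
        self.v = nn.Linear(att_dim, 1, bias=False)

    def forward(self, enc_states, dec_state):
        # enc_states: (batch, src_len, enc_dim); dec_state: (batch, dec_dim)
        scores = self.v(torch.tanh(
            self.W_enc(enc_states) + self.W_dec(dec_state).unsqueeze(1)
        )).squeeze(-1)                                   # (batch, src_len)
        weights = torch.softmax(scores, dim=-1)          # alignment over source
        context = torch.bmm(weights.unsqueeze(1), enc_states).squeeze(1)
        return context, weights                          # context: (batch, enc_dim)
```

In a full VC model this step would run once per output frame, with the context preservation mechanism constraining the learned alignment; that part is not sketched here.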

Paper Details

Submitted On: 15 May 2019 - 7:03am
Document File: 2019_05_ICASSP_KouTanaka.pdf

[1] , "AttS2S-VC: Sequence-to-Sequence Voice Conversion with Attention and Context Preservation Mechanisms", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4522. Accessed: Sep. 20, 2019.

Towards Better Confidence Estimation for Neural Models

Paper Details

Authors: Vishal Thanvantri Vasudevan, Abhinav Sethy, Alireza Roshan Ghias
Submitted On: 9 May 2019 - 11:01pm
Document File: conference_poster_5.pdf

Cite: Vishal Thanvantri Vasudevan, Abhinav Sethy, Alireza Roshan Ghias, "Towards Better Confidence Estimation for Neural Models", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4258. Accessed: Sep. 20, 2019.

End-to-End Anchored Speech Recognition


Voice-controlled household devices, like Amazon Echo or Google Home, face the problem of performing speech recognition of device-directed speech in the presence of interfering background speech, i.e., background noise and interfering speech from another person or a media device in proximity need to be ignored. We propose two end-to-end models to tackle this problem with information extracted from the “anchored segment”.
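
This excerpt does not specify how the anchor information enters the two models, so the following is only an assumption-level sketch of one generic recipe: embed the anchored segment (e.g., the wake word), then gate each input frame by its learned similarity to that anchor so that frames unlike the anchor speaker are down-weighted. All module names and dimensions are hypothetical.

```python
import torch
import torch.nn as nn

class AnchorGate(nn.Module):
    """Gate input frames by similarity to a mean-pooled anchor embedding."""
    def __init__(self, feat_dim, emb_dim):
        super().__init__()
        self.embed = nn.Linear(feat_dim, emb_dim)
        self.score = nn.Linear(2 * emb_dim, 1)

    def forward(self, frames, anchor_frames):
        # frames: (batch, T, feat_dim); anchor_frames: (batch, Ta, feat_dim)
        anchor = self.embed(anchor_frames).mean(dim=1)       # (batch, emb_dim)
        h = self.embed(frames)                               # (batch, T, emb_dim)
        a = anchor.unsqueeze(1).expand_as(h)                 # broadcast anchor
        gate = torch.sigmoid(self.score(torch.cat([h, a], dim=-1)))
        return frames * gate   # suppress frames that do not match the anchor
```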

Paper Details

Authors: Yiming Wang, Xing Fan, I-Fan Chen, Yuzong Liu, Tongfei Chen, Björn Hoffmeister
Submitted On: 7 May 2019 - 2:33pm
Document File: ICASSP19_Poster_AnchoredSpeechRecogWithAttention.pdf

Cite: Yiming Wang, Xing Fan, I-Fan Chen, Yuzong Liu, Tongfei Chen, Björn Hoffmeister, "End-to-End Anchored Speech Recognition", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/3943. Accessed: Sep. 20, 2019.

Robust Spoken Language Understanding with unsupervised ASR-error adaptation


Robustness to errors produced by automatic speech recognition (ASR) is essential for Spoken Language Understanding (SLU). Traditional robust SLU typically needs ASR hypotheses with semantic annotations for training. However, semantic annotation is very expensive, and the corresponding ASR system may change frequently. Here, we propose a novel unsupervised ASR-error adaptation method, obviating the need for annotated ASR hypotheses.
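
The adaptation objective itself is not spelled out in this excerpt, so as a hedged stand-in the sketch below shows one common unsupervised adaptation ingredient: a gradient-reversal layer, which trains an encoder so that a domain classifier cannot tell features of clean manual transcripts from features of unannotated ASR hypotheses. This is a generic technique borrowed from domain-adversarial training, not necessarily the authors' method.

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; negates (and scales) gradients on backward."""
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reversed gradient trains the encoder to fool the domain classifier.
        return -ctx.lamb * grad_output, None

def grad_reverse(x, lamb=1.0):
    return GradReverse.apply(x, lamb)

# Usage (hypothetical names): feed encoder features through the reversal
# before a transcript-vs-ASR-hypothesis domain classifier.
# domain_logits = domain_classifier(grad_reverse(encoder_features))
```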

Paper Details

Authors: Su Zhu, Ouyu Lan, Kai Yu
Submitted On: 19 April 2018 - 3:58pm
Document File: zhu-icassp18-poster.pdf

Cite: Su Zhu, Ouyu Lan, Kai Yu, "Robust Spoken Language Understanding with unsupervised ASR-error adaptation", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/3016. Accessed: Sep. 20, 2019.

Deep Multimodal Learning for Emotion Recognition in Spoken Language


In this paper, we present a novel deep multimodal framework to predict human emotions based on sentence-level spoken language. Our architecture has two distinctive characteristics. First, it extracts the high-level features from both text and audio via a hybrid deep multimodal structure, which considers the spatial information from text, temporal information from audio, and high-level associations from low-level handcrafted features.
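
Below is a minimal sketch of the general pattern this describes, assuming a convolutional text branch for the "spatial" n-gram structure, a recurrent audio branch for the temporal structure, and concatenation fusion before classification. Layer sizes, the fusion choice, and the class count are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class MultimodalEmotion(nn.Module):
    """Text CNN branch + audio BLSTM branch, fused by concatenation."""
    def __init__(self, vocab, emb=128, audio_dim=40, hid=64, n_classes=4):
        super().__init__()
        self.text_emb = nn.Embedding(vocab, emb)
        self.text_conv = nn.Conv1d(emb, hid, kernel_size=3, padding=1)
        self.audio_rnn = nn.LSTM(audio_dim, hid, batch_first=True,
                                 bidirectional=True)
        self.out = nn.Linear(hid + 2 * hid, n_classes)

    def forward(self, tokens, audio):
        # tokens: (B, Lt) word ids; audio: (B, La, audio_dim) frame features
        t = self.text_conv(self.text_emb(tokens).transpose(1, 2))
        t = torch.relu(t).max(dim=-1).values           # max-pool over words
        _, (h, _) = self.audio_rnn(audio)
        a = torch.cat([h[0], h[1]], dim=-1)            # fwd + bwd final states
        return self.out(torch.cat([t, a], dim=-1))     # emotion logits
```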

Paper Details

Authors: Yue Gu, Shuhong Chen, Ivan Marsic
Submitted On: 13 April 2018 - 3:30pm
Document File: ICASSP_2018_POSTER.pdf

Cite: Yue Gu, Shuhong Chen, Ivan Marsic, "Deep Multimodal Learning for Emotion Recognition in Spoken Language", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2752. Accessed: Sep. 20, 2019.

Factorized Hidden Variability Learning for Adaptation of Short Duration Language Identification Models


Bidirectional long short-term memory (BLSTM) recurrent neural networks (RNNs) have recently outperformed other state-of-the-art approaches, such as i-vectors and deep neural networks (DNNs), in automatic language identification (LID), particularly when testing with very short utterances (∼3 s). Mismatched conditions between training and test data, e.g. speaker, channel, duration and environmental noise, are a major source of performance degradation for LID.
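
For concreteness, here is a minimal PyTorch sketch of the BLSTM LID baseline this abstract refers to: a bidirectional LSTM over acoustic frames, mean-pooled into an utterance embedding and classified into languages. Feature and hidden dimensions are illustrative, and the paper's factorized hidden variability adaptation is not sketched here.

```python
import torch
import torch.nn as nn

class BLSTMLanguageID(nn.Module):
    """Utterance-level language classifier over a BLSTM frame encoder."""
    def __init__(self, feat_dim=40, hid=128, n_langs=10):
        super().__init__()
        self.blstm = nn.LSTM(feat_dim, hid, batch_first=True,
                             bidirectional=True)
        self.classifier = nn.Linear(2 * hid, n_langs)

    def forward(self, frames):
        # frames: (batch, T, feat_dim); T is a few hundred frames for ~3 s
        h, _ = self.blstm(frames)          # (batch, T, 2*hid)
        utt = h.mean(dim=1)                # average pool over time
        return self.classifier(utt)        # language logits
```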

Paper Details

Authors: Sarith Fernando, Vidhyasaharan Sethu, Eliathamby Ambikairajah
Submitted On: 12 April 2018 - 9:48pm
Document File: POSTER.pdf

Cite: Sarith Fernando, Vidhyasaharan Sethu, Eliathamby Ambikairajah, "Factorized Hidden Variability Learning for Adaptation of Short Duration Language Identification Models", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2551. Accessed: Sep. 20, 2019.

High Order Recurrent Neural Networks for Acoustic Modelling


Vanishing long-term gradients are a major issue in training standard recurrent neural networks (RNNs), which can be alleviated by long short-term memory (LSTM) models with memory cells. However, the extra parameters associated with the memory cells mean an LSTM layer has four times as many parameters as an RNN with the same hidden vector size. This paper addresses the vanishing gradient problem using a high order RNN (HORNN) which has additional connections from multiple previous time steps.
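
A minimal sketch of the high order recurrence idea as described here: the hidden state receives connections from several previous time steps (t-1 and t-2 below), shortening gradient paths with far fewer parameters than an LSTM. The exact connection pattern, nonlinearity, and any weight sharing used in the paper may differ.

```python
import torch
import torch.nn as nn

class HORNNCell(nn.Module):
    """High order RNN: recurrent connections from the last `order` steps."""
    def __init__(self, in_dim, hid_dim, order=2):
        super().__init__()
        self.W_in = nn.Linear(in_dim, hid_dim)
        self.W_rec = nn.ModuleList(
            nn.Linear(hid_dim, hid_dim, bias=False) for _ in range(order))
        self.order = order

    def forward(self, x_seq):
        # x_seq: (batch, T, in_dim)
        B, T, _ = x_seq.shape
        hid = self.W_in.out_features
        hist = [x_seq.new_zeros(B, hid) for _ in range(self.order)]
        outs = []
        for t in range(T):
            h = self.W_in(x_seq[:, t])
            for k in range(self.order):
                h = h + self.W_rec[k](hist[k])   # connections to t-1, t-2, ...
            h = torch.relu(h)
            hist = [h] + hist[:-1]               # shift the history window
            outs.append(h)
        return torch.stack(outs, dim=1)          # (batch, T, hid_dim)
```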

Paper Details

Authors: Chao Zhang, Phil Woodland
Submitted On: 12 April 2018 - 12:16pm
Document File: cz277-ICASSP18-Poster-v3.pdf

Cite: Chao Zhang, Phil Woodland, "High Order Recurrent Neural Networks for Acoustic Modelling", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2429. Accessed: Sep. 20, 2019.

Mongolian Prosodic Phrase Prediction using Suffix Segmentation


Accurate prosodic phrase prediction can improve the naturalness of speech synthesis. Prosodic phrase prediction can be regarded as a sequence labeling problem, and the Conditional Random Field (CRF) is typically used to solve it. Mongolian is an agglutinative language in which a massive number of words can be formed by concatenating stems and suffixes. This characteristic makes it difficult to build a high-performance CRF-based Mongolian prosodic phrase prediction system. We introduce a new suffix segmentation method to address this problem.
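
To make the sequence-labeling formulation concrete, here is a toy CRF setup using the sklearn-crfsuite library (an assumption; the authors' toolkit is not named in this excerpt): each word gets a feature dictionary, including a crude fixed-length suffix feature standing in for proper suffix segmentation, and is tagged with whether a prosodic phrase boundary follows it. The B/NB tag set and the example sentence are hypothetical.

```python
import sklearn_crfsuite

def word_features(sent, i):
    word = sent[i]
    return {
        "word": word,
        "suffix3": word[-3:],          # crude stand-in for suffix segmentation
        "prev": sent[i - 1] if i > 0 else "<BOS>",
        "next": sent[i + 1] if i < len(sent) - 1 else "<EOS>",
    }

# One toy sentence: "B" = prosodic boundary after the word, "NB" = none.
X_train = [[word_features(s, i) for i in range(len(s))]
           for s in [["ta", "sain", "baina", "uu"]]]
y_train = [["NB", "B", "NB", "B"]]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X_train, y_train)
print(crf.predict(X_train))            # predicted boundary tags per word
```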

Paper Details

Authors: Rui Liu, Feilong Bao, Guanglai Gao, Weihua Wang
Submitted On: 17 November 2016 - 8:27pm
Document File: 222Mongolian Prosodic Phrase Prediction using Suffix Segmentation.pdf

Cite: Rui Liu, Feilong Bao, Guanglai Gao, Weihua Wang, "Mongolian Prosodic Phrase Prediction using Suffix Segmentation", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/1269. Accessed: Sep. 20, 2019.

Investigating Gated Recurrent Neural Networks for Acoustic Modeling

Paper Details

Authors: Jie Li, Shuang Xu, Bo Xu
Submitted On: 15 October 2016 - 12:02pm
Document File: Investigating Gated Recurrent Neural Networks for Acoustic Modeling_presentation.pdf

Cite: Jie Li, Shuang Xu, Bo Xu, "Investigating Gated Recurrent Neural Networks for Acoustic Modeling", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/1249. Accessed: Sep. 20, 2019.

Evaluation of a multimodal 3-d pronunciation tutor for learning Mandarin as a second language: an eye-tracking study

Paper Details

Authors: Ying Zhou, Fei Chen, Hui Chen, Nan Yan
Submitted On: 16 October 2016 - 1:06am
Document File: Eyetracking PPT.pdf

Cite: Ying Zhou, Fei Chen, Hui Chen, Nan Yan, "Evaluation of a multimodal 3-d pronunciation tutor for learning Mandarin as a second language: an eye-tracking study", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/1248. Accessed: Sep. 20, 2019.
