Speech Processing

AUTOMATIC SPEECH ASSESSMENT FOR APHASIC PATIENTS BASED ON SYLLABLE-LEVEL EMBEDDING AND SUPRA-SEGMENTAL DURATION FEATURES


Aphasia is a type of acquired language impairment resulting from brain injury. Speech assessment is an important part of the comprehensive assessment process for aphasic patients. It is based on the acoustical and linguistic analysis of patients’ speech elicited through pre-defined story-telling tasks. This type of narrative spontaneous speech embodies multi-fold atypical characteristics related to the underlying language impairment.

Paper Details

Authors:
Tan Lee, Anthony Pak Hin Kong
Submitted On:
12 April 2018 - 11:52pm

Document Files

poster_QinYing_ICASSP2018_final.pdf

(184)

[1] Tan Lee, Anthony Pak Hin Kong, "AUTOMATIC SPEECH ASSESSMENT FOR APHASIC PATIENTS BASED ON SYLLABLE-LEVEL EMBEDDING AND SUPRA-SEGMENTAL DURATION FEATURES", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2571. Accessed: Jul. 17, 2019.

FACTORIZED HIDDEN VARIABILITY LEARNING FOR ADAPTATION OF SHORT DURATION LANGUAGE IDENTIFICATION MODELS


Bidirectional long short-term memory (BLSTM) recurrent neural networks (RNNs) have recently outperformed other state-of-the-art approaches, such as i-vectors and deep neural networks (DNNs), in automatic language identification (LID), particularly when testing with very short utterances (∼3 s). Mismatched conditions between training and test data, e.g. speaker, channel, duration and environmental noise, are a major source of performance degradation for LID.

Paper Details

Authors:
Sarith Fernando, Vidhyasaharan Sethu, Eliathamby Ambikairajah
Submitted On:
12 April 2018 - 9:48pm

Document Files

POSTER.pdf

(1244)

[1] Sarith Fernando, Vidhyasaharan Sethu, Eliathamby Ambikairajah, "FACTORIZED HIDDEN VARIABILITY LEARNING FOR ADAPTATION OF SHORT DURATION LANGUAGE IDENTIFICATION MODELS", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2551. Accessed: Jul. 17, 2019.

Speaker-Invariant Training via Adversarial Learning


We propose a novel adversarial multi-task learning scheme, aiming at actively curtailing the inter-talker feature variability while maximizing its senone discriminability so as to enhance the performance of a deep neural network (DNN) based ASR system. We call the scheme speaker-invariant training (SIT). In SIT, a DNN acoustic model and a speaker classifier network are jointly optimized to minimize the senone (tied triphone state) classification loss, and simultaneously mini-maximize the speaker classification loss.
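As a rough illustration of the two competing objectives described in this abstract, the sketch below computes a senone loss and a speaker loss for a single frame and combines them. The function names, the trade-off weight `lam`, and the use of plain cross-entropy are assumptions made for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
import math


def cross_entropy(posteriors, target_index):
    """Negative log-probability assigned to the correct class."""
    return -math.log(posteriors[target_index])


def sit_objectives(senone_post, senone_target,
                   speaker_post, speaker_target, lam=0.5):
    """Return (feature_extractor_objective, speaker_classifier_objective).

    lam is a hypothetical trade-off weight, not taken from the abstract.
    """
    senone_loss = cross_entropy(senone_post, senone_target)
    speaker_loss = cross_entropy(speaker_post, speaker_target)
    # The shared feature layers try to lower the senone loss while
    # raising the speaker loss, which pushes features toward speaker invariance.
    feature_extractor_objective = senone_loss - lam * speaker_loss
    # The speaker classifier itself simply tries to lower the speaker loss.
    speaker_classifier_objective = speaker_loss
    return feature_extractor_objective, speaker_classifier_objective


if __name__ == "__main__":
    fe_obj, spk_obj = sit_objectives([0.5, 0.5], 0, [0.25, 0.75], 1, lam=1.0)
    print(fe_obj, spk_obj)
```

In a real system both objectives would be minimized by gradient descent over the respective parameter sets; the sketch only shows how the two losses oppose each other.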

Paper Details

Authors:
Zhong Meng, Jinyu Li, Zhuo Chen, Yong Zhao, Vadim Mazalov, Yifan Gong, Biing-Hwang (Fred) Juang
Submitted On:
12 May 2019 - 9:29pm

Document Files

sit_poster.pptx

(145)

[1] Zhong Meng, Jinyu Li, Zhuo Chen, Yong Zhao, Vadim Mazalov, Yifan Gong, Biing-Hwang (Fred) Juang, "Speaker-Invariant Training via Adversarial Learning", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2508. Accessed: Jul. 17, 2019.

Adversarial Teacher-Student Learning for Unsupervised Adaptation


Teacher-student (T/S) learning has been shown to be effective in unsupervised domain adaptation [ts_adapt]. It is a form of transfer learning, not in terms of the transfer of recognition decisions, but of the knowledge of posterior probabilities in the source domain as evaluated by the teacher model. It learns to handle the speaker and environment variability inherent in and restricted to the speech signal in the target domain without proactively addressing the robustness to other likely conditions. Performance degradation may thus ensue.
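The idea of transferring posterior probabilities rather than hard decisions can be sketched with a soft-target loss for one frame. The cross-entropy form used here is a common way to train a student on teacher posteriors and is an assumption; the abstract does not state the exact loss, and `ts_frame_loss` is a hypothetical name.

```python
import math


def ts_frame_loss(teacher_post, student_post):
    """Cross-entropy of student posteriors against teacher soft targets.

    Classes with zero teacher probability contribute nothing to the sum.
    """
    return -sum(p_t * math.log(p_s)
                for p_t, p_s in zip(teacher_post, student_post)
                if p_t > 0.0)


if __name__ == "__main__":
    teacher = [0.7, 0.2, 0.1]        # teacher posteriors on source-domain input
    close_student = [0.6, 0.3, 0.1]  # student posteriors on parallel target input
    far_student = [0.1, 0.2, 0.7]
    # The student that agrees more with the teacher gets the lower loss.
    print(ts_frame_loss(teacher, close_student) < ts_frame_loss(teacher, far_student))
```

No transcription labels appear anywhere in the loss, which is what makes this style of adaptation unsupervised.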

Paper Details

Authors:
Zhong Meng, Jinyu Li, Yifan Gong, Biing-Hwang (Fred) Juang
Submitted On:
12 May 2019 - 9:31pm

Document Files

ats_poster_v2.pptx

(95)

[1] Zhong Meng, Jinyu Li, Yifan Gong, Biing-Hwang (Fred) Juang, "Adversarial Teacher-Student Learning for Unsupervised Adaptation", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2506. Accessed: Jul. 17, 2019.

A PLLR and Multi-stage Staircase Regression Framework for Speech-based Emotion Prediction


Continuous prediction of dimensional emotions (e.g. arousal and valence) has attracted increasing research interest recently. When processing emotional speech signals, phonetic features have been rarely used due to the assumption that phonetic variability is a confounding factor that degrades emotion recognition/prediction performance. In this paper, instead of eliminating phonetic variability, we investigated whether Phone Log-Likelihood Ratio (PLLR) features could be used to index arousal and valence in a pairwise low/high framework.
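For readers unfamiliar with PLLR features, the sketch below shows how they are commonly computed from a frame of phone posteriors: each posterior p is mapped to log(p / (1 - p)). The clipping constant `eps` and the function name are assumptions added for numerical safety and illustration; the abstract itself does not give the formula.

```python
import math


def pllr(phone_posteriors, eps=1e-10):
    """Map each phone posterior p to its log-likelihood ratio log(p / (1 - p)).

    Posteriors are clipped to [eps, 1 - eps] to avoid log(0) and division by zero.
    """
    features = []
    for p in phone_posteriors:
        p = min(max(p, eps), 1.0 - eps)
        features.append(math.log(p / (1.0 - p)))
    return features


if __name__ == "__main__":
    frame_posteriors = [0.5, 0.25, 0.25]  # one frame, three phone classes
    print(pllr(frame_posteriors))
```

A posterior of 0.5 maps to 0, values above 0.5 become positive, and values below 0.5 become negative, so the features are unbounded rather than confined to [0, 1].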

Paper Details

Authors:
Zhaocheng Huang, Julien Epps
Submitted On:
17 March 2017 - 10:17pm

Document Files

DAVID_ICASSP2017_V1.pdf

(283)

[1] Zhaocheng Huang, Julien Epps, "A PLLR and Multi-stage Staircase Regression Framework for Speech-based Emotion Prediction", IEEE SigPort, 2017. [Online]. Available: http://sigport.org/1775. Accessed: Jul. 17, 2019.

DNN APPROACH TO SPEAKER DIARISATION USING SPEAKER CHANNELS

Paper Details

Authors:
Rosanna Milner, Thomas Hain
Submitted On:
11 March 2017 - 8:47pm

Document Files

talk-dia-icassp17-milner.pdf

(266)

[1] Rosanna Milner, Thomas Hain, "DNN APPROACH TO SPEAKER DIARISATION USING SPEAKER CHANNELS", IEEE SigPort, 2017. [Online]. Available: http://sigport.org/1678. Accessed: Jul. 17, 2019.

Automatic Speech Emotion Recognition Using Recurrent Neural Networks with Local Attention


Automatic emotion recognition from speech is a challenging task which relies heavily on the effectiveness of the speech features used for classification. In this work, we study the use of deep learning to automatically discover emotionally relevant features from speech. It is shown that using a deep recurrent neural network, we can learn both the short-time frame-level acoustic features that are emotionally relevant, as well as an appropriate temporal aggregation of those features into a compact utterance-level representation.
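One way to aggregate frame-level features into a single utterance-level vector is attention-weighted pooling, sketched below. The dot-product scoring against a vector `u` (which would be learned in practice) and the function name are assumptions for illustration; the abstract does not describe the exact attention mechanism used.

```python
import math


def attention_pool(frames, u):
    """Pool frame vectors into one vector using softmax attention weights.

    frames: list of equal-length feature vectors, one per time step.
    u: scoring vector (hypothetical stand-in for a learned parameter).
    Returns (pooled_vector, weights).
    """
    scores = [sum(h_i * u_i for h_i, u_i in zip(h, u)) for h in frames]
    max_score = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(frames[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, frames)) for d in range(dim)]
    return pooled, weights


if __name__ == "__main__":
    frames = [[1.0, 0.0], [0.0, 1.0]]
    pooled, weights = attention_pool(frames, [0.0, 0.0])
    print(pooled, weights)
```

With a zero scoring vector every frame gets the same weight, so the result reduces to mean pooling; a non-zero `u` lets some frames dominate the utterance-level representation.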

Paper Details

Authors:
Emad Barsoum, Cha Zhang
Submitted On:
15 March 2017 - 12:33am

Document Files

icassp2017.pptx

(292)

icassp2017.pdf

(543)

[1] Emad Barsoum, Cha Zhang, "Automatic Speech Emotion Recognition Using Recurrent Neural Networks with Local Attention", IEEE SigPort, 2017. [Online]. Available: http://sigport.org/1667. Accessed: Jul. 17, 2019.

ICASSP_FCDNNBSS_poster

Paper Details

Authors:
Submitted On:
2 March 2017 - 2:18pm

Document Files

ICASSP2017_poster.pdf

(288)

[1] , "ICASSP_FCDNNBSS_poster", IEEE SigPort, 2017. [Online]. Available: http://sigport.org/1590. Accessed: Jul. 17, 2019.

An Initial Study of Indonesian Semantic Role Labeling and Its Application on Event Extraction


Semantic role labeling (SRL) is the task of assigning semantic role labels to sentence elements. This paper describes the initial development of an Indonesian semantic role labeling system and its application to extracting event information from Tweets. We compare two feature types when designing the SRL systems: Word-to-Word and Phrase-to-Phrase. Our experiments showed that the Word-to-Word feature approach outperforms the Phrase-to-Phrase approach. Applying the SRL system to an event extraction problem resulted in an overlap-based accuracy of 0.94 for actor identification.

Paper Details

Authors:
Ayu Purwarianti, Lisa Madlberger
Submitted On:
21 November 2016 - 10:37pm

Document Files

presentation_IALP2016_Ade.pdf

(394)

[1] Ayu Purwarianti, Lisa Madlberger, "An Initial Study of Indonesian Semantic Role Labeling and Its Application on Event Extraction", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/1289. Accessed: Jul. 17, 2019.

An Investigation of Adaptation Techniques for Building Acoustic Models for Hearing-impaired Children in a CAPT Application

Paper Details

Authors:
Yingke Zhu, Brian Mak
Submitted On:
20 October 2016 - 12:55am

Document Files

presentation slides

(407)

[1] Yingke Zhu, Brian Mak, "An Investigation of Adaptation Techniques for Building Acoustic Models for Hearing-impaired Children in a CAPT Application", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/1260. Accessed: Jul. 17, 2019.