
Speech Processing

Speech Landmark Bigrams for Depression Detection from Naturalistic Smartphone Speech


Detection of depression from speech has attracted significant research attention in recent years but remains a challenge, particularly for speech from diverse smartphones in natural environments. This paper proposes two sets of novel features based on speech landmark bigrams associated with abrupt speech articulatory events for depression detection from smartphone audio recordings. Combined with techniques adapted from natural language text processing, the proposed features further exploit landmark bigrams by discovering latent articulatory events.
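As a rough illustration (not the paper's implementation), consecutive landmark labels can be counted as bigrams and normalized into a frequency feature vector; the landmark symbols below are hypothetical stand-ins for a real landmark detector's output:

```python
from collections import Counter

def landmark_bigram_features(landmarks):
    """Count consecutive landmark pairs and normalize to frequencies."""
    bigrams = Counter(zip(landmarks, landmarks[1:]))
    total = sum(bigrams.values())
    return {pair: n / total for pair, n in bigrams.items()}

# Hypothetical landmark sequence ('g+'/'g-' = glottal onset/offset,
# 'b+' = burst onset, 's' = syllabic peak; labels assumed, not the paper's set)
seq = ["g+", "b+", "s", "g-", "g+", "b+", "s", "g-"]
feats = landmark_bigram_features(seq)
```

The resulting bigram frequencies can then be treated like word counts in text, which is where the NLP-style techniques mentioned in the abstract would come in.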

Paper Details

Authors: Zhaocheng Huang, Julien Epps, Dale Joachim
Submitted on: 6 June 2019 - 4:42am

Document Files

ICASSP2019_Huang_V01_uploaded.pdf

Cite

[1] Zhaocheng Huang, Julien Epps, Dale Joachim, "Speech Landmark Bigrams for Depression Detection from Naturalistic Smartphone Speech", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4565. Accessed: Jun. 24, 2019.

Adversarial Speaker Adaptation


We propose a novel adversarial speaker adaptation (ASA) scheme, in which adversarial learning is applied to regularize the distribution of deep hidden features in a speaker-dependent (SD) deep neural network (DNN) acoustic model to be close to that of a fixed speaker-independent (SI) DNN acoustic model during adaptation. An additional discriminator network is introduced to distinguish the deep features generated by the SD model from those produced by the SI model.
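A minimal numeric sketch of the minimax objective described above, with assumed loss weighting and toy discriminator outputs (the actual ASA networks and hyperparameters are in the paper, not here):

```python
import numpy as np

def bce(p, y):
    """Binary cross-entropy of discriminator outputs p against labels y."""
    p = np.clip(p, 1e-7, 1 - 1e-7)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def asa_losses(senone_loss, disc_p_sd, disc_p_si, lam=0.5):
    """Toy ASA losses: the discriminator minimizes BCE for telling SD
    features (label 1) from SI features (label 0); the SD model minimizes
    its senone loss while *maximizing* the discriminator loss, so its
    deep features drift toward the SI model's distribution."""
    d_loss = 0.5 * (bce(disc_p_sd, np.ones_like(disc_p_sd))
                    + bce(disc_p_si, np.zeros_like(disc_p_si)))
    sd_loss = senone_loss - lam * d_loss
    return d_loss, sd_loss
```

In a real system the two objectives are optimized alternately (or via gradient reversal); `lam` here is an assumed regularization weight.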

Paper Details

Authors: Zhong Meng, Jinyu Li, Yifan Gong
Submitted on: 12 May 2019 - 9:26pm

Document Files

asa_oral_v3.pptx

Cite

[1] Zhong Meng, Jinyu Li, Yifan Gong, "Adversarial Speaker Adaptation", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4475. Accessed: Jun. 24, 2019.

Attentive Adversarial Learning for Domain-Invariant Training


Adversarial domain-invariant training (ADIT) proves to be effective in suppressing the effects of domain variability in acoustic modeling and has led to improved performance in automatic speech recognition (ASR). In ADIT, an auxiliary domain classifier takes in equally-weighted deep features from a deep neural network (DNN) acoustic model and is trained to improve their domain-invariance by optimizing an adversarial loss function.
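The attentive variant replaces the equal frame weighting with learned attention before the domain classifier. A sketch under assumed dot-product attention (the paper's exact attention form may differ):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attentive_pooling(frames, query):
    """Score each frame's deep feature against a learned query vector and
    pool with softmax attention weights, instead of weighting all frames
    equally as in plain ADIT. frames: (T, D), query: (D,)."""
    scores = frames @ query        # (T,) relevance of each frame
    weights = softmax(scores)      # attention weights, sum to 1
    return weights, weights @ frames  # pooled feature (D,) for the domain classifier
```

With identical frames the weights reduce to the uniform 1/T weighting of standard ADIT, which makes the attentive version a strict generalization.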

Paper Details

Authors: Zhong Meng, Jinyu Li, Yifan Gong
Submitted on: 12 May 2019 - 9:03pm

Document Files

aadit_poster.pptx

Cite

[1] Zhong Meng, Jinyu Li, Yifan Gong, "Attentive Adversarial Learning for Domain-Invariant Training", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4474. Accessed: Jun. 24, 2019.

Adversarial Speaker Verification


The use of deep networks to extract embeddings for speaker recognition has proven successful. However, such embeddings are susceptible to performance degradation due to mismatches among the training, enrollment, and test conditions. In this work, we propose an adversarial speaker verification (ASV) scheme to learn condition-invariant deep embeddings via adversarial multi-task training. In ASV, a speaker classification network and a condition identification network are jointly optimized to minimize the speaker classification loss and simultaneously mini-maximize the condition loss.
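The multi-task minimax can be sketched as two scalar objectives; the cross-entropy helper and the weighting `lam` are assumptions for illustration, not the paper's configuration:

```python
import numpy as np

def ce(logits, label):
    """Softmax cross-entropy for a single example."""
    z = logits - logits.max()
    logp = z - np.log(np.exp(z).sum())
    return float(-logp[label])

def asv_objectives(spk_logits, spk_label, cond_logits, cond_label, lam=0.3):
    """Toy ASV objectives: the condition classifier minimizes its own
    cross-entropy, while the embedding network minimizes the speaker
    cross-entropy and *maximizes* the condition cross-entropy, driving
    the embedding to keep speaker identity but shed condition cues."""
    spk_loss = ce(spk_logits, spk_label)
    cond_loss = ce(cond_logits, cond_label)
    return cond_loss, spk_loss - lam * cond_loss
```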

Paper Details

Authors: Zhong Meng, Yong Zhao, Jinyu Li, Yifan Gong
Submitted on: 12 May 2019 - 9:24pm

Document Files

asv_poster_v3.pptx

Cite

[1] Zhong Meng, Yong Zhao, Jinyu Li, Yifan Gong, "Adversarial Speaker Verification", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4473. Accessed: Jun. 24, 2019.

Conditional Teacher-Student Learning


Teacher-student (T/S) learning has been shown to be effective for a variety of problems such as domain adaptation and model compression. One shortcoming of T/S learning is that the teacher model, being imperfect, sporadically produces wrong guidance in the form of posterior probabilities that misleads the student model towards suboptimal performance.
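A plausible reading of "conditional" selection is to distill the teacher's posteriors only where the teacher is right, falling back to the ground truth otherwise. This sketch assumes that per-frame rule (the paper's gating mechanism may be different):

```python
import numpy as np

def conditional_targets(teacher_post, labels):
    """Per-frame target selection (sketch): where the teacher's top class
    agrees with the ground-truth label, keep its soft posteriors; where it
    disagrees, substitute the one-hot ground truth so the student is not
    trained on wrong guidance."""
    targets = []
    for post, y in zip(teacher_post, labels):
        if int(np.argmax(post)) == y:
            targets.append(post)               # trust the teacher's soft label
        else:
            onehot = np.zeros_like(post)
            onehot[y] = 1.0
            targets.append(onehot)             # override with the hard truth
    return np.stack(targets)
```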

Paper Details

Authors: Zhong Meng, Jinyu Li, Yong Zhao, Yifan Gong
Submitted on: 12 May 2019 - 9:23pm

Document Files

cts_poster.pptx

Cite

[1] Zhong Meng, Jinyu Li, Yong Zhao, Yifan Gong, "Conditional Teacher-Student Learning", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4472. Accessed: Jun. 24, 2019.

Role Specific Lattice Rescoring for Speaker Role Recognition from Speech Recognition Outputs


The language patterns followed by different speakers who play specific roles in conversational interactions provide valuable cues for the task of Speaker Role Recognition (SRR). Given the speech signal, existing algorithms typically try to find such patterns in the output of an Automatic Speech Recognition (ASR) system. In this work we propose an alternative way of revealing role-specific linguistic characteristics, by making use of role-specific ASR outputs, which are built by suitably rescoring the lattice produced after a first pass of ASR decoding.
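To illustrate the idea only: after rescoring the first-pass lattice once per role-specific language model, the role whose rescored best path scores highest can be selected. All names and score scales below are hypothetical, and a real system would rescore full lattices rather than single paths:

```python
def best_role(acoustic_scores, lm_scores, acoustic_scale=1.0, lm_scale=0.8):
    """Pick the role whose role-specific LM best explains the utterance.

    acoustic_scores: role -> acoustic log-score of that role's best
    rescored path; lm_scores: role -> log-score under that role's LM.
    Scales are assumed tuning constants."""
    totals = {r: acoustic_scale * acoustic_scores[r] + lm_scale * lm_scores[r]
              for r in acoustic_scores}
    return max(totals, key=totals.get)
```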

Paper Details

Authors: David C. Atkins, Shrikanth Narayanan
Submitted on: 9 May 2019 - 3:15pm

Document Files

RoleSpecificLatticeRescoringICASSP19

Cite

[1] David C. Atkins, Shrikanth Narayanan, "Role Specific Lattice Rescoring for Speaker Role Recognition from Speech Recognition Outputs", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4232. Accessed: Jun. 24, 2019.

Retrieving Speech Samples with Similar Emotional Content Using a Triplet Loss Function


The ability to identify speech with similar emotional content is valuable to many applications, including speech retrieval, surveillance, and emotional speech synthesis. While current formulations of speech emotion recognition based on classification or regression are not appropriate for this task, solutions based on preference learning offer an appealing alternative. This paper aims to find speech samples that are emotionally similar to an anchor speech sample provided as a query. This novel formulation opens interesting research questions.
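The standard hinge form of the triplet loss named in the title can be written in a few lines; the margin value and Euclidean distance here are conventional choices, not necessarily the paper's:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge triplet loss on embedding distances: pull the emotionally
    similar (positive) sample closer to the anchor than the dissimilar
    (negative) one by at least `margin`; zero loss once satisfied."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)
```

Trained on such triplets, the embedding space orders samples by emotional similarity to the anchor, which is exactly what a retrieval-by-query formulation needs.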

Paper Details

Authors: Reza Lotfian, Carlos Busso
Submitted on: 9 May 2019 - 12:32pm

Document Files

poster_draft_final.pdf

Cite

[1] Reza Lotfian, Carlos Busso, "Retrieving Speech Samples with Similar Emotional Content Using a Triplet Loss Function", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4222. Accessed: Jun. 24, 2019.

SPEAKER AGNOSTIC FOREGROUND SPEECH DETECTION FROM AUDIO RECORDINGS IN WORKPLACE SETTINGS FROM WEARABLE RECORDERS



Audio-signal acquisition as part of wearable sensing adds an important dimension for applications such as understanding human behaviors. As part of a large study on workplace behaviors, we collected audio data from individual hospital staff using custom wearable recorders. The audio features collected were limited to preserve the privacy of interactions in the hospital. A first step towards audio processing is to identify the foreground speech of the person wearing the audio badge.

Paper Details

Authors: Amrutha Nadarajan, Krishna Somandepalli, Shrikanth S. Narayanan
Submitted on: 9 May 2019 - 12:29am

Cite

[1] Amrutha Nadarajan, Krishna Somandepalli, Shrikanth S. Narayanan, "SPEAKER AGNOSTIC FOREGROUND SPEECH DETECTION FROM AUDIO RECORDINGS IN WORKPLACE SETTINGS FROM WEARABLE RECORDERS", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4146. Accessed: Jun. 24, 2019.

MULTI-BAND PIT AND MODEL INTEGRATION FOR IMPROVED MULTI-CHANNEL SPEECH SEPARATION

Paper Details

Submitted on: 8 May 2019 - 10:14pm

Document Files

Poster_for_multiband_PIT.pdf

Cite

[1] "MULTI-BAND PIT AND MODEL INTEGRATION FOR IMPROVED MULTI-CHANNEL SPEECH SEPARATION", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4141. Accessed: Jun. 24, 2019.

ICASSP 2019 Poster (TRANSMISSION LINE COCHLEAR MODEL BASED AM-FM FEATURES FOR REPLAY ATTACK DETECTION)

Paper Details

Authors: Tharshini Gunendradasan, Saad Irtza, Eliathamby Ambikairajah, Julien Epps
Submitted on: 7 May 2019 - 6:58pm

Document Files

ICASSP2019_Poster_TharshiniGunendradasan.pdf

Cite

[1] Tharshini Gunendradasan, Saad Irtza, Eliathamby Ambikairajah, Julien Epps, "ICASSP 2019 Poster (TRANSMISSION LINE COCHLEAR MODEL BASED AM-FM FEATURES FOR REPLAY ATTACK DETECTION)", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/3965. Accessed: Jun. 24, 2019.
