Speech Emotion Recognition

MULTI-HEAD ATTENTION FOR SPEECH EMOTION RECOGNITION WITH AUXILIARY LEARNING OF GENDER RECOGNITION


This paper presents a multi-head attention deep learning network for Speech Emotion Recognition (SER) that takes Log mel-Filter Bank Energies (LFBE) spectral features as input. Multi-head attention, together with position embeddings, jointly attends to information from different representations of the same LFBE input sequence. The position embeddings help the model attend to the dominant emotion features by identifying where those features occur in the sequence. In addition to multi-head attention and position embeddings, we apply multi-task learning with gender recognition as an auxiliary task.
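
As a rough illustration of the described setup, the sketch below combines learned position embeddings, multi-head self-attention over LFBE frames, and a shared encoder feeding both an emotion head and an auxiliary gender head. This is a minimal PyTorch sketch under assumed dimensions and an assumed auxiliary-loss weight, not the authors' exact architecture.

    # Minimal PyTorch sketch: multi-head self-attention over an LFBE sequence
    # with learned position embeddings, plus an auxiliary gender head for
    # multi-task learning. Sizes, pooling, and the 0.3 auxiliary-loss weight
    # are illustrative assumptions, not the paper's exact model.
    import torch
    import torch.nn as nn

    class SERMultiHead(nn.Module):
        def __init__(self, n_mels=64, d_model=128, n_heads=4,
                     max_len=1000, n_emotions=4):
            super().__init__()
            self.proj = nn.Linear(n_mels, d_model)       # LFBE frame -> model dim
            self.pos = nn.Embedding(max_len, d_model)    # learned position embedding
            self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.emotion_head = nn.Linear(d_model, n_emotions)  # main task
            self.gender_head = nn.Linear(d_model, 2)            # auxiliary task

        def forward(self, lfbe):                         # lfbe: (batch, T, n_mels)
            t = lfbe.size(1)
            pos = self.pos(torch.arange(t, device=lfbe.device))
            x = self.proj(lfbe) + pos                    # inject position information
            x, _ = self.attn(x, x, x)                    # attend across time frames
            x = x.mean(dim=1)                            # average-pool over time
            return self.emotion_head(x), self.gender_head(x)

    # Joint training: main emotion loss plus a weighted auxiliary gender loss.
    model = SERMultiHead()
    lfbe = torch.randn(8, 300, 64)                       # a batch of LFBE sequences
    emo_logits, gen_logits = model(lfbe)
    emo_y = torch.randint(0, 4, (8,))
    gen_y = torch.randint(0, 2, (8,))
    loss = nn.functional.cross_entropy(emo_logits, emo_y) \
         + 0.3 * nn.functional.cross_entropy(gen_logits, gen_y)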

Paper Details

Authors: Periyasamy Paramasivam, Promod Yenigalla
Submitted On: 21 May 2020 - 11:36pm

Document Files: ICASSP.pdf

Cite: Periyasamy Paramasivam, Promod Yenigalla, "MULTI-HEAD ATTENTION FOR SPEECH EMOTION RECOGNITION WITH AUXILIARY LEARNING OF GENDER RECOGNITION", IEEE SigPort, 2020. [Online]. Available: http://sigport.org/5430.

DEEP ENCODED LINGUISTIC AND ACOUSTIC CUES FOR ATTENTION BASED END TO END SPEECH EMOTION RECOGNITION

Paper Details

Authors: Swapnil Bhosale, Rupayan Chakraborty, Sunil Kumar Kopparapu
Submitted On: 20 May 2020 - 5:22am

Document Files: ICASSP_presentation_5598.pdf

Cite: Swapnil Bhosale, Rupayan Chakraborty, Sunil Kumar Kopparapu, "DEEP ENCODED LINGUISTIC AND ACOUSTIC CUES FOR ATTENTION BASED END TO END SPEECH EMOTION RECOGNITION", IEEE SigPort, 2020. [Online]. Available: http://sigport.org/5404.

Multi-Conditioning & Data Augmentation using Generative Noise Model for Speech Emotion Recognition in Noisy Conditions


Degradation due to additive noise is a significant roadblock in the real-life deployment of Speech Emotion Recognition (SER) systems. Most previous work in this field has dealt with noise degradation either at the signal level or at the feature level. In this paper, to address the robustness of SER in additive-noise scenarios, we propose multi-conditioning and data augmentation using an utterance-level parametric generative noise model. The generative noise model is designed to generate noise types that span the entire noise space in the mel-filterbank energy domain.
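
One plausible reading of this scheme is sketched below: draw a random per-band noise profile, treat it as the parametric noise model, and add the scaled noise to the clean mel-filterbank energies at a sampled utterance-level SNR. The profile shape, SNR range, and scaling are assumptions made for illustration, not the paper's exact generative model.

    # Illustrative NumPy sketch of noise augmentation in the mel-filterbank
    # energy domain. The per-band profile, exponential frame noise, and SNR
    # sampling are assumptions, not the paper's exact generative noise model.
    import numpy as np

    def augment_mel_energies(clean_mel, snr_db_range=(0.0, 20.0), rng=None):
        """clean_mel: (T, n_mels) array of non-negative mel-filterbank energies."""
        rng = rng or np.random.default_rng()
        t, n_mels = clean_mel.shape
        profile = rng.uniform(0.1, 1.0, size=n_mels)     # random spectral shape
        noise = profile[None, :] * rng.exponential(1.0, size=(t, n_mels))
        snr_db = rng.uniform(*snr_db_range)              # utterance-level target SNR
        # Scale noise so mean signal energy / mean noise energy hits the target
        # SNR; energies of uncorrelated signals add approximately in this domain.
        scale = clean_mel.mean() / (noise.mean() * 10 ** (snr_db / 10))
        return clean_mel + scale * noise

    # Multi-conditioning: train on clean utterances plus noise-augmented copies.
    clean = np.abs(np.random.randn(300, 40))             # stand-in for real features
    noisy = augment_mel_energies(clean, snr_db_range=(5.0, 15.0))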

Paper Details

Authors: Upasana Tiwari, Meet Soni, Rupayan Chakraborty, Ashish Panda, Sunil Kumar Kopparapu
Submitted On: 20 May 2020 - 5:01am

Document Files: ICASSP2020_ppt_5701.pdf

Cite: Upasana Tiwari, Meet Soni, Rupayan Chakraborty, Ashish Panda, Sunil Kumar Kopparapu, "Multi-Conditioning & Data Augmentation using Generative Noise Model for Speech Emotion Recognition in Noisy Conditions", IEEE SigPort, 2020. [Online]. Available: http://sigport.org/5403.

UNSUPERVISED LEARNING APPROACH TO FEATURE ANALYSIS FOR AUTOMATIC SPEECH EMOTION RECOGNITION


The scarcity of emotional speech data is a bottleneck in developing automatic speech emotion recognition (ASER) systems. One way to alleviate this issue is to use unsupervised feature learning techniques to learn features from widely available general speech, and then use these features to train emotion classifiers. Such unsupervised methods, including the denoising autoencoder (DAE), variational autoencoder (VAE), adversarial autoencoder (AAE), and adversarial variational Bayes (AVB), can capture the intrinsic structure of the data distribution in the learned feature representation.
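
For concreteness, a minimal sketch of the first of those methods, a denoising autoencoder trained on unlabeled speech features, is given below; the learned bottleneck would then serve as input to a separate emotion classifier. Layer sizes and the corruption level are illustrative assumptions, not the paper's exact configuration.

    # Minimal PyTorch sketch of a denoising autoencoder (DAE), one of the
    # unsupervised methods listed above. Sizes and the Gaussian corruption
    # level are illustrative assumptions.
    import torch
    import torch.nn as nn

    class DAE(nn.Module):
        def __init__(self, n_in=40, n_hidden=32):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(n_in, n_hidden), nn.ReLU())
            self.decoder = nn.Linear(n_hidden, n_in)

        def forward(self, x):
            z = self.encoder(x + 0.1 * torch.randn_like(x))  # corrupt the input
            return self.decoder(z), z                        # reconstruct clean input

    dae = DAE()
    opt = torch.optim.Adam(dae.parameters(), lr=1e-3)
    frames = torch.randn(256, 40)                 # unlabeled general-speech features
    recon, _ = dae(frames)
    loss = nn.functional.mse_loss(recon, frames)  # denoising reconstruction loss
    loss.backward()
    opt.step()
    # After training, dae.encoder(x) yields features for the emotion classifier.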

Paper Details

Authors: Sefik Emre Eskimez, Zhiyao Duan, Wendi Heinzelman
Submitted On: 19 April 2018 - 4:01pm

Document Files: icassp-2018-poster.pdf

Cite: Sefik Emre Eskimez, Zhiyao Duan, Wendi Heinzelman, "UNSUPERVISED LEARNING APPROACH TO FEATURE ANALYSIS FOR AUTOMATIC SPEECH EMOTION RECOGNITION", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/3017.