

MULTI-HEAD ATTENTION FOR SPEECH EMOTION RECOGNITION WITH AUXILIARY LEARNING OF GENDER RECOGNITION

Abstract: 

The paper presents a multi-head attention deep learning network for Speech Emotion Recognition (SER) that takes Log Mel Filter Bank Energies (LFBE) spectral features as input. Multi-head attention, together with position embedding, jointly attends to information from different representations of the same LFBE input sequence. The position embedding helps the network attend to the dominant emotion features by identifying the positions of those features in the sequence. In addition to multi-head attention and position embedding, we apply multi-task learning with gender recognition as an auxiliary task. The auxiliary task helps the network learn gender-specific features that influence the emotion characteristics of speech, which improves the accuracy of SER, the primary task. We conducted all our experiments on the IEMOCAP dataset. We achieve an overall accuracy of 76.4% and an average class accuracy of 70.1%, which are 5.3% and 6.2% higher, respectively, than the state-of-the-art SER models for four emotion classes.
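The abstract combines three mechanisms (position embedding added to LFBE frames, multi-head self-attention over the frame sequence, and a joint loss with gender recognition as an auxiliary task) and reports two accuracy figures. The pure-Python sketch below illustrates these ideas on toy data; it is not the authors' implementation. Several choices are assumptions not stated in the abstract: a Transformer-style sinusoidal position embedding, mean pooling over frames into a shared representation, two linear heads on that shared representation, no attention output projection, an auxiliary loss weight of 0.5 (the hypothetical `gender_weight` parameter), and reading "average class accuracy" as the unweighted mean of per-class recall. All function names are hypothetical.

```python
import math
import random

def sinusoidal_positions(num_frames, dim):
    """Transformer-style sinusoidal position embedding (assumed; the paper may differ)."""
    table = []
    for pos in range(num_frames):
        row = []
        for i in range(dim):
            angle = pos / (10000 ** (2 * (i // 2) / dim))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table

def matmul(a, b):
    """(n x k) @ (k x m) for matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def multi_head_self_attention(x, w_q, w_k, w_v, num_heads):
    """Scaled dot-product self-attention over T frames, split into num_heads heads.

    Each head attends over its own slice of the projected features and the head
    outputs are concatenated. The usual output projection is omitted for brevity.
    """
    d_head = len(x[0]) // num_heads
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    out = [[] for _ in x]
    for h in range(num_heads):
        lo, hi = h * d_head, (h + 1) * d_head
        for t, q_row in enumerate(q):
            scores = [sum(a * b for a, b in zip(q_row[lo:hi], k_row[lo:hi])) / math.sqrt(d_head)
                      for k_row in k]
            weights = softmax(scores)  # how strongly frame t attends to every frame
            out[t].extend(sum(w * v_row[j] for w, v_row in zip(weights, v))
                          for j in range(lo, hi))
    return out

def mean_pool(frames):
    """Average over frames (pooling choice is an assumption)."""
    return [sum(col) / len(frames) for col in zip(*frames)]

def linear(vec, w):
    """Project a length-D vector with a D x C weight matrix to C logits."""
    return [sum(a * b for a, b in zip(vec, col)) for col in zip(*w)]

def cross_entropy(logits, target):
    return -math.log(softmax(logits)[target])

def joint_loss(lfbe, emotion_label, gender_label, params, num_heads=4, gender_weight=0.5):
    """Primary emotion loss plus a weighted auxiliary gender loss on shared features.

    gender_weight is a hypothetical hyperparameter, not a value from the paper.
    """
    pos = sinusoidal_positions(len(lfbe), len(lfbe[0]))
    x = [[a + p for a, p in zip(frame, pe)] for frame, pe in zip(lfbe, pos)]
    shared = mean_pool(multi_head_self_attention(
        x, params["w_q"], params["w_k"], params["w_v"], num_heads))
    loss_emotion = cross_entropy(linear(shared, params["w_emotion"]), emotion_label)
    loss_gender = cross_entropy(linear(shared, params["w_gender"]), gender_label)
    return loss_emotion + gender_weight * loss_gender, loss_emotion, loss_gender

def overall_and_class_average_accuracy(y_true, y_pred, num_classes):
    """Overall accuracy, and the unweighted mean of per-class recall."""
    overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    recalls = []
    for c in range(num_classes):
        idx = [i for i, t in enumerate(y_true) if t == c]
        if idx:  # skip classes absent from y_true
            recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return overall, sum(recalls) / len(recalls)

def rand_matrix(rows, cols):
    return [[random.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

# Toy usage with random, untrained weights: 6 frames of 8-dim stand-in "LFBE" features.
random.seed(0)
D = 8
params = {name: rand_matrix(D, D) for name in ("w_q", "w_k", "w_v")}
params["w_emotion"] = rand_matrix(D, 4)  # four emotion classes
params["w_gender"] = rand_matrix(D, 2)   # auxiliary gender classes
lfbe = rand_matrix(6, D)
total, l_emo, l_gen = joint_loss(lfbe, emotion_label=2, gender_label=1, params=params)
print(overall_and_class_average_accuracy([0, 0, 0, 1], [0, 0, 0, 0], 2))  # → (0.75, 0.5)
```

Because both heads read the same pooled attention output, gradients from the gender loss would also shape the shared encoder during training, which is the intent of auxiliary learning. The accuracy example shows why the two reported figures can differ: a classifier that always predicts the majority class scores 0.75 overall but only 0.5 when accuracy is averaged over classes, which matters on class-imbalanced data.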


Paper Details

Authors: Periyasamy Paramasivam, Promod Yenigalla
Submitted On: 21 May 2020 - 11:36pm
Type: Presentation Slides
Presenter's Name: Anish Nediyanchath
Paper Code: SPE-P11.8
Document Year: 2020

Document Files

ICASSP.pdf



[1] Periyasamy Paramasivam, Promod Yenigalla, "MULTI-HEAD ATTENTION FOR SPEECH EMOTION RECOGNITION WITH AUXILIARY LEARNING OF GENDER RECOGNITION", IEEE SigPort, 2020. [Online]. Available: http://sigport.org/5430. Accessed: Jun. 06, 2020.
@article{5430-20,
url = {http://sigport.org/5430},
author = {Periyasamy Paramasivam and Promod Yenigalla},
publisher = {IEEE SigPort},
title = {MULTI-HEAD ATTENTION FOR SPEECH EMOTION RECOGNITION WITH AUXILIARY LEARNING OF GENDER RECOGNITION},
year = {2020} }