Sorry, you need to enable JavaScript to visit this website.

Towards Transferable Speech Emotion Representation: On Loss Functions For Cross-Lingual Latent Representations

Error message

  • The specified file temporary://fileDQ0aSY could not be copied, because the destination directory is not properly configured. This may be caused by a problem with file or directory permissions. More information is available in the system log.
  • The specified file temporary://fileyqWPQw could not be copied, because the destination directory is not properly configured. This may be caused by a problem with file or directory permissions. More information is available in the system log.
  • The specified file temporary://file5iTN0P could not be copied, because the destination directory is not properly configured. This may be caused by a problem with file or directory permissions. More information is available in the system log.
  • The specified file temporary://fileG2i1dQ could not be copied, because the destination directory is not properly configured. This may be caused by a problem with file or directory permissions. More information is available in the system log.
  • The specified file temporary://fileHWbGP4 could not be copied, because the destination directory is not properly configured. This may be caused by a problem with file or directory permissions. More information is available in the system log.
  • The specified file temporary://fileyXVbwv could not be copied, because the destination directory is not properly configured. This may be caused by a problem with file or directory permissions. More information is available in the system log.
  • The specified file temporary://fileUoPZME could not be copied, because the destination directory is not properly configured. This may be caused by a problem with file or directory permissions. More information is available in the system log.
  • The specified file temporary://fileumLDFL could not be copied, because the destination directory is not properly configured. This may be caused by a problem with file or directory permissions. More information is available in the system log.
  • The specified file temporary://fileDPo7Hj could not be copied, because the destination directory is not properly configured. This may be caused by a problem with file or directory permissions. More information is available in the system log.
Citation Author(s):
Sneha Das, Nicole Nadine Lønfeldt, Anne Katrine Pagsberg, Line H. Clemmensen
Submitted by:
Sneha Das
Last updated:
14 May 2022 - 4:06am
Document Type:
Presentation Slides
Document Year:
2022
Event:
Presenters:
Sneha Das
Paper Code:
SPE-15.6
Categories:
 

In recent years, speech emotion recognition (SER) has been used in wide ranging applications, from healthcare to the commercial sector. In addition to signal processing approaches, methods for SER now also use deep learning techniques which provide transfer learning possibilities. However, generalizing over languages, corpora and recording conditions is still an open challenge. In this work we address this gap by exploring loss functions that aid in transferability, specifically to non-tonal languages. We propose a variational autoencoder (VAE) with KL annealing and a semi-supervised VAE to obtain more consistent latent embedding distributions across data sets. To ensure transferability, the distribution of the latent embedding should be similar across non-tonal languages (data sets). We start by presenting a low-complexity SER based on a denoising-autoencoder, which achieves an unweighted classification accuracy of over 52.09% for four-class emotion classification. This performance is comparable to that of similar baseline methods. Following this, we employ a VAE, the semi-supervised VAE and the VAE with KL annealing to obtain a more regularized latent space. We show that while the DAE has the highest classification accuracy among the methods, the semi-supervised VAE has a comparable classification accuracy and a more consistent latent embedding distribution over data sets.

up
0 users have voted: