Dynamic Speech Emotion Recognition using a Conditional Neural Process

DOI:
10.60864/nqtm-kv46
Citation Author(s):
Luz Martinez-Lucas, Carlos Busso
Submitted by:
Luz Martinez-Lucas
Last updated:
6 June 2024 - 10:21am
Document Type:
Poster
Document Year:
2024
Presenters:
Luz Martinez-Lucas
Paper Code:
SLP-P25.9
 

Work on predicting emotional attributes from speech has often focused on predicting a single value for a sentence or short speaking turn. Such methods often ignore that natural emotions are both dynamic and dependent on context. To model the dynamic nature of emotions, we can treat the prediction of emotion from speech as a time-series problem. We refer to the problem of predicting these emotional traces as dynamic speech emotion recognition. Previous studies in this area have used models that treat all emotional traces as coming from the same underlying distribution. Since emotions depend on contextual information, these methods might obscure the context of an emotional interaction. Our paper combines a conditional neural process with a segment-level speech emotion recognition (SER) model to address this problem. The neural process leverages information from the time series and the predictions of the SER model to learn a prior that defines a distribution over emotional traces. Our proposed model performs 21% better than a bidirectional long short-term memory (BiLSTM) baseline when predicting valence traces. This poster provides an overview of our paper and proposed method.
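The abstract does not give architectural details, so the following is only a minimal sketch of a generic conditional neural process (CNP) forward pass in Python with NumPy, not the authors' model. It assumes, hypothetically, that context points are (time, valence) pairs taken from segment-level SER predictions and that targets are a dense grid of time stamps along the trace. Each context pair is encoded, the encodings are averaged into one representation (so the result does not depend on context order), and a decoder outputs a Gaussian mean and standard deviation for every target time. The weights are random and untrained; the helper names (`mlp_params`, `cnp_predict`, `gaussian_nll`) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_params(sizes, rng):
    """Random (untrained) weights for a small MLP; illustration only."""
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)  # ReLU on hidden layers only
    return x

def cnp_predict(enc, dec, x_ctx, y_ctx, x_tgt):
    """Generic CNP forward pass.

    x_ctx: (n_ctx, d_x) context inputs, y_ctx: (n_ctx, d_y) context outputs,
    x_tgt: (n_tgt, d_x) target inputs. Returns mean and std, each (n_tgt, d_y).
    """
    r_i = mlp(enc, np.concatenate([x_ctx, y_ctx], axis=-1))  # encode each context pair
    r = r_i.mean(axis=0, keepdims=True)                      # order-invariant aggregate
    h = np.concatenate([np.repeat(r, len(x_tgt), axis=0), x_tgt], axis=-1)
    out = mlp(dec, h)
    d_y = y_ctx.shape[-1]
    mean, raw = out[:, :d_y], out[:, d_y:]
    std = 0.1 + 0.9 * np.logaddexp(0.0, raw)  # softplus with a lower bound (a common choice)
    return mean, std

def gaussian_nll(mean, std, y):
    """Average Gaussian negative log-likelihood, a typical CNP training loss."""
    return np.mean(0.5 * np.log(2.0 * np.pi * std**2) + 0.5 * ((y - mean) / std) ** 2)

# Hypothetical setup: 1-D time input, 1-D valence output, 32-D representation.
d_x, d_y, d_r = 1, 1, 32
enc = mlp_params([d_x + d_y, 64, d_r], rng)
dec = mlp_params([d_r + d_x, 64, 2 * d_y], rng)

# Hypothetical context: segment-level SER valence predictions at 5 time points.
t_ctx = np.linspace(0.0, 1.0, 5)[:, None]
v_ctx = rng.uniform(-1.0, 1.0, (5, 1))
t_tgt = np.linspace(0.0, 1.0, 100)[:, None]  # dense timeline for the trace

mean, std = cnp_predict(enc, dec, t_ctx, v_ctx, t_tgt)
print(mean.shape, std.shape)  # (100, 1) (100, 1)
```

Because the prediction is a distribution rather than a single trace, the standard deviation gives a per-time-step uncertainty; training would minimize `gaussian_nll` on held-out target points, which this sketch does not do.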
