A PLLR and Multi-stage Staircase Regression Framework for Speech-based Emotion Prediction

Citation Author(s):
Zhaocheng Huang, Julien Epps
Submitted by:
Zhaocheng Huang
Last updated:
17 March 2017 - 10:17pm
Document Type:
Poster
Document Year:
2017
Event:
Presenters:
Zhaocheng Huang
Paper Code:
SP-P2.7
Categories:
 

Continuous prediction of dimensional emotions (e.g. arousal and valence) has attracted increasing research interest recently. When processing emotional speech signals, phonetic features have rarely been used, on the assumption that phonetic variability is a confounding factor that degrades emotion recognition and prediction performance. In this paper, instead of eliminating phonetic variability, we investigate whether Phone Log-Likelihood Ratio (PLLR) features can be used to index arousal and valence in a pairwise low/high framework. A multi-stage staircase regression (SR) framework, which enables fusion at three different stages, is also investigated. Results on the RECOLA database show that PLLR outperforms eGeMAPS features for arousal and valence. Interestingly, long-term averaged PLLR proved more robust and emotionally informative than local frame-level PLLR, which contains more phoneme-specific information. Within the multi-stage SR framework, PLLR yielded relative improvements of 8.2% and 11.6% in concordance correlation coefficient (CCC) for arousal and valence, respectively, showing great promise for including phonetic features in emotion prediction systems.
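
For readers unfamiliar with the quantities named in the abstract, the following is a minimal Python sketch of how they are commonly defined, not the authors' implementation. It assumes PLLR is the logit of each phone posterior, uses the standard CCC formula with population variances, and reads relative improvement as the gain divided by the baseline score. The function names `pllr`, `ccc`, and `relative_improvement` and the clipping constant `eps` are hypothetical choices made for illustration.

```python
import math


def pllr(posteriors, eps=1e-10):
    """Map one frame of phone posteriors to log-likelihood ratios.

    Assumption: PLLR_i = log(p_i / (1 - p_i)); posteriors are clipped
    to [eps, 1 - eps] to avoid taking the log of zero.
    """
    ratios = []
    for p in posteriors:
        p = min(max(p, eps), 1.0 - eps)
        ratios.append(math.log(p / (1.0 - p)))
    return ratios


def ccc(pred, gold):
    """Concordance correlation coefficient between two equal-length sequences.

    CCC = 2 * cov / (var_pred + var_gold + (mean_pred - mean_gold)^2),
    using population (divide-by-n) statistics.
    """
    n = len(pred)
    mean_p = sum(pred) / n
    mean_g = sum(gold) / n
    var_p = sum((x - mean_p) ** 2 for x in pred) / n
    var_g = sum((y - mean_g) ** 2 for y in gold) / n
    cov = sum((x - mean_p) * (y - mean_g) for x, y in zip(pred, gold)) / n
    denom = var_p + var_g + (mean_p - mean_g) ** 2
    if denom == 0.0:
        # Both sequences are constant and identical; CCC is undefined.
        raise ValueError("CCC is undefined for identical constant sequences")
    return 2.0 * cov / denom


def relative_improvement(baseline, improved):
    """Relative gain of `improved` over `baseline`, as a fraction."""
    return (improved - baseline) / baseline
```

Unlike Pearson correlation, CCC penalises a constant offset between prediction and reference: in this sketch, `ccc([2, 3, 4], [1, 2, 3])` evaluates to 4/7 even though the two sequences are perfectly correlated.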
