A PLLR and Multi-stage Staircase Regression Framework for Speech-based Emotion Prediction

Continuous prediction of dimensional emotions (e.g. arousal and valence) has attracted increasing research interest in recent years. When processing emotional speech signals, phonetic features have rarely been used, because phonetic variability is usually assumed to be a confounding factor that degrades emotion recognition/prediction performance. In this paper, instead of eliminating phonetic variability, we investigate whether Phone Log-Likelihood Ratio (PLLR) features can be used to index arousal and valence in a pairwise low/high framework. We also investigate a multi-stage staircase regression (SR) framework that enables fusion at three different stages. Results on the RECOLA database show that PLLR outperforms eGeMAPS features for both arousal and valence. Interestingly, long-term averaged PLLR proved to be more robust and emotionally informative than local frame-level PLLR, which contains more phoneme-specific information. Within the multi-stage SR framework, PLLR yielded relative improvements in concordance correlation coefficient (CCC) of 8.2% for arousal and 11.6% for valence, suggesting that phonetic features are a promising addition to emotion prediction systems.
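
The abstract relies on three quantities that are easy to make concrete: frame-level PLLR, its long-term average, and CCC. The sketch below is illustrative only and is not the authors' implementation. It assumes the common PLLR formulation, where a phone recognizer outputs per-frame phone posteriors p_i and each feature is log(p_i / (1 - p_i)). The averaging window (a plain mean over a list of frames) and all function names are hypothetical. The staircase regression stage is not sketched, because its details are not given here.

```python
import math
from typing import List, Sequence


def pllr(posteriors: Sequence[float], eps: float = 1e-10) -> List[float]:
    """Frame-level PLLR: log(p / (1 - p)) for each phone posterior p.

    Assumes the posteriors come from some phone recognizer (hypothetical here).
    Values are clipped to [eps, 1 - eps] so the log never sees 0 or infinity.
    """
    feats = []
    for p in posteriors:
        p = min(max(p, eps), 1.0 - eps)
        feats.append(math.log(p / (1.0 - p)))
    return feats


def long_term_average(frames: Sequence[Sequence[float]]) -> List[float]:
    """Mean of frame-level PLLR vectors over a window (window choice is an assumption)."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(frame[i] for frame in frames) / n for i in range(dim)]


def ccc(pred: Sequence[float], gold: Sequence[float]) -> float:
    """Concordance correlation coefficient: 2*cov / (var_x + var_y + (mu_x - mu_y)^2)."""
    n = len(pred)
    mx = sum(pred) / n
    my = sum(gold) / n
    vx = sum((a - mx) ** 2 for a in pred) / n
    vy = sum((b - my) ** 2 for b in gold) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(pred, gold)) / n
    return 2.0 * cov / (vx + vy + (mx - my) ** 2)


def relative_improvement(baseline: float, new: float) -> float:
    """Relative change of a score, e.g. 0.082 for an 8.2% relative improvement."""
    return (new - baseline) / baseline


if __name__ == "__main__":
    frames = [pllr([0.7, 0.2, 0.1]), pllr([0.6, 0.3, 0.1])]
    print(long_term_average(frames))
    # A constant offset leaves Pearson correlation at 1 but lowers CCC to 4/7.
    print(ccc([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]))
```

Unlike Pearson correlation, CCC also penalizes differences in mean and scale between prediction and gold standard. This is why a prediction that is merely shifted by a constant scores below 1.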

DAVID_ICASSP2017_V1.pdf
