MULTI-LINGUAL MULTI-TASK SPEECH EMOTION RECOGNITION USING WAV2VEC 2.0

Citation Author(s):
Mayank Sharma
Submitted by:
MAYANK SHARMA
Last updated:
7 May 2022 - 4:28am
Document Type:
Poster
Document Year:
2022
Event:
Presenters:
Mayank Sharma
Paper Code:
3413
 

Speech Emotion Recognition (SER) has several use cases for
Digital Entertainment Content (DEC) in Over-the-top (OTT)
services, emotive Text-to-Speech (TTS) engines, and voice
assistants. In this work, we present a Multi-Lingual (MLi),
Multi-Task Learning (MTL), audio-only SER system based on
the multi-lingual pre-trained wav2vec 2.0 model. The model
is fine-tuned on 25 open-source datasets covering 13 locales
and 7 emotion categories. We show that: a) our single-task
wav2vec 2.0 model outperforms a single-task model based on
Pre-trained Audio Neural Networks (PANN) by 7.2% (relative);
b) the best MTL model outperforms the PANN-based and
wav2vec 2.0 single-task models by 8.6% and 1.7% (relative),
respectively; c) the MTL system outperforms the single-task
pre-trained wav2vec 2.0 model in 9 out of 13 locales in
terms of weighted F1 score; and d) the MTL-MLi wav2vec 2.0
model outperforms the state of the art for the languages
contained in the pre-training corpora.
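The abstract does not specify the architecture of the multi-task head, but the general pattern — a shared projection over pooled wav2vec 2.0 frame features feeding separate per-task classifiers, trained with a joint loss — can be sketched as below. This is a minimal illustration, not the authors' implementation: the layer sizes, the mean-pooling choice, the auxiliary locale task, and the equal loss weighting are all assumptions; only the 7 emotion categories and 13 locales come from the abstract.

```python
import torch
import torch.nn as nn

class MultiTaskSERHead(nn.Module):
    """Illustrative multi-task head over wav2vec 2.0 features.

    Expects a (batch, frames, 768) tensor of hidden states from a
    pre-trained multilingual wav2vec 2.0 encoder (not loaded here).
    The 256-unit shared layer and the locale auxiliary task are
    hypothetical choices for illustration only.
    """

    def __init__(self, feat_dim=768, n_emotions=7, n_locales=13):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU())
        self.emotion_head = nn.Linear(256, n_emotions)  # 7 emotion categories
        self.locale_head = nn.Linear(256, n_locales)    # auxiliary locale task

    def forward(self, hidden_states):
        pooled = hidden_states.mean(dim=1)  # mean-pool over frames
        shared = self.shared(pooled)
        return self.emotion_head(shared), self.locale_head(shared)

# Joint training step with equal task weighting (an assumption).
head = MultiTaskSERHead()
feats = torch.randn(4, 200, 768)  # stand-in for encoder frame features
emo_logits, loc_logits = head(feats)
loss = (nn.functional.cross_entropy(emo_logits, torch.randint(0, 7, (4,)))
        + nn.functional.cross_entropy(loc_logits, torch.randint(0, 13, (4,))))
print(emo_logits.shape, loc_logits.shape)
```

In a multi-task setup like this, the shared layer is what lets supervision from one task (e.g. locale) regularize the representation used by the other (emotion), which is the mechanism the MTL gains in the abstract rely on.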
