PROMPTING AUDIOS USING ACOUSTIC PROPERTIES FOR EMOTION REPRESENTATION

DOI:
10.60864/5z4h-b050
Citation Author(s):
Soham Deshmukh, Huaming Wang, Bhiksha Raj, Rita Singh
Submitted by:
Hira Dhamyal
Last updated:
6 June 2024 - 10:50am
Document Type:
Poster
Document Year:
2024
Presenters:
Hira Dhamyal

Emotions lie on a continuum, but current models treat emotion as a
finite-valued discrete variable. This representation does not capture
the diversity in the expression of emotion. To better represent
emotions, we propose the use of natural language descriptions (or
prompts). In this work, we address the challenge of automatically
generating these prompts and training a model to better learn emotion
representations from audio and prompt pairs. We use acoustic
properties that are correlated with emotion, such as pitch, intensity,
speech rate, and articulation rate, to automatically generate prompts,
i.e., ‘acoustic prompts’. We use a contrastive learning objective to
map each speech sample to its respective acoustic prompt. We evaluate
our model on Emotion Audio Retrieval (EAR) and Speech Emotion
Recognition (SER). Our results show that the acoustic prompts
significantly improve the model’s performance in EAR across various
Precision@K metrics. In SER, we observe a 3.8% relative accuracy
improvement on the RAVDESS dataset.
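
The two mechanical steps named in the abstract, turning acoustic measurements into a text prompt and scoring retrieval with Precision@K, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The bin thresholds, the low/medium/high vocabulary, the prompt template, and the function names are all assumptions made for illustration. Precision@K is computed in its standard form, as the fraction of the top K retrieved items whose label matches the query.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical (low, high) cut points per acoustic property.
# These values are illustrative assumptions, not taken from the poster.
THRESHOLDS: Dict[str, Tuple[float, float]] = {
    "pitch": (120.0, 200.0),          # Hz
    "intensity": (55.0, 70.0),        # dB
    "speech rate": (3.0, 5.0),        # syllables per second
    "articulation rate": (4.0, 6.0),  # syllables per second, pauses excluded
}


def bin_level(value: float, low: float, high: float) -> str:
    """Map a scalar to 'low', 'medium', or 'high' using two cut points."""
    if value < low:
        return "low"
    if value < high:
        return "medium"
    return "high"


def acoustic_prompt(features: Dict[str, float], emotion: str) -> str:
    """Build a templated prompt from binned acoustic features.

    The template wording is an assumption; only properties present in
    `features` are mentioned, in the order listed in THRESHOLDS.
    """
    parts = [
        f"{bin_level(features[name], *THRESHOLDS[name])} {name}"
        for name in THRESHOLDS
        if name in features
    ]
    return f"this person sounds {emotion} with " + ", ".join(parts)


def precision_at_k(retrieved: Sequence[str], target: str, k: int) -> float:
    """Fraction of the top-k retrieved labels that equal the target label."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = retrieved[:k]
    return sum(label == target for label in top_k) / k
```

For example, `acoustic_prompt({"pitch": 250.0, "intensity": 50.0}, "angry")` yields `"this person sounds angry with high pitch, low intensity"` under these assumed thresholds, and `precision_at_k(["angry", "sad", "angry", "happy"], "angry", 2)` evaluates to `0.5`. A full system would pair such prompts with audio and train the audio and text encoders with the contrastive objective, which is outside the scope of this sketch.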
