PROMPTING AUDIOS USING ACOUSTIC PROPERTIES FOR EMOTION REPRESENTATION
- DOI: 10.60864/5z4h-b050
- Submitted by: Hira Dhamyal
- Last updated: 6 June 2024 - 10:50am
- Document Type: Poster
- Document Year: 2024
- Presenters: Hira Dhamyal
Emotions lie on a continuum, but current models treat emotion as a
finite-valued discrete variable. This representation does not capture
the diversity in the expression of emotion. To better represent
emotions, we propose the use of natural language descriptions (or
prompts). In this work, we address the challenge of automatically
generating these prompts and training a model to better learn emotion
representations from audio and prompt pairs. We use acoustic properties
that are correlated with emotion, such as pitch, intensity, speech rate,
and articulation rate, to automatically generate the prompts, which we
call ‘acoustic prompts’. We use a contrastive learning objective to map
each speech sample to its respective acoustic prompt. We evaluate our
model on Emotion Audio Retrieval (EAR) and Speech Emotion Recognition
(SER). Our results show that the acoustic prompts significantly improve
the model’s performance in EAR across various Precision@K metrics. In
SER, we observe a 3.8% relative accuracy improvement on the RAVDESS
dataset.
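
The pipeline described in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions that do not come from the poster: the low/medium/high binning, the threshold values, the prompt template, the symmetric CLIP-style form of the contrastive loss, and the temperature of 0.07. The abstract states only that a contrastive learning objective is used. The function names (`bin_level`, `acoustic_prompt`, `contrastive_loss`, `precision_at_k`) are hypothetical.

```python
import math


def bin_level(value, low, high):
    """Map a raw measurement to a coarse level (assumed binning scheme)."""
    if value < low:
        return "low"
    if value > high:
        return "high"
    return "medium"


def acoustic_prompt(pitch_hz, intensity_db, speech_rate, articulation_rate,
                    thresholds):
    """Build a text prompt from the four acoustic properties in the abstract.

    `thresholds` maps each property name to a hypothetical (low, high) pair.
    """
    feats = {
        "pitch": pitch_hz,
        "intensity": intensity_db,
        "speech rate": speech_rate,
        "articulation rate": articulation_rate,
    }
    parts = [f"{bin_level(v, *thresholds[name])} {name}"
             for name, v in feats.items()]
    return "speech with " + ", ".join(parts)


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _normalize(v):
    norm = math.sqrt(_dot(v, v))
    return [x / norm for x in v]


def _diag_cross_entropy(logits):
    """Mean cross-entropy where row i's correct class is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def contrastive_loss(audio_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss over paired audio/prompt embeddings.

    Assumed CLIP-style form: row i of `audio_emb` is paired with row i of
    `text_emb`; all other rows in the batch act as negatives.
    """
    a = [_normalize(v) for v in audio_emb]
    t = [_normalize(v) for v in text_emb]
    logits = [[_dot(x, y) / temperature for y in t] for x in a]
    logits_t = [list(col) for col in zip(*logits)]
    return 0.5 * (_diag_cross_entropy(logits) + _diag_cross_entropy(logits_t))


def precision_at_k(retrieved_labels, query_label, k):
    """Fraction of the top-k retrieved items whose label matches the query."""
    top_k = retrieved_labels[:k]
    return sum(1 for label in top_k if label == query_label) / k


# Hypothetical thresholds, chosen only for illustration.
THRESHOLDS = {
    "pitch": (120.0, 220.0),
    "intensity": (55.0, 70.0),
    "speech rate": (3.0, 4.5),
    "articulation rate": (4.0, 5.5),
}

if __name__ == "__main__":
    print(acoustic_prompt(100.0, 60.0, 5.0, 5.0, THRESHOLDS))
    print(precision_at_k(["angry", "happy", "angry", "sad"], "angry", 2))
```

In this sketch, the loss is small when each audio embedding is closest to its own prompt embedding and large when the pairs are mismatched, which is the behavior a contrastive objective is meant to encourage. `precision_at_k` divides by `k` even when fewer than `k` items are retrieved, which is one common convention among several.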