
Retrieving speech samples with similar emotional content using a triplet loss function

Abstract: 

The ability to identify speech with similar emotional content is valuable to many applications, including speech retrieval, surveillance, and emotional speech synthesis. Current formulations in speech emotion recognition based on classification or regression are not appropriate for this task, whereas solutions based on preference learning offer appealing approaches. This paper aims to find speech samples that are emotionally similar to an anchor speech sample provided as a query. This novel formulation opens interesting research questions. How well can a machine complete this task? How does the accuracy of automatic algorithms compare to the performance of a human performing this task? This study addresses these questions by training a deep learning model using a triplet loss function, mapping the acoustic features into an embedding that is discriminative for this task. The network receives an anchor speech sample and two competing speech samples, and the task is to determine which of the candidate speech samples conveys the emotional content closest to that conveyed by the anchor. By comparing the results from our model with human perceptual evaluations, this study demonstrates that the proposed approach has performance very close to human performance in retrieving samples with similar emotional content.
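
The anchor-versus-two-candidates setup described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the distance measure, the margin, or the embedding size, so the squared Euclidean distance, the margin of 1.0, the two-dimensional toy embeddings, and the function names `triplet_loss` and `closer_candidate` are all assumptions made here for illustration. In practice the embeddings would come from a trained network applied to acoustic features.

```python
import numpy as np


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss on squared Euclidean distances.

    Assumption: squared Euclidean distance and margin=1.0 are
    illustrative choices, not values taken from the paper.
    """
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)
    # The loss is zero once the positive is closer than the negative by the margin.
    return np.maximum(d_pos - d_neg + margin, 0.0)


def closer_candidate(anchor, cand_a, cand_b):
    """Return 0 if cand_a is closer to the anchor in embedding space, else 1."""
    d_a = np.sum((anchor - cand_a) ** 2, axis=-1)
    d_b = np.sum((anchor - cand_b) ** 2, axis=-1)
    return int(d_b < d_a)


# Hypothetical embeddings standing in for the output of a trained model.
anchor = np.array([0.0, 0.0])
similar = np.array([0.1, 0.0])      # emotionally close to the anchor
dissimilar = np.array([1.0, 1.0])   # emotionally far from the anchor

loss_good = triplet_loss(anchor, similar, dissimilar)
loss_bad = triplet_loss(anchor, dissimilar, similar)
choice = closer_candidate(anchor, similar, dissimilar)
```

With these toy values, the correctly ordered triplet incurs no loss, the swapped triplet is penalized, and the retrieval decision picks the first candidate. Comparing such decisions against human judgments on the same triplets is the kind of evaluation the abstract describes.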


Paper Details

Authors:
John Harvill, Mohammed AbdelWahab, Reza Lotfian, Carlos Busso
Submitted On:
20 May 2020 - 9:50am
Short Link:
Type:
Poster
Event:
Presenter's Name:
John Harvill
Document Year:
2019

Document Files

Harvill_2019-poster.pdf


[1] John Harvill, Mohammed AbdelWahab, Reza Lotfian, Carlos Busso, "Retrieving speech samples with similar emotional content using a triplet loss function", IEEE SigPort, 2020. [Online]. Available: http://sigport.org/5409. Accessed: Jun. 06, 2020.