Detecting emotions directly from a speech signal plays an important role in effective human-computer interaction. Existing speech emotion recognition models require massive computational and storage resources, making them hard to run concurrently with other machine-interaction tasks in embedded systems. In this paper, we propose an efficient and lightweight fully convolutional neural network (FCNN) for speech emotion recognition on systems with limited hardware resources.
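
As an illustration of the idea, here is a minimal PyTorch sketch of a lightweight fully convolutional SER network. The channel widths, kernel sizes, four-emotion label set, and global average pooling are illustrative assumptions, not the architecture from the paper.

import torch
import torch.nn as nn

class LightweightFCNN(nn.Module):
    """Small fully convolutional network over mel-spectrogram inputs."""
    def __init__(self, n_emotions=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.BatchNorm2d(16), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.MaxPool2d(2),
            # A 1x1 convolution replaces a dense head, keeping the network
            # fully convolutional and agnostic to input length.
            nn.Conv2d(32, n_emotions, kernel_size=1),
        )

    def forward(self, x):               # x: (batch, 1, n_mels, frames)
        logits = self.features(x)       # (batch, n_emotions, h, w)
        return logits.mean(dim=(2, 3))  # global average pooling

model = LightweightFCNN()
print(model(torch.randn(8, 1, 40, 128)).shape)  # torch.Size([8, 4])

Because there are no fully connected layers, the parameter count stays small and the same network accepts utterances of any length.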

We propose a deep graph approach to the task of speech emotion recognition. Graphs offer a compact, efficient, and scalable way to represent data. Following the theory of graph signal processing, we propose to model the speech signal as a cycle graph or a line graph. Such a graph structure enables us to construct a Graph Convolution Network (GCN)-based architecture that can perform an accurate graph convolution, in contrast to the approximate convolution used in standard GCNs.
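
One way to see why a cycle graph admits an exact convolution: its graph Fourier basis coincides with the discrete Fourier transform, so spectral filtering can be computed exactly with an FFT instead of the polynomial approximation used in standard GCNs. The PyTorch sketch below illustrates this; the per-frequency learnable filter and all shapes are our illustrative assumptions, not the paper's architecture.

import torch
import torch.nn as nn

class CycleGraphConv(nn.Module):
    """Exact spectral convolution on a cycle graph of speech frames."""
    def __init__(self, num_nodes, in_dim, out_dim):
        super().__init__()
        # One learnable spectral multiplier per graph frequency.
        self.spectral_filter = nn.Parameter(torch.ones(num_nodes, 1))
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x):                      # x: (batch, num_nodes, in_dim)
        x_hat = torch.fft.fft(x, dim=1)        # graph Fourier transform (= DFT)
        x_hat = x_hat * self.spectral_filter   # exact filtering in the spectral domain
        x = torch.fft.ifft(x_hat, dim=1).real  # back to the node (frame) domain
        return self.linear(x)

frames = torch.randn(8, 120, 40)   # 8 utterances, 120 frame nodes, 40-dim features
conv = CycleGraphConv(num_nodes=120, in_dim=40, out_dim=64)
print(conv(frames).shape)          # torch.Size([8, 120, 64])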

The paper presents a Multi-Head Attention deep learning network for Speech Emotion Recognition (SER) that takes Log mel-Filter Bank Energies (LFBE) spectral features as input. Multi-head attention, together with position embedding, jointly attends to information from different representations of the same LFBE input sequence. The position embedding helps attend to the dominant emotion features by identifying their positions in the sequence. In addition to multi-head attention and position embedding, we apply multi-task learning with gender recognition as an auxiliary task.
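
A minimal PyTorch sketch of this recipe (self-attention over position-embedded LFBE frames, with emotion and gender heads trained jointly) follows. The dimensions, number of heads, mean pooling, and the 0.3 auxiliary loss weight are illustrative assumptions, not the paper's configuration.

import torch
import torch.nn as nn

class AttentionSER(nn.Module):
    def __init__(self, lfbe_dim=64, d_model=128, n_heads=4,
                 max_frames=500, n_emotions=4):
        super().__init__()
        self.proj = nn.Linear(lfbe_dim, d_model)
        # Learned position embedding marks where each frame sits in the sequence.
        self.pos_emb = nn.Embedding(max_frames, d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.emotion_head = nn.Linear(d_model, n_emotions)  # primary task
        self.gender_head = nn.Linear(d_model, 2)            # auxiliary task

    def forward(self, lfbe):                    # lfbe: (batch, frames, lfbe_dim)
        b, t, _ = lfbe.shape
        pos = torch.arange(t, device=lfbe.device)
        h = self.proj(lfbe) + self.pos_emb(pos)  # inject position information
        h, _ = self.attn(h, h, h)                # multi-head self-attention
        pooled = h.mean(dim=1)                   # utterance-level summary
        return self.emotion_head(pooled), self.gender_head(pooled)

model = AttentionSER()
emo, gen = model(torch.randn(8, 300, 64))
# Multi-task objective: emotion loss plus a weighted gender auxiliary loss.
loss = nn.functional.cross_entropy(emo, torch.randint(4, (8,))) \
     + 0.3 * nn.functional.cross_entropy(gen, torch.randint(2, (8,)))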

Degradation due to additive noise is a significant roadblock to the real-life deployment of Speech Emotion Recognition (SER) systems. Most previous work in this field has dealt with noise degradation either at the signal level or at the feature level. In this paper, to improve the robustness of SER in additive-noise scenarios, we propose multi-conditioning and data augmentation using an utterance-level parametric generative noise model. The generative noise model is designed to generate noise types that can span the entire noise space in the mel-filterbank energy domain.
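
To make the augmentation concrete, the NumPy sketch below adds a randomly parameterized noise spectrum to one utterance's mel-filterbank energies at a random SNR. The specific parameterization (a smoothed random spectral shape with chi-squared frame-level fluctuations) and the SNR range are our illustrative assumptions, not the paper's generative noise model.

import numpy as np

def augment_utterance(fbe, snr_db_range=(0.0, 20.0), rng=np.random):
    """fbe: (frames, n_mels) linear mel-filterbank energies for one utterance."""
    frames, n_mels = fbe.shape
    # Draw a smooth random spectral shape so that different draws cover
    # different regions of the noise space.
    shape = np.exp(rng.randn(n_mels).cumsum() * 0.1)
    shape /= shape.sum()
    noise = shape[None, :] * rng.chisquare(2, size=(frames, 1))  # random level per frame
    # Scale the noise to a randomly chosen utterance-level SNR.
    snr_db = rng.uniform(*snr_db_range)
    scale = fbe.mean() / (noise.mean() * 10 ** (snr_db / 10))
    return fbe + scale * noise

clean = np.abs(np.random.randn(200, 40))  # stand-in for real mel-FBE features
noisy = augment_utterance(clean)          # one multi-conditioned training copy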

The scarcity of emotional speech data is a bottleneck in developing automatic speech emotion recognition (ASER) systems. One way to alleviate this issue is to use unsupervised feature learning to learn features from widely available general speech and use these features to train emotion classifiers. Unsupervised methods such as the denoising autoencoder (DAE), variational autoencoder (VAE), adversarial autoencoder (AAE), and adversarial variational Bayes (AVB) can capture the intrinsic structure of the data distribution in the learned feature representation.
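
Of these methods, the denoising autoencoder is the simplest to sketch. The PyTorch code below learns features from unlabeled speech frames by reconstructing clean inputs from corrupted ones; layer sizes and the Gaussian corruption level are illustrative assumptions.

import torch
import torch.nn as nn

class DAE(nn.Module):
    def __init__(self, in_dim=40, bottleneck=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                     nn.Linear(64, bottleneck))
        self.decoder = nn.Sequential(nn.Linear(bottleneck, 64), nn.ReLU(),
                                     nn.Linear(64, in_dim))

    def forward(self, x):
        corrupted = x + 0.1 * torch.randn_like(x)  # denoising corruption
        z = self.encoder(corrupted)                # learned feature representation
        return self.decoder(z), z

dae = DAE()
opt = torch.optim.Adam(dae.parameters(), lr=1e-3)
x = torch.randn(256, 40)                 # unlabeled general-speech frames
recon, features = dae(x)
loss = nn.functional.mse_loss(recon, x)  # reconstruct the clean input
loss.backward(); opt.step()
# `features` would then be used to train a supervised emotion classifier.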
