General approaches to polarity analysis in dialogue, e.g. Multiple Instance Learning (MIL), have made significant progress.
However, one significant drawback of current approaches is that the contribution of each utterance to the overall polarity remains a black box.
Existing methods do not explicitly use the polarity contained in each utterance, which we call meta-polarity.
In this paper, we study the problem of adding interpretability to the overall polarity by predicting the meta-polarity at the same time.
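
To make the idea of utterance-level contributions concrete, here is a minimal sketch, not the paper's model. It assumes attention-weighted pooling, a common MIL aggregator, and hypothetical inputs: one polarity score per utterance in [-1, 1] and one unnormalized attention score per utterance. The function names are illustrative only.

```python
from math import exp


def softmax(scores):
    """Normalize raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dialogue_polarity(utterance_polarities, attention_scores):
    """Pool per-utterance polarity (assumed in [-1, 1]) into one dialogue score.

    Returns the dialogue-level score together with each utterance's
    contribution, so the overall prediction can be traced back to utterances.
    """
    weights = softmax(attention_scores)
    contributions = [w * p for w, p in zip(weights, utterance_polarities)]
    return sum(contributions), contributions
```

With equal attention scores the pooled value reduces to the plain mean of the utterance polarities, and the returned contributions show how much each utterance moved the result.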

Accent mismatch is a critical problem for end-to-end ASR. This paper aims to address this problem by building an accent-robust RNN-T system with domain adversarial training (DAT). We analyze DAT and provide, for the first time, a theoretical guarantee that DAT learns accent-invariant representations. We also prove that performing gradient reversal in DAT is equivalent to minimizing the Jensen-Shannon divergence between domain output distributions.
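
The link between an adversarial domain classifier and the Jensen-Shannon divergence can be checked numerically in a toy setting. The sketch below does not reproduce the paper's proof; it assumes two domains with equal priors and discrete distributions `p` and `q` (both hypothetical), and verifies the identity known from GAN analysis: the best achievable discriminator objective equals `2 * JSD(p, q) - log 4`.

```python
from math import log


def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def optimal_discriminator_value(p, q):
    """Objective E_p[log D] + E_q[log(1 - D)] at the optimum D = p / (p + q)."""
    value = 0.0
    for pi, qi in zip(p, q):
        if pi + qi == 0:
            continue  # outcome has no mass in either domain
        d = pi / (pi + qi)
        if pi > 0:
            value += pi * log(d)
        if qi > 0:
            value += qi * log(1 - d)
    return value
```

Because the optimal discriminator objective differs from the divergence only by a constant and a factor of two, a feature extractor trained to make that objective as small as possible is, in this toy setting, reducing the divergence between the two domains.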

The paper presents a Multi-Head Attention deep learning network for Speech Emotion Recognition (SER) using Log mel-Filter Bank Energies (LFBE) spectral features as the input. The multi-head attention along with the position embedding jointly attends to information from different representations of the same LFBE input sequence. The position embedding helps in attending to the dominant emotion features by identifying positions of the features in the sequence. In addition to Multi-Head Attention and position embedding, we apply multi-task learning with gender recognition as an auxiliary task.
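
The multi-task objective described above can be sketched as a weighted sum of two classification losses. This is an illustrative assumption rather than the paper's formulation: the weight `aux_weight` and the function names are hypothetical, and the inputs are already-normalized class probabilities.

```python
from math import log


def cross_entropy(probs, target):
    """Negative log-likelihood of the target class under predicted probabilities."""
    return -log(probs[target])


def multitask_loss(emotion_probs, emotion_label,
                   gender_probs, gender_label, aux_weight=0.3):
    """Primary emotion loss plus a down-weighted auxiliary gender loss.

    aux_weight is a hypothetical hyperparameter, not a value from the paper.
    """
    primary = cross_entropy(emotion_probs, emotion_label)
    auxiliary = cross_entropy(gender_probs, gender_label)
    return primary + aux_weight * auxiliary
```

Keeping the auxiliary weight below 1 is one way to let gender recognition shape the shared representation without dominating the emotion objective.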

Degradation due to additive noise is a significant roadblock in the real-life deployment of Speech Emotion Recognition (SER) systems. Most previous work in this field dealt with noise degradation at either the signal level or the feature level. In this paper, to address the robustness of SER in additive noise scenarios, we propose multi-conditioning and data augmentation using an utterance-level parametric generative noise model. The generative noise model is designed to generate noise types that can span the entire noise space in the mel-filterbank energy domain.
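
A rough sketch of augmentation in the mel-filterbank energy domain follows. It is not the paper's noise model: the tilt-based noise shape, the jitter range, and all function names are assumptions made for illustration, and a single frame of band energies stands in for a full utterance. It also assumes speech and noise are uncorrelated, so their energies add.

```python
import random
from math import log


def sample_noise_energies(num_bands, tilt, rng):
    """Hypothetical parametric noise: exponential spectral tilt with random jitter."""
    return [(tilt ** band) * rng.uniform(0.5, 1.5) for band in range(num_bands)]


def add_noise_at_snr(clean_energies, noise_energies, snr_db):
    """Scale the noise so total clean/noise energy matches snr_db, then add it."""
    clean_total = sum(clean_energies)
    noise_total = sum(noise_energies)
    gain = clean_total / (noise_total * 10 ** (snr_db / 10))
    return [c + gain * n for c, n in zip(clean_energies, noise_energies)]


def augment(clean_energies, snr_choices_db, rng):
    """Multi-conditioning: draw a random noise shape and SNR, return log energies."""
    tilt = rng.uniform(0.5, 1.5)
    noise = sample_noise_energies(len(clean_energies), tilt, rng)
    snr_db = rng.choice(snr_choices_db)
    noisy = add_noise_at_snr(clean_energies, noise, snr_db)
    return [log(e) for e in noisy]
```

Calling `augment(energies, [0, 10, 20], random.Random(0))` repeatedly on the same clean input yields differently corrupted copies, which is the basic mechanism behind training on multiple noise conditions.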
