We describe the methodology for the collection and annotation of a large corpus of emotional speech data through crowdsourcing. The corpus offers 187 hours of data from 2,965 subjects. Data includes non-emotional recordings from each subject as well as recordings for five emotions: angry, happy-low-arousal, happy-high-arousal, neutral,

Mental stress during conversations may directly or indirectly affect both our speech acoustics and our physiological responses. This paper presents a study of the relationship between these two modalities, speech acoustics and physiology, during stressful conversations between humans. Heart rate and respiratory sinus arrhythmia are the physiological variables considered in the present study. Two datasets have been analyzed: one from stress induction sessions and the other from in-lab discussions of relationship conflicts between couples.

In this presentation, the effects of quantisation on distributed convex optimisation algorithms are explored via the lens of monotone operator theory. Specifically, by representing transmission quantisation via an additive noise model, we demonstrate how quantisation can be viewed as an instance of an inexact Krasnoselskii-Mann scheme. In the case of two distributed solvers, the Alternating Direction Method of Multipliers and the Primal Dual Method of Multipliers, we further demonstrate how an adaptive quantisation scheme can be constructed to reduce transmission costs between nodes.
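
The additive noise view of quantisation can be illustrated with a toy example. The sketch below is not the scheme from the presentation: the operator `gradient_step`, the uniform quantiser `quantise`, and the geometric step-size schedule (`delta0`, `decay`) are all assumptions chosen for illustration. It runs a Krasnoselskii-Mann iteration in which the operator output is quantised before it is used, so each update carries a bounded error that shrinks as the quantiser cell shrinks.

```python
def quantise(v, step):
    """Uniform quantiser: round each entry to the nearest multiple of step."""
    return [step * round(x / step) for x in v]


def gradient_step(x, c=(3.0, -2.0), w=(1.0, 0.5)):
    """Hypothetical nonexpansive operator: a unit gradient step on
    f(x) = 0.5 * sum_i w_i * (x_i - c_i)^2, whose fixed point is c."""
    return [xi - wi * (xi - ci) for xi, ci, wi in zip(x, c, w)]


def inexact_km(x0, alpha=0.5, delta0=1.0, decay=0.8, iters=200):
    """Krasnoselskii-Mann iteration with a quantised operator output.

    The quantisation error per entry is at most delta / 2, and delta
    decays geometrically (an assumed schedule), so the errors are summable.
    """
    x = list(x0)
    for k in range(iters):
        delta = delta0 * decay ** k              # shrinking quantiser cell
        tx = quantise(gradient_step(x), delta)   # T(x_k) + e_k
        x = [(1 - alpha) * xi + alpha * ti for xi, ti in zip(x, tx)]
    return x


if __name__ == "__main__":
    print(inexact_km([10.0, 10.0]))
```

A coarse quantiser in the early iterations keeps the number of transmitted levels small, while the decaying cell size keeps the accumulated error bounded; this is the intuition behind an adaptive scheme, shown here only for a single node with a made-up operator.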

In this paper, we investigate the use of articulatory information, and more specifically real-time Magnetic Resonance Imaging (rtMRI) data of the vocal tract, to improve speech recognition performance. For the purpose of our experiments, we use data from the rtMRI-TIMIT database. Firstly, Scale Invariant Feature Transform (SIFT) features are extracted for each video frame. Afterwards, the SIFT descriptors of each frame are transformed to a single histogram per picture, by using the Bag of Visual Words methodology. Since this kind

Speech recognition in digital assistants such as Google Assistant can potentially benefit from the use of conversational context consisting of user queries and responses from the agent. We explore the use of recurrent, Long Short-Term Memory (LSTM), neural language models (LMs) to model the conversations in a digital assistant. Our proposed methods effectively capture the context of previous utterances in a conversation without modifying the underlying LSTM architecture. We demonstrate a 4% relative improvement in recognition performance

We study the problem of direction of arrival (DOA) estimation for arbitrary antenna arrays. We formulate it as a continuous line spectral estimation problem and solve it under a sparsity prior without any gridding assumptions. Moreover, we incorporate the array's beampattern in the form of the Effective Aperture Distribution Function (EADF), which allows the use of arbitrary (synthetic as well as measured) antenna arrays. This generalizes known atomic-norm-based grid-free DOA estimation methods (that
