
ICASSP is the world's largest and most comprehensive technical conference focused on signal processing and its applications. The 2019 conference will feature world-class presentations by internationally renowned speakers, cutting-edge session topics and provide a fantastic opportunity to network with like-minded professionals from around the world.

This article proposes an automatic approach, based on nonverbal speech features, for discriminating between depressed and non-depressed speakers. The experiments were performed on one of the largest corpora collected for this task in the literature (62 patients diagnosed with depression and 54 healthy control subjects), especially among datasets where the depressed speakers were diagnosed as such by professional psychiatrists.


We propose a novel latent variable model for learning latent bases for time-varying non-negative data. Our model uses a mixture multinomial as the likelihood function and proposes a Dirichlet distribution with dynamic parameters as a prior, which we call the dynamic Dirichlet prior. An expectation maximization (EM) algorithm is developed for estimating the parameters of the proposed model.
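As a point of reference for the likelihood described above, the following is a minimal sketch of EM for a plain mixture-of-multinomials model (without the dynamic Dirichlet prior, which is the paper's contribution; all function and variable names here are illustrative):

```python
import numpy as np

def em_multinomial_mixture(X, K, n_iter=100, seed=0):
    """EM for a mixture of multinomials.

    X: (N, V) matrix of count vectors; K: number of mixture components.
    Returns mixing weights pi (K,) and component distributions B (K, V).
    """
    rng = np.random.default_rng(seed)
    N, V = X.shape
    pi = np.full(K, 1.0 / K)
    B = rng.dirichlet(np.ones(V), size=K)            # rows sum to 1
    for _ in range(n_iter):
        # E-step: responsibilities, computed in the log domain for stability
        log_r = np.log(pi)[None, :] + X @ np.log(B).T    # (N, K)
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights and component distributions
        pi = r.mean(axis=0)
        B = r.T @ X + 1e-12                          # small floor avoids log(0)
        B /= B.sum(axis=1, keepdims=True)
    return pi, B
```

The paper's model would replace the static prior over B with a Dirichlet whose parameters evolve over time, changing the M-step updates accordingly.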


End-to-end systems using deep neural networks have been widely studied in the field of speaker verification. Raw audio signal processing has also been widely studied in the fields of automatic music tagging and speech recognition. However, as far as we know, end-to-end systems operating on raw audio signals have not been explored in speaker verification. In this paper, a complete end-to-end speaker verification system is proposed, which takes raw audio signals as input and outputs verification results. We mainly investigate a pre-processing layer and the embedded speaker feature extraction models.
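The abstract does not specify what the pre-processing layer computes; the following is a hypothetical sketch of the kind of fixed front end that often precedes learned layers in raw-waveform systems (pre-emphasis, framing, per-frame normalization). The function name and parameters are illustrative, not the paper's:

```python
import numpy as np

def preprocess_raw(signal, frame_len=400, hop=160, alpha=0.97):
    """Hypothetical pre-processing for a raw-waveform front end:
    pre-emphasis, framing, and per-frame mean/variance normalization."""
    # Pre-emphasis boosts high frequencies: y[t] = x[t] - alpha * x[t-1]
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    # Slice the signal into overlapping frames
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx]
    # Per-frame normalization before the learned layers
    frames = frames - frames.mean(axis=1, keepdims=True)
    frames = frames / (frames.std(axis=1, keepdims=True) + 1e-8)
    return frames  # (n_frames, frame_len)
```

In an end-to-end system, the output frames would then feed 1-D convolutional layers that learn filterbank-like representations directly from the waveform.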


Robust speech processing in multi-talker environments requires effective speech separation. Recent deep learning systems have made significant progress toward solving this problem, yet it remains challenging, particularly in real-time, short-latency applications. Most methods attempt to construct a mask for each source in the time-frequency representation of the mixture signal, which is not necessarily an optimal representation for speech separation.
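To make the time-frequency masking idea concrete, here is a minimal sketch of the oracle version of such masks (an "ideal ratio mask"), assuming access to the clean sources; a learned separator would instead predict these masks from the mixture alone. The helper names are illustrative:

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Naive analysis STFT with a Hann window."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    return np.fft.rfft(x[idx] * win, axis=1)       # (frames, freq bins)

def ideal_ratio_masks(sources, eps=1e-8):
    """One soft T-F mask per source: |S_i| / sum_j |S_j|.

    Applying mask i to the mixture spectrogram approximately
    recovers source i's magnitude in each time-frequency bin.
    """
    mags = np.abs(np.stack([stft(s) for s in sources]))
    return mags / (mags.sum(axis=0) + eps)
```

Masks of this kind lie in [0, 1] and sum to (approximately) one across sources in each bin, which is what mask-predicting networks are typically trained to reproduce.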


Self-attention, an attention mechanism where the input and output sequence lengths are the same, has recently been successfully applied to machine translation, caption generation, and phoneme recognition. In this paper we apply a restricted self-attention mechanism (with multiple heads) to speech recognition. By "restricted" we mean that the mechanism at a particular frame only sees input from a limited number of frames to the left and right. Restricting the context makes it easier to…


We introduce the notion of Quality of Indicator (QoI) to assess the level of contribution by participants in threat intelligence sharing. We exemplify QoI by metrics of the correctness, relevance, utility, and uniqueness of indicators. We build a system that extrapolates the metrics using a machine learning process over a reference set of indicators.
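The abstract names four component metrics but not how they combine; a simple hypothetical aggregation (a weighted mean, with made-up weights, not the paper's learned model) might look like:

```python
def qoi_score(correctness, relevance, utility, uniqueness,
              weights=(0.4, 0.2, 0.2, 0.2)):
    """Hypothetical QoI aggregation: weighted mean of four metrics,
    each assumed to be normalized to [0, 1]."""
    metrics = (correctness, relevance, utility, uniqueness)
    assert all(0.0 <= m <= 1.0 for m in metrics), "metrics must be in [0, 1]"
    return sum(w * m for w, m in zip(weights, metrics))
```

In the paper's setting, the individual metric values would come from the machine learning process trained on the reference set of indicators, rather than being supplied directly.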

