ICASSP is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. The 2019 conference will feature world-class presentations by internationally renowned speakers and cutting-edge session topics, and will provide an opportunity to network with like-minded professionals from around the world.

Singing voice separation based on deep learning relies on time-frequency masking. In many cases the masking process is not a learnable function or is not included in the deep learning optimization. Consequently, most existing methods rely on a post-processing step that uses generalized Wiener filtering. This work proposes a method that learns and optimizes (during training) a source-dependent mask and does not need this post-processing step.
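For readers unfamiliar with the post-processing step mentioned above, the following is a minimal sketch of a generalized Wiener-style soft mask computed from per-source magnitude estimates. It is not the method proposed in the paper. The function name `wiener_masks`, the exponent `alpha`, and the toy spectrogram values are assumptions chosen for illustration.

```python
def wiener_masks(source_mags, alpha=2.0, eps=1e-12):
    """Compute one soft mask per source from magnitude estimates.

    source_mags: list of J spectrograms, each a list of rows
    (frequency bins) holding non-negative floats (time frames).
    alpha=2.0 gives the classic power-ratio Wiener mask (assumed default).
    """
    n_rows = len(source_mags[0])
    n_cols = len(source_mags[0][0])
    masks = [[[0.0] * n_cols for _ in range(n_rows)] for _ in source_mags]
    for f in range(n_rows):
        for t in range(n_cols):
            # Raise each source estimate to the power alpha in this bin.
            powers = [mag[f][t] ** alpha for mag in source_mags]
            total = sum(powers) + eps  # eps avoids division by zero
            for j, p in enumerate(powers):
                masks[j][f][t] = p / total
    return masks


# Toy example: one frequency bin, two time frames (made-up values).
voice = [[3.0, 0.0]]
accompaniment = [[4.0, 1.0]]
voice_mask, accompaniment_mask = wiener_masks([voice, accompaniment])
```

Each mask would then be multiplied element-wise with the mixture spectrogram. Because the masks are fixed functions of the estimates, nothing in this step is learned, which is the limitation the abstract refers to.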


A novel interpretable end-to-end learning scheme for language identification is proposed. It is in line with the classical GMM i-vector methods both theoretically and practically. In the end-to-end pipeline, a general encoding layer is employed on top of the front-end CNN, so that it can automatically encode the variable-length input sequence into an utterance-level vector. After comparing with the state-of-the-art GMM i-vector methods, we give insights into the CNN and reveal its role and effect in the whole pipeline.
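To illustrate what encoding a variable-length sequence into an utterance-level vector means, here is a minimal sketch using temporal average pooling. This is a simple stand-in, not the encoding layer proposed in the paper; the function name `mean_pool` and the toy frame values are assumptions.

```python
def mean_pool(frames):
    """Average frame-level feature vectors over time.

    frames: non-empty list of equal-length lists of floats, one per
    time step (for example, outputs of a front-end CNN).
    Returns a single fixed-length vector regardless of len(frames).
    """
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / len(frames) for d in range(dim)]


# Two utterances of different lengths map to vectors of the same size.
short_utt = [[1.0, 2.0]]
long_utt = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
short_vec = mean_pool(short_utt)
long_vec = mean_pool(long_utt)
```

Average pooling has no trainable parameters; a learnable encoding layer would replace this fixed average with a parameterized aggregation.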


In this paper, we address the problem of distributed state estimation, where a set of nodes is required to jointly estimate the state of a linear dynamic system based on sequential measurements. In our distributed scenario, all the nodes 1) are interested in the full state of the observed system and 2) pursue a consensus-based state estimate with high accuracy. We exploit the equivalence between maximum-a-posteriori (MAP) estimation and the Kalman filter (KF) in the minimum mean square error (MMSE) sense under the Gaussian assumption.
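The MAP/KF equivalence mentioned above can be checked numerically in the simplest setting. The sketch below is a single-node, scalar example and not the distributed algorithm of the paper. It assumes a Gaussian prior with mean `x_pred` and variance `p_pred`, and a measurement `y = h * x + v` with noise variance `r`; all names and numbers are illustrative assumptions.

```python
def kalman_update(x_pred, p_pred, y, h, r):
    """Standard scalar Kalman measurement update."""
    gain = p_pred * h / (h * h * p_pred + r)
    x_post = x_pred + gain * (y - h * x_pred)
    p_post = (1.0 - gain * h) * p_pred
    return x_post, p_post


def map_update(x_pred, p_pred, y, h, r):
    """MAP estimate for a Gaussian prior and Gaussian likelihood.

    Maximizing the posterior is a least-squares problem whose solution
    is the precision-weighted combination of prior and measurement.
    """
    precision = 1.0 / p_pred + h * h / r
    x_post = (x_pred / p_pred + h * y / r) / precision
    p_post = 1.0 / precision
    return x_post, p_post


# Made-up numbers: prior N(1.0, 2.0), measurement y = 2.5, h = 0.5, r = 0.1.
kf_mean, kf_var = kalman_update(1.0, 2.0, 2.5, 0.5, 0.1)
map_mean, map_var = map_update(1.0, 2.0, 2.5, 0.5, 0.1)
assert abs(kf_mean - map_mean) < 1e-9
assert abs(kf_var - map_var) < 1e-9
```

Both functions return the same posterior mean and variance, which is the scalar form of the equivalence under the Gaussian assumption.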


Unsupervised cross-database facial expression recognition (FER) is a challenging problem in which the training and testing samples belong to different facial expression databases. As a result, the training (source) and testing (target) facial expression samples may have different feature distributions, and the performance of many existing FER methods may therefore decrease.


This paper addresses structured covariance matrix estimation under the $t$-distribution. Covariance matrices frequently exhibit a particular structure that depends on the application, and taking this structure into account usually improves estimation accuracy. In the framework of robust estimation, the $t$-distribution is particularly suited to describing heavy-tailed observations. In this context, we propose an efficient estimation procedure for covariance matrices with convex structure under the $t$-distribution.


In this paper, we investigate unsupervised cross-corpus speech emotion recognition (SER), in which the training and testing speech signals come from two different speech emotion corpora. The training speech signals are labeled, while the label information of the testing speech signals is entirely unknown. In this setting, the training (source) and testing (target) speech signals may have different feature distributions, and many existing SER methods would therefore not work.
