
ICASSP is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. The 2019 conference will feature world-class presentations by internationally renowned speakers and cutting-edge session topics, and will provide a fantastic opportunity to network with like-minded professionals from around the world.

Speech is a rich biometric signal that contains information about the identity, gender and emotional state of the speaker. In this work, we explore its potential to generate face images of a speaker by conditioning a Generative Adversarial Network (GAN) on raw speech input. We propose a deep neural network that is trained from scratch in an end-to-end fashion, generating a face directly from the raw speech waveform without any additional identity information (e.g., a reference image or a one-hot encoding).


The Tweedie Compound Poisson-Gamma model is routinely used for modeling non-negative continuous data with a discrete probability mass at zero. Mixed models with random effects account for the covariance structure related to the grouping hierarchy in the data. An important application of Tweedie mixed models is the pricing of insurance policies, e.g., car insurance. However, the intractable likelihood function, the unknown variance function, and the hierarchical structure of mixed effects have presented considerable challenges for drawing inferences on Tweedie mixed models.
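To make the "probability mass at zero" concrete, here is a minimal sampling sketch of a compound Poisson-Gamma variable: a Poisson number of claims, each with a Gamma-distributed size, summed per policy. It is only an illustration of the distribution, not the inference method of the paper, and it ignores random effects entirely. The names `sample_poisson` and `sample_compound_poisson_gamma` and the parameter values are assumptions chosen for this example.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw from Poisson(rate) with Knuth's multiplication method (fine for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_compound_poisson_gamma(rate, shape, scale, rng):
    """Sum a Poisson(rate) number of Gamma(shape, scale) claim sizes."""
    n_claims = sample_poisson(rate, rng)
    # With zero claims the sum is empty, which gives the exact zero outcome.
    return sum(rng.gammavariate(shape, scale) for _ in range(n_claims))


# Hypothetical parameters: 0.5 expected claims per policy, mean claim size 200.
rng = random.Random(0)
rate, shape, scale = 0.5, 2.0, 100.0
draws = [sample_compound_poisson_gamma(rate, shape, scale, rng) for _ in range(20000)]

zero_fraction = sum(1 for y in draws if y == 0) / len(draws)
sample_mean = sum(draws) / len(draws)
print(f"fraction of exact zeros: {zero_fraction:.3f} (theory: {math.exp(-rate):.3f})")
print(f"sample mean: {sample_mean:.1f} (theory: {rate * shape * scale:.1f})")
```

The fraction of exact zeros should be close to exp(-rate), the probability of no claims, while the positive outcomes are continuous. That mix of a point mass and a continuous part is what makes the likelihood awkward to work with.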


Obstructive sleep apnea (OSA) is a prevalent sleep disorder that reduces people’s quality of life and causes significant morbidity and mortality associated with hypertension and cardiovascular diseases. OSA is caused by anatomical and functional alterations in the upper airways. We therefore hypothesize that the speech properties of OSA patients are altered, making it possible to detect OSA through voice analysis.


We propose a training scheme to train neural network-based source separation algorithms from scratch when parallel clean data is unavailable. In particular, we demonstrate that an unsupervised spatial clustering algorithm is sufficient to guide the training of a deep clustering system. We argue that previous work on deep clustering requires strong supervision and elaborate on why this is a limitation.
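As a rough illustration of how cluster assignments can act as training targets, the sketch below computes the affinity loss commonly used in earlier deep clustering work, ||VV^T - YY^T||_F^2, where V holds one embedding per time-frequency bin and Y holds one-hot assignments. The abstract does not state which loss is used, so this choice is an assumption. Here Y is assumed to come from an unsupervised spatial clustering step rather than from clean reference signals, and the toy data and function names are hypothetical.

```python
def transpose_product(a, b):
    """Return a^T b for two row-major lists of lists with the same number of rows."""
    cols_a, cols_b = len(a[0]), len(b[0])
    out = [[0.0] * cols_b for _ in range(cols_a)]
    for row_a, row_b in zip(a, b):
        for i in range(cols_a):
            for j in range(cols_b):
                out[i][j] += row_a[i] * row_b[j]
    return out


def squared_frobenius(matrix):
    """Sum of squared entries."""
    return sum(x * x for row in matrix for x in row)


def deep_clustering_loss(embeddings, pseudo_labels):
    """||VV^T - YY^T||_F^2, expanded so the bins-by-bins affinity matrix is never built."""
    vtv = transpose_product(embeddings, embeddings)
    vty = transpose_product(embeddings, pseudo_labels)
    yty = transpose_product(pseudo_labels, pseudo_labels)
    return squared_frobenius(vtv) - 2.0 * squared_frobenius(vty) + squared_frobenius(yty)


# Hypothetical toy data: 4 time-frequency bins, 2 sources.
# pseudo_labels stands in for the output of a spatial clustering step (assumed).
pseudo_labels = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
perfect_embeddings = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
mixed_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]

loss_perfect = deep_clustering_loss(perfect_embeddings, pseudo_labels)
loss_mixed = deep_clustering_loss(mixed_embeddings, pseudo_labels)
print(loss_perfect, loss_mixed)
```

The loss is zero when the embeddings group the bins exactly as the pseudo-labels do, and it grows when bins assigned to the same source end up with dissimilar embeddings.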


Automatic meeting analysis comprises the tasks of speaker counting, speaker diarization, and the separation of overlapped speech, followed by automatic speech recognition. All of this has to be carried out on arbitrarily long sessions and, ideally, in an online or block-online manner. While significant progress has been made on the individual tasks, this paper presents, for the first time, an all-neural approach to simultaneous speaker counting, diarization and source separation.


We propose a probabilistic method to track and interpret interactions of moving objects. The proposed method is based on the analysis of location data from different moving objects that modify their dynamics according to rules of interaction, namely attractive and repulsive forces governing moving objects in a scene. Our method uses a Bayesian structure to identify key elements of the interaction rules and facilitates the prediction of the objects' dynamics as an interacting system.
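The kind of location data such a method would analyze can be illustrated with a toy forward simulation. The sketch below moves one object under an attractive force toward one point and a repulsive force away from another. It covers only the data-generating side, not the Bayesian inference, and the force laws (linear attraction, inverse-square repulsion), the damping term, and all constants are assumptions made for this example.

```python
import math


def simulate(position, attractor, repeller, steps=400, dt=0.05,
             k_attract=0.5, k_repel=2.0, damping=1.0):
    """Integrate a unit-mass 2D object with semi-implicit Euler steps.

    Assumed force model (illustrative only): a spring-like pull toward
    `attractor`, an inverse-square push away from `repeller`, and linear
    velocity damping.
    """
    x, y = position
    vx, vy = 0.0, 0.0
    trajectory = [(x, y)]
    for _ in range(steps):
        # Attractive force grows linearly with distance to the attractor.
        fx = k_attract * (attractor[0] - x)
        fy = k_attract * (attractor[1] - y)
        # Repulsive force points away from the repeller, magnitude k / d^2.
        dx, dy = x - repeller[0], y - repeller[1]
        dist = math.hypot(dx, dy)
        fx += k_repel * dx / dist ** 3
        fy += k_repel * dy / dist ** 3
        # Damping opposes the current velocity.
        fx -= damping * vx
        fy -= damping * vy
        # Update velocity first, then position (semi-implicit Euler).
        vx += fx * dt
        vy += fy * dt
        x += vx * dt
        y += vy * dt
        trajectory.append((x, y))
    return trajectory


attractor, repeller = (10.0, 0.0), (5.0, 0.5)
trajectory = simulate((0.0, 0.0), attractor, repeller)
final_x, final_y = trajectory[-1]
final_distance = math.hypot(final_x - attractor[0], final_y - attractor[1])
print(f"final distance to attractor: {final_distance:.3f}")
```

A trajectory like this bends around the repeller before settling near the attractor. An inference method would work in the opposite direction, estimating which forces are present from the observed locations.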

