ICASSP is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. The 2019 conference will feature world-class presentations by internationally renowned speakers and cutting-edge session topics, and will provide an opportunity to network with like-minded professionals from around the world.

Although a WaveNet vocoder can synthesize more natural-sounding speech waveforms than conventional vocoders at sampling frequencies of 16 and 24 kHz, it is difficult to directly extend the sampling frequency to 48 kHz, which would cover the entire human audible frequency range for higher-quality synthesis, because the model becomes too large to train on a consumer GPU. To enable a 48 kHz WaveNet vocoder that can be trained on a consumer GPU, this paper introduces a subband WaveNet architecture into a speaker-dependent WaveNet vocoder and proposes a subband WaveNet vocoder.


This work studies how online learning sequences can be used to forecast course outcomes for an undergraduate digital signal processing (DSP) course. We develop a multi-modal learning schema based on deep-learning techniques that takes learning sequences, psychometric measures, and personality traits as input features. The aim is to identify underlying patterns in the learning sequences and then forecast the learning outcomes.


Acquiring high-resolution hyperspectral (HS) images is a very challenging task. To this end, hyperspectral pansharpening techniques have been widely studied, which estimate an HS image of high spatial and spectral resolution (high HS image) from a pair of an HS image of high spectral resolution but low spatial resolution (low HS image) and a high spatial resolution panchromatic (PAN) image.


In this work, we consider the task of automatically classifying Amyotrophic Lateral Sclerosis (ALS) patients and healthy subjects from speech tasks, based on acoustic and articulatory features. In particular, we compare the roles of different types of speech tasks for this purpose, namely rehearsed speech, spontaneous speech, and repeated words. Simultaneous articulatory and speech data were recorded from 8 healthy controls and 8 ALS patients using AG501 for the classification experiments.


We measure the effect of small amounts of systematic and random label noise caused by slightly misaligned ground truth labels in a fine-grained audio signal labeling task. The task we choose to demonstrate these effects on is also known as framewise polyphonic transcription or note-quantized multi-f0 estimation, and transforms a monaural audio signal into a sequence of note indicator labels. It will be shown that even slight misalignments have clearly apparent effects, demonstrating a great sensitivity of convolutional neural networks


This paper proposes an extension of multichannel non-negative matrix factorization (MNMF) that simultaneously solves source separation and dereverberation. While MNMF was originally formulated under an underdetermined problem setting where sources outnumber microphones, a determined counterpart of MNMF, which we call the determined MNMF (DMNMF), has recently been proposed with notable success.


Researchers have recently examined a modified approach to sparse coding that encourages dictionaries to learn anomalous features. This is done by incorporating the matrix 1-norm, or \ell_{1,\infty} mixed matrix norm, into the dictionary update portion of a sparse coding algorithm. However, solving a matrix norm minimization problem in each iteration of the algorithm

