ICASSP 2018

ICASSP is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. The 2019 conference will feature world-class presentations by internationally renowned speakers, cutting-edge session topics and provide a fantastic opportunity to network with like-minded professionals from around the world. Visit website.

An investigation of subband WaveNet vocoder covering entire audible frequency range with limited acoustic features

Although a WaveNet vocoder can synthesize more natural-sounding speech waveforms than conventional vocoders with sampling frequencies of 16 and 24 kHz, it is difficult to directly extend the sampling frequency to 48 kHz to cover the entire human audible frequency range for higher-quality synthesis because the model size becomes too large to train with a consumer GPU. For a WaveNet vocoder with a sampling frequency of 48 kHz with a consumer GPU, this paper introduces a subband WaveNet architecture to a speaker-dependent WaveNet vocoder and proposes a subband WaveNet vocoder.

ICASSP_2018_subband_WaveNet_vocoder.pdf

ICASSP_2018_subband_WaveNet_vocoder.pdf (1026)

Categories:: Speech Synthesis and Generation, including TTS (SPE-SYNT)

223 Views

ONLINE EDUCATION EVALUATION FOR SIGNAL PROCESSING COURSE THROUGH STUDENT LEARNING PATHWAYS

Impact of online learning sequences to forecast course outcomes for an undergraduate digital signal processing (DSP) course is studied in this work. A multi-modal learning schema based on deep-learning techniques with learning sequences, psychometric measures, and personality traits as input features is developed in this work. The aim is to identify any underlying patterns in the learning sequences and subsequently forecast the learning outcomes.

icassp_v3.pptx

icassp_v3.pptx (826)

Categories:: Signal Processing Education

132 Views

ROBUST AND EFFECTIVE HYPERSPECTRAL PANSHARPENING USING SPATIO-SPECTRAL TOTAL VARIATION

Acquiring high-resolution hyperspectral (HS) images is a very challenging task. To this end, hyperspectral pansharpening techniques have been widely studied, which estimate an HS image of high spatial and spectral resolution (high HS image) from a pair of an HS image of high spectral resolution but low spatial resolution (low HS image) and a high spatial resolution panchromatic (PAN) image.

ICASSP2018_poster.pdf

ICASSP2018_poster.pdf (116)

Categories:: Image Formation

29 Views

Comparison of speech tasks for automatic classification of patients with amyotrophic lateral sclerosis and healthy subjects

In this work, we consider the task of acoustic and articulatory feature based automatic classification of Amyotrophic Lateral Sclerosis (ALS) patients and healthy subjects using speech tasks. In particular, we compare the roles of different types of speech tasks, namely rehearsed speech, spontaneous speech and repeated words for this purpose. Simultaneous articulatory and speech data were recorded from 8 healthy controls and 8 ALS patients using AG501 for the classification experiments.

ICASSP_Final_April21.pdf

ICASSP_Final_April21.pdf (703)

Categories:: Speech Processing

15 Views

INVESTIGATING LABEL NOISE SENSITIVITY OF CONVOLUTIONAL NEURAL NETWORKS FOR FINE GRAINED AUDIO SIGNAL LABELLING

We measure the effect of small amounts of systematic and
random label noise caused by slightly misaligned ground truth
labels in a fine grained audio signal labeling task. The task
we choose to demonstrate these effects on is also known as
framewise polyphonic transcription or note quantized multi-
f0 estimation, and transforms a monaural audio signal into a
sequence of note indicator labels. It will be shown that even
slight misalignments have clearly apparent effects, demonstrating a great sensitivity of convolutional neural networks

large_poster_96x48.pdf

large_poster_96x48.pdf (613)

Categories:: Neural network learning (MLR-NNLR)

8 Views

AUTOMATIC CONFLICT DETECTION IN POLICE BODY-WORN AUDIO

Read more about AUTOMATIC CONFLICT DETECTION IN POLICE BODY-WORN AUDIO
Log in to post comments

icassp_poster.pdf

icassp_poster.pdf (636)

Categories:: Machine Learning for Signal Processing

20 Views

MULTI-DIALECT SPEECH RECOGNITION WITH A SINGLE SEQUENCE-TO-SEQUENCE MODEL

Read more about MULTI-DIALECT SPEECH RECOGNITION WITH A SINGLE SEQUENCE-TO-SEQUENCE MODEL
Log in to post comments

Tue-SP-L1.1-MultiDialect LAS.pdf

MultiDialect LAS (564)

Categories:: Audio and Acoustic Signal Processing

47 Views

Joint Separation and Dereverberation of Reverberant Mixtures with Determined Multichannel Non-negative Matrix Factorization

This paper proposes an extension of multichannel non-negative matrix factorization (MNMF) that simultaneously solves source separation and dereverberation. While MNMF was originally formulated under an underdetermined problem setting where sources outnumber microphones, a determined counterpart of MNMF, which we call the determined MNMF (DMNMF), has recently been proposed with notable success.

Kagami2018ICASSP03.pdf

Kagami2018ICASSP03.pdf (568)

Categories:: Source Separation and Signal Enhancement

45 Views

Using Block Coordinate Descent to Learn Sparse Coding Dictionaries with a Matrix Norm Update

Researchers have recently examined a modified approach to sparse coding that encourages dictionaries to learn anomalous features. This is done by incorporating the matrix 1-norm, or \ell_{1,\infty} mixed matrix norm, into the dictionary update portion of a sparse coding algorithm. However, solving a matrix norm minimization problem in each iteration of the algorithm