ICASSP 2019

ICASSP is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. The 2019 conference will feature world-class presentations by internationally renowned speakers and cutting-edge session topics, and will provide a fantastic opportunity to network with like-minded professionals from around the world.

Enhanced Recurrent Neural Network for Combining Static and Dynamic Features for Credit Card Default Prediction

Deep learning models have been shown to be capable of extracting high-level representations from the increasing amount of customer-level data generated via fast-growing financial activities. In financial data, dynamic features that evolve with time are commonly observed. However, such time dependencies are often ignored in classical classification models. In this study, we propose to learn a Recurrent Neural Network (RNN) feature extractor with GRU on credit card payment history to leverage the time dependencies embedded in these dynamic features.
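To illustrate the idea of a GRU feature extractor over payment history, here is a minimal pure-Python sketch. It is not the authors' model: the weights are random, biases are omitted, and the names `gru_step`, `extract_features`, the layer sizes, and the example inputs are all hypothetical. Concatenating the final hidden state with the static features is one plausible way to combine the two feature types, assumed here for illustration.

```python
import math
import random

def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))

def matvec(matrix, vector):
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def random_matrix(rows, cols, rng):
    # Random, untrained weights: purely illustrative.
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def gru_step(x, h, params):
    """One GRU update for input x and hidden state h (biases omitted)."""
    xh = x + h  # list concatenation of input and previous state
    z = [sigmoid(v) for v in matvec(params["Wz"], xh)]  # update gate
    r = [sigmoid(v) for v in matvec(params["Wr"], xh)]  # reset gate
    gated = x + [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(v) for v in matvec(params["Wh"], gated)]
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

def extract_features(payment_history, static_features, params, hidden_size):
    """Run the GRU over the sequence, then append the static features."""
    h = [0.0] * hidden_size
    for x in payment_history:
        h = gru_step(x, h, params)
    return h + static_features

rng = random.Random(0)
INPUT_SIZE, HIDDEN_SIZE = 2, 3  # hypothetical sizes
params = {
    name: random_matrix(HIDDEN_SIZE, INPUT_SIZE + HIDDEN_SIZE, rng)
    for name in ("Wz", "Wr", "Wh")
}
# Hypothetical monthly records: [scaled bill amount, paid-on-time flag].
history = [[0.2, 1.0], [0.5, 0.0], [0.9, 1.0]]
static = [35.0, 1.0]  # hypothetical static features, e.g. age and a category code
features = extract_features(history, static, params, HIDDEN_SIZE)
print(len(features))  # 5
```

The combined vector would then be passed to a classifier; that stage is left out of the sketch.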

poster_new_0415.pdf (465)

Categories: Design and Implementation of Signal Processing Systems

135 Views

INCREMENTAL TRANSFER LEARNING IN TWO-PASS INFORMATION BOTTLENECK BASED SPEAKER DIARIZATION SYSTEM FOR MEETINGS

The two-pass information bottleneck (TPIB) based speaker diarization system operates independently on different conversational recordings. The TPIB system does not consider previously learned speaker-discriminative information while diarizing new conversations. Hence, the real-time factor (RTF) of the TPIB system is high owing to the training time required for the artificial neural network (ANN).
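For readers unfamiliar with the metric, the real-time factor is conventionally the processing time divided by the duration of the audio being processed, so a lower value is better. A small sketch, with a hypothetical function name and made-up timings:

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds

# Hypothetical numbers: a 10-minute meeting diarized in 3 minutes.
rtf = real_time_factor(processing_seconds=180.0, audio_seconds=600.0)
print(rtf)  # 0.3
```

Any time spent training the ANN on each new recording counts toward `processing_seconds`, which is why per-recording training raises the RTF.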

icassp2019_sd_poster.pdf (392)

Categories: Audio and Acoustic Signal Processing

9 Views

Using recurrences in time and frequency within U-net architecture for speech enhancement

When designing a fully convolutional neural network, there is a trade-off between receptive field size, number of parameters, and spatial resolution of features in the deeper layers of the network. In this work we present a novel network design, based on a combination of many convolutional and recurrent layers, that addresses this trade-off. We compare our solution with U-net-based models known from the literature and with other baseline models on a speech enhancement task.

Grzywalski_Drgas.pdf (373)

Categories: Speech Enhancement (SPE-ENHA)

35 Views

MULTI-SCALE SPATIAL-TEMPORAL NETWORK FOR PERSON RE-IDENTIFICATION

last_version1.pdf

paper (373)

Categories: Image/Video Storage, Retrieval

30 Views

Deep Learning Features for Robust Detection of Acoustic Events in Sleep-Disordered Breathing

Sleep-disordered breathing (SDB) is a serious and prevalent condition, and acoustic analysis via consumer devices (e.g. smartphones) offers a low-cost solution to screening for it. We present a novel approach for the acoustic identification of SDB sounds, such as snoring, using bottleneck features learned from a corpus of whole-night sound recordings. Two types of bottleneck features are described, obtained by applying a deep autoencoder to the output of an auditory model or a short-term autocorrelation analysis.
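As background on the second front end mentioned above, a short-term autocorrelation analysis computes, frame by frame, how similar a signal is to delayed copies of itself. The sketch below is a generic normalized autocorrelation on a synthetic frame, not the authors' pipeline; the function name, frame length, and lag range are assumptions chosen for illustration.

```python
import math

def short_term_autocorrelation(frame, max_lag):
    """Normalized autocorrelation of one frame for lags 0..max_lag."""
    n = len(frame)
    mean = sum(frame) / n
    centered = [x - mean for x in frame]
    energy = sum(x * x for x in centered)
    if energy == 0.0:
        return [0.0] * (max_lag + 1)  # silent frame: no structure to report
    return [
        sum(centered[i] * centered[i + lag] for i in range(n - lag)) / energy
        for lag in range(max_lag + 1)
    ]

# Synthetic periodic frame with a period of 20 samples (stand-in for a
# quasi-periodic sound; real recordings would be far noisier).
frame = [math.sin(2 * math.pi * i / 20) for i in range(200)]
acf = short_term_autocorrelation(frame, max_lag=30)
peak_lag = max(range(10, 31), key=lambda lag: acf[lag])
print(peak_lag)  # 20
```

In a bottleneck-feature setup, vectors like `acf` would be the input to the autoencoder rather than the final features.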

icassp2019-poster.pdf

poster (548)

Categories: Bioacoustics and Medical Acoustics

15 Views

Langevin-based Strategy for Efficient Proposal Adaptation in Population Monte Carlo

Population Monte Carlo (PMC) algorithms are a family of adaptive importance sampling (AIS) methods for approximating integrals in Bayesian inference. In this paper, we propose a novel PMC algorithm that combines recent advances in the AIS and optimization literature. The proposal densities are adapted according to the past weighted samples via a local resampling step that preserves diversity, while also exploiting the geometry of the target distribution. A scaled Langevin strategy with Newton-based scaling metric
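The following toy sketch illustrates the general shape of such a scheme on a one-dimensional problem. It is not the authors' algorithm: the Gaussian target, the function names, the step size, and the fixed proposal width are all assumptions, and the adaptation shown is a plain Newton-scaled gradient drift applied after local resampling, in the spirit of a scaled Langevin step.

```python
import math
import random

# Hypothetical 1-D target: unnormalized log-density of N(3, 0.5^2).
TARGET_MEAN, TARGET_STD = 3.0, 0.5

def log_target(x):
    return -0.5 * ((x - TARGET_MEAN) / TARGET_STD) ** 2

def grad_log_target(x):
    return -(x - TARGET_MEAN) / TARGET_STD ** 2

def neg_hessian_log_target(x):
    return 1.0 / TARGET_STD ** 2  # constant for a Gaussian target

def normal_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def run_pmc(num_proposals=20, samples_each=10, iterations=15,
            proposal_std=1.0, step=0.5, seed=0):
    """Return a self-normalized estimate of the target mean."""
    rng = random.Random(seed)
    means = [rng.uniform(-10.0, 10.0) for _ in range(num_proposals)]
    estimate = float("nan")
    for _ in range(iterations):
        groups = []  # one list of (sample, log_weight) per proposal
        for m in means:
            group = []
            for _ in range(samples_each):
                x = rng.gauss(m, proposal_std)
                # Weight each sample against the whole mixture of proposals.
                mix = sum(normal_pdf(x, mu, proposal_std) for mu in means)
                mix /= len(means)
                group.append((x, log_target(x) - math.log(mix)))
            groups.append(group)

        # Self-normalized importance sampling estimate from this iteration.
        flat = [pair for group in groups for pair in group]
        top = max(lw for _, lw in flat)
        weights = [math.exp(lw - top) for _, lw in flat]
        estimate = sum(w * x for w, (x, _) in zip(weights, flat)) / sum(weights)

        # Local resampling (one survivor per proposal keeps diversity),
        # followed by a Newton-scaled drift toward higher target density.
        new_means = []
        for group in groups:
            local_top = max(lw for _, lw in group)
            local_w = [math.exp(lw - local_top) for _, lw in group]
            picked = rng.choices([x for x, _ in group], weights=local_w, k=1)[0]
            drift = grad_log_target(picked) / neg_hessian_log_target(picked)
            new_means.append(picked + step * drift)
        means = new_means
    return estimate

estimate = run_pmc()
print(round(estimate, 2))
```

With these settings the estimate should land close to the target mean of 3, though the exact value depends on the seed.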

poster_ICASSP_2019.pdf (417)

Categories: Statistical Signal Processing

21 Views

3D VISUAL SPEECH ANIMATION USING 2D VIDEOS

In visual speech animation, lip motion accuracy is of paramount importance for speech intelligibility, especially for the hard of hearing or foreign language learners. We present an approach for visual speech animation that uses tracked lip motion in front-view 2D videos of a real speaker to drive the lip motion of a synthetic 3D head. This makes use of a 3D morphable model (3DMM), built using 3D synthetic head poses, with corresponding landmarks identified in the 2D videos and the 3DMM.

3D Visual Speech Animation Using 2D Videos.pdf (418)

Categories: Image/Video Processing

13 Views

CUTENSOR-TUBAL: OPTIMIZED GPU LIBRARY FOR LOW-TUBAL-RANK TENSORS

In this paper, we optimize the computation of third-order low-tubal-rank tensor operations on many-core GPUs. Tensor operations are compute-intensive, and existing studies optimize such operations in a case-by-case manner, which can be inefficient and error-prone. We develop and optimize a BLAS-like library for the low-tubal-rank tensor model called cuTensor-tubal, which includes efficient GPU primitives for tensor operations and key processes. We compute tensor operations in the frequency domain and fully exploit tube-wise and slice-wise parallelism.
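To make the frequency-domain idea concrete, the sketch below computes the standard t-product of two third-order tensors: transform every tube with a DFT, multiply the matching frontal slices as ordinary matrices, then transform back. It is a CPU illustration in plain Python with a naive DFT, not the cuTensor-tubal API; the function names and the nested-list layout `tensor[row][col][tube_index]` are assumptions.

```python
import cmath

def dft(vec, inverse=False):
    """Naive discrete Fourier transform of one tube (O(n^2), for clarity)."""
    n = len(vec)
    sign = 1 if inverse else -1
    out = [
        sum(v * cmath.exp(sign * 2j * cmath.pi * j * k / n)
            for j, v in enumerate(vec))
        for k in range(n)
    ]
    return [x / n for x in out] if inverse else out

def t_product(a, b):
    """t-product of a (n1 x n2 x n3) and b (n2 x n4 x n3) nested lists."""
    n1, n2, n4, n3 = len(a), len(b), len(b[0]), len(a[0][0])
    # Tube-wise step: each tube is transformed independently.
    a_hat = [[dft(a[i][j]) for j in range(n2)] for i in range(n1)]
    b_hat = [[dft(b[j][l]) for l in range(n4)] for j in range(n2)]
    # Slice-wise step: one independent matrix product per frequency k.
    c_hat = [
        [
            [sum(a_hat[i][j][k] * b_hat[j][l][k] for j in range(n2))
             for k in range(n3)]
            for l in range(n4)
        ]
        for i in range(n1)
    ]
    # Back to the original domain; inputs are real, so keep the real part.
    return [
        [[x.real for x in dft(c_hat[i][l], inverse=True)] for l in range(n4)]
        for i in range(n1)
    ]

# For 1 x 1 x 3 tensors the t-product is a circular convolution of the tubes.
a = [[[1.0, 2.0, 3.0]]]
b = [[[4.0, 5.0, 6.0]]]
c = t_product(a, b)
print([round(x, 6) for x in c[0][0]])  # [31.0, 31.0, 28.0]
```

The two comment-marked loops are the independent units of work that a GPU implementation could spread across threads.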

ICASSP_poster_taozhang V1 final.pdf

The poster for the paper entitled "CUTENSOR-TUBAL: OPTIMIZED GPU LIBRARY FOR LOW-TUBAL-RANK TENSORS" (543)

Categories: Image, Video, and Multidimensional Signal Processing

32 Views

NOVEL METRIC LEARNING FOR NON-PARALLEL VOICE CONVERSION

Obtaining aligned spectral pairs from non-parallel data for a stand-alone voice conversion (VC) technique is a challenging research problem. An unsupervised alignment algorithm, namely the Iterative combination of a Nearest Neighbor search step and a Conversion step Alignment (INCA), iteratively tries to align the spectral features by minimizing the Euclidean distance between the intermediate converted and the target spectral feature vectors.
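The alternation between a nearest-neighbor step and a conversion step can be sketched on scalar features. This toy is heavily simplified and is not the INCA implementation or the paper's proposed metric: features are single numbers instead of spectral vectors, the search runs in one direction only, the conversion is an assumed least-squares line, and all names and data are hypothetical.

```python
def nearest(value, candidates):
    """Nearest-neighbor search under Euclidean (absolute) distance."""
    return min(candidates, key=lambda c: abs(c - value))

def fit_line(xs, ys):
    """Least-squares line y = slope * x + intercept (toy conversion model)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    var = sum((x - mean_x) ** 2 for x in xs)
    if var == 0.0:
        return 0.0, mean_y
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var
    return slope, mean_y - slope * mean_x

def inca_1d(source, target, iterations=10):
    """Alternate nearest-neighbor alignment and conversion re-estimation."""
    converted = list(source)  # start from the unconverted source features
    history = []              # mean alignment distance per iteration
    for _ in range(iterations):
        # Step 1: pair each intermediate converted feature with a target.
        matched = [nearest(c, target) for c in converted]
        history.append(
            sum(abs(c - m) for c, m in zip(converted, matched)) / len(source)
        )
        # Step 2: re-fit the conversion on the current pairs and apply it.
        slope, intercept = fit_line(source, matched)
        converted = [slope * x + intercept for x in source]
    return converted, history

source = [0.0, 1.0, 2.0, 3.0, 4.0]
target = [1.0, 3.0, 5.0, 7.0, 9.0]  # hypothetical non-parallel target features
converted, history = inca_1d(source, target)
print(history[0], history[-1])
```

On this data the mean alignment distance drops after the first iteration and then stalls, which hints at why the choice of distance metric matters for the quality of the alignment.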

main.pdf (365)

Categories: Machine Learning for Signal Processing

22 Views

LEARNING TEMPORAL INFORMATION FROM SPATIAL INFORMATION USING CAPSNETS FOR HUMAN ACTION RECOGNITION

Capsule Networks (CapsNets) were recently introduced to overcome some of the shortcomings of traditional Convolutional Neural Networks (CNNs). CapsNets replace neurons in CNNs with vectors to retain spatial relationships among the features. In this paper, we propose a CapsNet architecture that employs individual video frames for human action recognition without explicitly extracting motion information. We also propose weight pooling to reduce the computational complexity and improve the classification accuracy by appropriately removing some of the extracted features.
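As a concrete example of what a vector-valued unit looks like, the sketch below implements the squashing nonlinearity from the original CapsNet formulation, which rescales a capsule's output vector to a length below 1 while keeping its direction. Whether this paper uses the function unchanged is an assumption; the function name and the `eps` guard are illustrative.

```python
import math

def squash(vector, eps=1e-9):
    """Shrink a capsule vector to length |v|^2 / (1 + |v|^2), same direction."""
    squared_norm = sum(x * x for x in vector)
    norm = math.sqrt(squared_norm)
    # eps avoids division by zero for an all-zero capsule.
    scale = squared_norm / (1.0 + squared_norm) / (norm + eps)
    return [scale * x for x in vector]

capsule = [3.0, 4.0]           # hypothetical capsule output, length 5
squashed = squash(capsule)
length = math.sqrt(sum(x * x for x in squashed))
print(round(length, 4))        # 0.9615
```

The length of the squashed vector can be read as the probability that the entity the capsule represents is present, and its direction encodes the entity's properties.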

ICASSP_poster_2019__1_ (2).pdf (485)

Categories: Neural network learning (MLR-NNLR)

39 Views
