ICASSP 2019

ICASSP is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. The 2019 conference will feature world-class presentations by internationally renowned speakers and cutting-edge session topics, and will provide a fantastic opportunity to network with like-minded professionals from around the world.

ONLINE SINGING VOICE SEPARATION USING A RECURRENT ONE-DIMENSIONAL U-NET TRAINED WITH DEEP FEATURE LOSSES

This paper proposes an online approach to the singing voice separation problem. Based on a combination of one-dimensional convolutional layers along the frequency axis and recurrent layers to enforce temporal coherency, state-of-the-art performance is achieved. The concept of using deep features in the loss function to guide training and improve the model’s performance is also investigated.

poster.pdf

Poster presentation OR-U-Net (451)

Categories:: Applications in Music and Audio Processing (MLR-MUSI)

47 Views

Efficient Nonlinear Acoustic Echo Cancellation by Dual-stage Multi-channel Kalman Filtering

Mobile devices for hands-free speech communication often show significant nonlinear distortion in the sound emitted by their loudspeakers. Therefore, conventional linear echo cancellation is not sufficient for maintaining a high conversation quality. In this work we propose a nonlinear echo canceller that uses two serially cascaded adaptive filters to compensate for the nonlinear and linear echo. We show that a stable operation of the cascaded structure is achieved by using the multi-channel Kalman algorithm in the frequency domain with filtered-x references.

posterICASSP19_DualStageNAEC_Schrammen.pdf

posterICASSP19_DualStageNAEC_Schrammen.pdf (562)

Categories:: Echo Cancellation

104 Views

Deep Latent Factor Model for Predicting Drug Target Interactions

In drug target interaction (DTI) prediction, the interactions of a subset of drugs with a subset of targets are known, and the goal is to predict the interactions of all drugs with all targets. One approach is to formulate this as a matrix completion problem, where the interaction matrix, with drugs along the rows and targets along the columns, is only partially filled. So far, standard matrix completion approaches such as nuclear norm minimization and matrix factorization have been used to address the problem.
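To make the matrix completion formulation concrete, here is a minimal sketch of the plain matrix factorization baseline mentioned in the abstract. It is not the deep latent factor model the poster proposes. The function names (`factorize`, `predict`), the hyperparameters, and the 3x3 toy interaction data are all hypothetical and chosen only for illustration.

```python
import random

def factorize(observed, n_drugs, n_targets, rank=2, lr=0.05, reg=0.01,
              epochs=2000, seed=0):
    """Fit latent factors U (drugs) and V (targets) to the known entries.

    observed: dict mapping (drug_index, target_index) -> known interaction.
    Only observed entries contribute to the loss; the rest stay unknown.
    """
    rng = random.Random(seed)
    U = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_drugs)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_targets)]
    entries = list(observed.items())
    for _ in range(epochs):
        rng.shuffle(entries)
        for (i, j), y in entries:
            pred = sum(U[i][k] * V[j][k] for k in range(rank))
            err = y - pred
            # Stochastic gradient step on squared error with L2 regularization.
            for k in range(rank):
                u, v = U[i][k], V[j][k]
                U[i][k] += lr * (err * v - reg * u)
                V[j][k] += lr * (err * u - reg * v)
    return U, V

def predict(U, V, i, j):
    """Predicted interaction score for drug i and target j."""
    return sum(a * b for a, b in zip(U[i], V[j]))

# Hypothetical toy data: 3 drugs x 3 targets, with entries (1, 1) and (2, 0)
# left unobserved. 1 means a known interaction, 0 a known non-interaction.
observed = {
    (0, 0): 1.0, (0, 1): 0.0, (0, 2): 1.0,
    (1, 0): 1.0, (1, 2): 1.0,
    (2, 1): 1.0, (2, 2): 0.0,
}
U, V = factorize(observed, n_drugs=3, n_targets=3)
score_missing = predict(U, V, 1, 1)  # score for an entry never seen in training
```

The product of the two factor matrices gives a score for every drug-target pair, including pairs that were never observed, which is what turns the partially filled matrix into a full prediction.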

lfm_dti (7).pdf

lfm_dti (7).pdf (490)

Categories:: Bioinformatics

15 Views

A Subband Energy Modification Method for Elevation Control in Median Plane

Elevation perception is crucial for binaural reproduction. A recent study proposed an elevation control method by modifying the energy of HRTFs in each auditory scale subband, such as the ERB and Mel subband. However, this subband division is designed based on auditory excitation patterns and may not be consistent with the elevation localization cues. To this end, this study proposes a novel subband division strategy which emphasizes the physiological information involved in elevation localization based on a statistical analysis of the HRTF.

Icassp2019_poster_ydd_e1.pdf

Icassp2019_poster_ydd_e1.pdf (412)

Categories:: Spatial and Multichannel Audio

15 Views

Guided-spatio-temporal filtering for extracting sound from optically measured images containing occluding objects

Recent developments in optical interferometry make it possible to measure sound without placing any device inside the sound field. In particular, parallel phase-shifting interferometry (PPSI) has enabled advanced measurement of the refractive index of air. A novel application investigated very recently is the simultaneous visualization of flow and sound, which had been difficult until PPSI enabled high-speed and accurate measurement several years ago. However, to understand aerodynamic sound, the air flow and the sound must be separated, since they are mixed in the observed video.

icassp2019_tanigawa.pdf

icassp2019_tanigawa.pdf (410)

Categories:: Spatial and Multichannel Audio

18 Views

Learning Motion Disfluencies for Automatic Sign Language Segmentation

We introduce a novel technique for the automatic detection of word boundaries within continuous sentence expressions in Japanese Sign Language from three-dimensional body joint positions. First, the flow of signed sentence data within a temporal neighborhood is determined utilizing the spatial correlations between line segments of inter-joint pairs. Next, a frame-wise binary random forest classifier is trained to distinguish word and non-word frame content based on the extracted spatio-temporal features.

Poster.pdf

Poster.pdf (582)

Categories:: Pattern recognition and classification (MLR-PATT)
Image, Video, and Multidimensional Signal Processing

41 Views

REVISITING HIDDEN MARKOV MODELS FOR SPEECH EMOTION RECOGNITION

ICASSP2019_Poster_symao.pdf

ICASSP2019_Poster_symao.pdf (416)

Categories:: Spoken Language Understanding (SLP-UNDE)

35 Views

OPTIMIZED COLOR-GUIDED FILTER FOR DEPTH IMAGE DENOISING

Color-guided depth image denoising often suffers from texture copying from the color image as well as blurring at depth discontinuities. Motivated by this, we propose an optimized color-guided filter for depth image denoising under different types of noise. This new framework helps to mitigate texture copying and enhance depth discontinuities, especially under heavy noise. The framework consists of two parts, namely a depth-driven color flattening model and a patch synthesis-based Markov random field model.

ICASSP_Poster.pdf

ICASSP_Poster.pdf (726)

Categories:: Image/Video Processing

15 Views

ANOMALY IMAGING FOR STRUCTURAL HEALTH MONITORING EXPLOITING CLUSTERED SPARSITY

We present a new tomography-based anomaly mapping algorithm for composite structures. The system consists of an array of piezoelectric transducers which sequentially excites the structure and collects the resulting waveform at the remaining transducers. Anomaly indices computed from the sensor waveforms are fed as input to the mapping algorithm. The output of the algorithm is a color map indicating the outline of damage on the structure when present.

Poster_ICASSP_PCSBL.pdf

Poster_ICASSP_PCSBL.pdf (426)

Categories:: Sampling and Reconstruction

9 Views

SPEAKER AGNOSTIC FOREGROUND SPEECH DETECTION FROM AUDIO RECORDINGS IN WORKPLACE SETTINGS FROM WEARABLE RECORDERS

Audio-signal acquisition as part of wearable sensing adds an important dimension for applications such as understanding human behaviors. As part of a large study on workplace behaviors, we collected audio data from individual hospital staff using custom wearable recorders. The audio features collected were limited to preserve the privacy of the interactions in the hospital. A first step towards audio processing is to identify the foreground speech of the person wearing the audio badge.

ICASSP 2019 poster 34*26in new.pdf

SPEAKER AGNOSTIC FOREGROUND SPEECH DETECTION FROM AUDIO RECORDINGS IN WORKPLACE SETTINGS FROM WEARABLE RECORDERS (544)

Categories:: Audio and Acoustic Signal Processing
Speech Processing

29 Views
