Sorry, you need to enable JavaScript to visit this website.

ICASSP is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. The 2019 conference will feature world-class presentations by internationally renowned speakers, cutting-edge session topics and provide a fantastic opportunity to network with like-minded professionals from around the world. Visit website

Spherical cameras, which can acquire all-round information, are effective to estimate rotation for robotic applications. Recently, Convolutional Neural Networks have shown great robustness in solving such regression problems. However they are designed for planar images and cannot deal with the non-uniform distortion present in spherical images, when expressed in the planar equirectangular projection. This can lower the accuracy of motion estimation. In this research, we propose an Equirectangular-Convolutional Neural Network (E-CNN) to solve this issue.

Categories:
49 Views

Pairwise, or separable, dictionaries are suited for the sparse representation of 2D signals in their original form, without vectorization. They are equivalent with enforcing a Kronecker structure on a standard dictionary for 1D signals. We present a dictionary learning algorithm, in the coordinate descent style of Approximate K-SVD, for such dictionaries. The algorithm has the benefit of extremely low complexity, clearly lower than that of existing algorithms.

Categories:
16 Views

Multimodal data fusion is an important aspect of many object localization and tracking frameworks that rely on sensory observations from different sources. A prominent example is audiovisual speaker localization, where the incorporation of visual information has shown to benefit overall performance, especially in adverse acoustic conditions. Recently, the notion of dynamic stream weights as an efficient data fusion technique has been introduced into this field.

Categories:
11 Views

CT image reconstruction from incomplete data, such as sparse views and limited angle reconstruction, is an important and challenging problem in medical imaging. This work proposes a new deep convolutional neural network (CNN), called JSR-Net, that jointly reconstructs CT images and their associated Radon domain projections. JSR-Net combines the traditional model-based approach with deep architecture design of deep learning. A hybrid loss function is adapted to improve the performance of the JSR-Net making it more effective in protecting important image structures.

Categories:
16 Views

This paper examines four approaches to improving real-time neural vocoders with simple acoustic features (SAF) constructed from fundamental frequency and mel-cepstra rather than mel-spectrograms.

Categories:
335 Views

Although 2.5D sound field synthesis with a circular loudspeaker array can be used in a 3D sound field, a 2D sound field, instead of a 3D sound field, is assumed for a sound field recording with a circular microphone array. This paper presents a horizontal 3D sound field recording and 2.5D synthesis method used in 3D sound fields with multiple co-centered omni-directional circular microphone arrays and a circular loudspeaker array without vertical derivative measurements.

Categories:
187 Views

Non-parallel voice conversion (VC) is a technique for learning the mapping from source to target speech without relying on parallel data. This is an important task, but it has been challenging due to the disadvantages of the training conditions. Recently, CycleGAN-VC has provided a breakthrough and performed comparably to a parallel VC method without relying on any extra data, modules, or time alignment procedures. However, there is still a large gap between the real target and converted speech, and bridging this gap remains a challenge.

Categories:
63 Views

Recent work on acoustic parameter estimation indicates that geometric room volume can be useful for modeling the character of an acoustic environment. However, estimating volume from audio signals remains a challenging problem. Here we propose using a convolutional neural network model to estimate the room volume blindly from reverberant single-channel speech signals in the presence of noise. The model is shown to produce estimates within approximately a factor of two to the true value, for rooms ranging in size from small offices to large concert halls.

Categories:
22 Views

Wavelet analysis and perfect reconstruction filterbanks (PRFBs) are closely related. Desired properties on the wavelet could be translated to equivalent properties on a PRFB. We propose a new learning-based approach towards designing compactly supported orthonormal wavelets with a specified number of vanishing moments. We view PRFBs as a special class of convolutional autoencoders, which places the problem of wavelet/PRFB design within a learning framework. One could then deploy several state-of-the-art deep learning tools to solve the design problem.

Categories:
23 Views

Pages