Image, Video, and Multidimensional Signal Processing

Deep-URL

The lack of interpretability in current deep learning models raises serious concerns, as these models are used extensively in life-critical applications. Hence, it is of paramount importance to develop interpretable deep learning models. In this paper, we consider the problem of blind deconvolution and propose a novel model-aware deep architecture that allows for the recovery of both the blur kernel and the sharp image from the blurred image.
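For orientation, blind deconvolution is usually posed on the forward model "blurred = kernel * sharp (+ noise)", where both the kernel and the sharp image are unknown. The sketch below is not the architecture proposed in the paper. It only illustrates the forward model and the easier non-blind step (recovering the image once the kernel is known) with a regularized inverse filter. The box kernel, the image size, and the `eps` value are assumptions chosen for illustration.

```python
import numpy as np

def blur(image, kernel):
    """Circular 2-D convolution of `image` with `kernel`, computed via the FFT.

    The kernel is zero-padded to the image size and anchored at the top-left
    corner, so the result is also shifted slightly; this is fine for a sketch.
    """
    K = np.fft.fft2(kernel, s=image.shape)
    return np.real(np.fft.ifft2(np.fft.fft2(image) * K))

def regularized_inverse(blurred, kernel, eps=1e-6):
    """Non-blind recovery: invert the known kernel with a small damping term."""
    K = np.fft.fft2(kernel, s=blurred.shape)
    X = np.conj(K) * np.fft.fft2(blurred) / (np.abs(K) ** 2 + eps)
    return np.real(np.fft.ifft2(X))

rng = np.random.default_rng(0)
sharp = rng.random((32, 32))        # stand-in for a sharp image (assumption)
kernel = np.ones((3, 3)) / 9.0      # hypothetical 3x3 box blur
blurred = blur(sharp, kernel)
restored = regularized_inverse(blurred, kernel)
```

In the blind setting the kernel passed to `regularized_inverse` is itself unknown, which is what makes the problem ill-posed and motivates jointly estimating both quantities.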

ICIP_2020_video_presentation.pptx

The final presentation for the video presented at ICIP 2020 (407)

Categories: Image, Video, and Multidimensional Signal Processing

43 Views

MULTI IMAGE DEPTH FROM DEFOCUS NETWORK WITH BOUNDARY CUE FOR DUAL APERTURE CAMERA

In this paper, we estimate depth information using two defocused images from a dual-aperture camera. Recent advances in deep learning techniques have increased the accuracy of depth estimation. In addition, methods that use a defocused image, in which an object is blurred according to its distance from the camera, have been widely studied. We further improve the accuracy of depth estimation by training the network on two images with different depths of field.

ICASSP_MIDFD_PPT.pdf (382)

Categories: Image, Video, and Multidimensional Signal Processing

32 Views

IMPROVING THE PERFORMANCE OF TRANSFORMER BASED LOW RESOURCE SPEECH RECOGNITION FOR INDIAN LANGUAGES

The recent success of the Transformer-based sequence-to-sequence framework for various Natural Language Processing tasks has motivated its application to Automatic Speech Recognition. In this work, we explore the application of Transformers to low-resource Indian languages in a multilingual framework. We explore various methods of incorporating language information into a multilingual Transformer, namely (i) at the decoder and (ii) at the encoder. These methods include using language identity tokens or providing language information to the acoustic vectors.
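The two mechanisms named above can be pictured with a minimal sketch. This is an illustration under stated assumptions, not the authors' implementation: the language codes in `LANGS`, the `<lang>` token format, and both helper functions are hypothetical, and a one-hot vector is only one possible way to attach language information to acoustic frames.

```python
import numpy as np

LANGS = ["hi", "ta", "te"]  # hypothetical language codes, for illustration only

def add_language_token(target_tokens, lang):
    """Decoder side: prepend a language identity token to the target sequence."""
    return [f"<{lang}>"] + list(target_tokens)

def add_language_to_features(features, lang):
    """Encoder side: append a one-hot language vector to every acoustic frame.

    `features` has shape (num_frames, feature_dim).
    """
    one_hot = np.zeros(len(LANGS))
    one_hot[LANGS.index(lang)] = 1.0
    tiled = np.tile(one_hot, (features.shape[0], 1))
    return np.concatenate([features, tiled], axis=1)

tokens = add_language_token(["a", "b"], "ta")
frames = add_language_to_features(np.zeros((5, 40)), "ta")
```

Here `tokens` starts with the language tag, and `frames` grows from 40 to 43 dimensions per frame, with the extra columns carrying the language identity.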

shetty.pdf (400)

Categories: Image, Video, and Multidimensional Signal Processing

54 Views

INTERPRETABLE SELF-ATTENTION TEMPORAL REASONING FOR DRIVING BEHAVIOR UNDERSTANDING

Performing driving behaviors based on causal reasoning is essential to ensure driving safety. In this work, we investigated how well state-of-the-art 3D Convolutional Neural Networks (CNNs) classify driving behaviors based on causal reasoning. We proposed a perturbation-based visual explanation method to inspect the models' performance visually. By examining the video attention saliency, we found that existing models could not precisely capture the causes (e.g., a traffic light) of a specific action (e.g., stopping).
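A common form of perturbation-based explanation is occlusion: mask one region of the input, re-run the model, and record how much the class score drops. The sketch below shows that generic idea only; it is not claimed to be the method proposed in the paper. The `toy_score` function is a hypothetical stand-in for a trained 3D CNN, and the patch size is an assumption.

```python
import numpy as np

def occlusion_saliency(video, score_fn, patch=4):
    """Spatial saliency map for a video of shape (T, H, W).

    Each patch is zeroed out across all frames, and the resulting drop in
    the model score is written into the saliency map at that patch.
    """
    base = score_fn(video)
    _, height, width = video.shape
    saliency = np.zeros((height, width))
    for y in range(0, height, patch):
        for x in range(0, width, patch):
            perturbed = video.copy()
            perturbed[:, y:y + patch, x:x + patch] = 0.0
            saliency[y:y + patch, x:x + patch] = base - score_fn(perturbed)
    return saliency

def toy_score(video):
    """Hypothetical model: the score depends only on the top-left 4x4 region."""
    return float(video[:, :4, :4].mean())

video = np.ones((3, 8, 8))
saliency = occlusion_saliency(video, toy_score)
```

For this toy model the map is non-zero only in the top-left patch, which is the region the score actually depends on. A model that reasons causally about a stop action would, by the same logic, show high saliency on the traffic light.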

INTERPRETABLE SELF-ATTENTION TEMPORAL REASONING FOR DRIVING BEHAVIOR UNDERSTANDING.pdf (502)

Categories: Image, Video, and Multidimensional Signal Processing

39 Views

Parsing Map Guided Multi-Scale Attention Network For Face Hallucination

4929wang.pdf (432)

Categories: Image, Video, and Multidimensional Signal Processing

34 Views

IMAGE SEGMENTATION BASED PRIVACY-PRESERVING HUMAN ACTION RECOGNITION FOR ANOMALY DETECTION

Image Segmentation Based Privacy-Preserving Human Action Recognition for Anomaly Detection .pdf (563)

Categories: Image, Video, and Multidimensional Signal Processing

57 Views

COLOUR COMPRESSION OF PLENOPTIC POINT CLOUDS USING RAHT-KLT WITH PRIOR COLOUR CLUSTERING AND SPECULAR/DIFFUSE COMPONENT SEPARATION

The recently introduced plenoptic point cloud representation marries a 3D point cloud with a light field. Instead of each point being associated with a single colour value, there can be multiple values to represent the colour at that point as perceived from different viewpoints. This representation was introduced together with a compression technique for the multi-view colour vectors, which is an extension of the RAHT method for point cloud attribute coding.
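As a data-structure sketch of the representation described above, each point carries one colour per viewpoint rather than a single colour. The class name, field names, and array layout below are assumptions made for illustration; they are not taken from the paper or from any codec.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class PlenopticPointCloud:
    positions: np.ndarray  # shape (N, 3): x, y, z per point
    colours: np.ndarray    # shape (N, V, 3): one RGB triple per point and view

    def __post_init__(self):
        if self.positions.shape[0] != self.colours.shape[0]:
            raise ValueError("positions and colours must describe the same points")

    @property
    def num_views(self):
        return self.colours.shape[1]

    def mean_colour(self):
        """Collapse to an ordinary point cloud: one average colour per point."""
        return self.colours.mean(axis=1)

rng = np.random.default_rng(0)
cloud = PlenopticPointCloud(
    positions=rng.random((100, 3)),
    colours=rng.random((100, 12, 3)),  # 12 hypothetical viewpoints
)
```

The extra view axis is what a multi-view colour coder has to compress, and `mean_colour()` shows the single-colour special case it generalizes.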

1461_Krivokuca_Presentation.pdf (600)

Categories: Image, Video, and Multidimensional Signal Processing

41 Views

Single-Shot Real-Time Multiple-Path Time-of-Flight Depth Imaging for Multi-Aperture and Macro-Pixel Sensors

Multiple-Path Interference (MPI) is a major drawback of Time-of-Flight (ToF) sensors. MPI occurs when a ToF pixel receives more than a single light bounce from the scene. Current methods resolving more than a single return per pixel rely on the sequential acquisition of large amounts of data and are too computationally expensive to deliver depth images in real time. These factors have precluded the development of a multiple-path ToF camera to date. In this work we consider two hardware alternatives that can be used to acquire all necessary raw data in a single shot.
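To see why a second bounce corrupts the depth, consider the standard continuous-wave ToF model, in which a return from depth d at modulation frequency f contributes a phasor with phase 4πfd/c and the pixel measures the sum of all such phasors. The abstract does not state the sensor model, so the continuous-wave assumption, the 20 MHz frequency, and the amplitudes below are illustrative choices, not values from the paper.

```python
import numpy as np

C = 299_792_458.0  # speed of light in m/s

def phasor(depth_m, amplitude, freq_hz):
    """Complex measurement produced by a single return at the given depth."""
    phase = 4 * np.pi * freq_hz * depth_m / C
    return amplitude * np.exp(1j * phase)

def depth_from_phasor(z, freq_hz):
    """Depth a single-path ToF pipeline would report for measurement z."""
    phase = np.angle(z) % (2 * np.pi)
    return C * phase / (4 * np.pi * freq_hz)

freq = 20e6                         # hypothetical modulation frequency
direct = phasor(1.0, 1.0, freq)     # direct return from a surface at 1.0 m
indirect = phasor(2.5, 0.4, freq)   # weaker second bounce, 2.5 m path depth

single_path_depth = depth_from_phasor(direct, freq)
mixed_depth = depth_from_phasor(direct + indirect, freq)
```

With only the direct return the reported depth is 1.0 m. Once the indirect return is added, the reported depth is pulled toward the longer path, and the pixel alone cannot tell the two cases apart from one measurement.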

heredia_ICASSP2020_kagawa_v3_final.pdf (643)

Categories: Image, Video, and Multidimensional Signal Processing

124 Views

COMPARE LEARNING: BI-ATTENTION NETWORK FOR FEW-SHOT LEARNING

Learning from only a few labeled samples is a key challenge for visual recognition, as deep neural networks tend to overfit when trained on so few samples. Metric learning, one family of few-shot learning methods, addresses this challenge by first learning a deep distance metric that determines whether a pair of images belongs to the same category, then applying the trained metric to instances from a separate test set with limited labels. This approach makes the most of the few samples and effectively limits overfitting.
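The metric-learning recipe described above can be sketched as follows: embed the few labeled support samples and the query, then give the query the label of its nearest support sample under the learned distance. This is a generic illustration, not the bi-attention network from the paper. The `embed` function is a hypothetical placeholder (identity) standing in for a trained deep embedding, and Euclidean distance stands in for the learned metric.

```python
import numpy as np

def embed(x):
    """Placeholder for a learned deep embedding (identity here, by assumption)."""
    return np.asarray(x, dtype=float)

def classify(query, support_x, support_y):
    """Assign the query the label of the closest support sample."""
    distances = np.linalg.norm(embed(support_x) - embed(query), axis=1)
    return support_y[int(np.argmin(distances))]

# Two-way one-shot episode with toy 2-D "images"
support_x = np.array([[0.0, 0.0], [10.0, 10.0]])
support_y = ["cat", "dog"]
prediction = classify(np.array([1.0, 1.0]), support_x, support_y)
```

Because no classifier weights are fitted to the new classes, the only learned component is the embedding, which is why this family of methods is less prone to overfitting on a handful of samples.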

ICASSP2020 presentation.pdf (477)

Categories: Image, Video, and Multidimensional Signal Processing

63 Views

EXPOSURE INTERPOLATION VIA HYBRID LEARNING

Deep-learning-based methods have become the dominant solutions to many image processing problems. A natural question is “Is there any room for conventional methods on these problems?” In this paper, exposure interpolation is taken as an example to answer this question, and the answer is “Yes”. A new hybrid learning framework is introduced to interpolate a medium-exposure image for a pair of images with a large exposure ratio, captured by an emerging high dynamic range (HDR) video capturing device. The framework is built by fusing conventional and deep learning methods.

ICASSP2020HybridLearning.pdf (390)

Categories: Image, Video, and Multidimensional Signal Processing

41 Views
