Image/Video Processing

Estimation of gaze region using two dimensional probabilistic maps constructed using convolutional neural networks

Predicting the gaze of a user has important applications in human-computer interaction (HCI), in areas such as social interaction, driver distraction, human-robot interaction and education. Appearance-based models for gaze estimation have improved significantly due to recent advances in convolutional neural networks (CNNs). This paper proposes a method to predict the gaze of a user with deep models based purely on CNNs.

Jha_2019-poster.pdf

Categories:: Image/Video Processing

A REAL-TIME DEEP NETWORK FOR CROWD COUNTING

Automatic analysis of highly crowded scenes has attracted extensive attention in computer vision research. Previous approaches to crowd counting have already achieved promising performance across various benchmarks. However, in real deployments the model should run as fast as possible while maintaining accuracy. In this paper, we propose a compact convolutional neural network for crowd counting which learns a more efficient model with a small number of parameters.

ICASSP_2020_XShi_PPT.pdf

Categories:: Image/Video Processing

View-angle Invariant Object Monitoring Without Image Registration

Object monitoring can be performed by change detection algorithms. However, for image pairs with a large perspective difference, change detection performance is usually degraded by inaccurate image registration. To address this difficulty, a novel object-specific change detection approach is proposed for object monitoring in this paper. In contrast to traditional approaches, the proposed approach is robust to view-angle variation and does not require explicit image registration. Experiments demonstrate the effectiveness and advantages of the proposed approach.

View-angle Invariant Object Monitoring Without Image Registration .pdf

Categories:: Image/Video Processing

BILATERAL RECURRENT NETWORK FOR SINGLE IMAGE DERAINING

Single image deraining has been widely studied in recent years. Motivated by residual learning, most deep learning based deraining approaches focus on extracting rain streaks, which usually yields visual artifacts in the final derained images. To address this issue, in this paper we propose a bilateral recurrent network (BRN) that simultaneously exploits the rain streak layer and the background image layer. Specifically, we employ dual residual networks (ResNets) that are recursively unfolded to sequentially extract rain streaks and predict the clean background image.

BRN_slides.pdf

Categories:: Image/Video Processing

JOINT ENHANCEMENT AND DENOISING OF LOW LIGHT IMAGES VIA JND TRANSFORM

Low light images suffer from low dynamic range and severe noise due to low signal-to-noise ratio (SNR). In this paper, we propose joint enhancement and denoising of low light images via a just-noticeable-difference (JND) transform. We achieve contrast enhancement and noise reduction simultaneously based on human visual perception. First, we perform contrast enhancement based on a perceptual histogram to effectively allocate the dynamic range while preventing over-enhancement. Second, we generate a JND map from foreground and background luminance based on a human visual system (HVS) response model, which we call the JND transform.

ICASSP_2020.ppt

Categories:: Image/Video Processing

Look globally, age locally: Face aging with an attention mechanism

Face aging is of great importance for cross-age recognition and entertainment-related applications. Recently, conditional generative adversarial networks (cGANs) have achieved impressive results for facial aging. Existing cGAN-based methods usually require a pixel-wise loss to keep the identity and background consistent. However, minimizing the pixel-wise loss between the input and synthesized images is likely to result in a ghosted or blurry face.

slide_paper5430.pdf

Categories:: Image/Video Processing

Graph Neural Net using Analytical Graph Filters and Topology Optimization for Image Denoising

While convolutional neural nets (CNNs) have achieved remarkable performance for a wide range of inverse imaging applications, their filter coefficients are computed in a purely data-driven manner and are not explainable. Inspired by an analytically derived CNN by Hadji et al., in this paper we construct a new layered graph convolutional neural net (GCNN) using GraphBio as our graph filter.

icassp2020_v2.pdf

Icassp_DeepAGF_2020_slides (243)

Categories:: Image/Video Processing

Mr Nikolajs Skuratovs

In this paper we consider the problem of recovering a signal x of size N from noisy and compressed measurements y = A x + w of size M, where the measurement matrix A is right-orthogonally invariant (ROI). Vector Approximate Message Passing (VAMP) achieves strong reconstruction results in relatively few iterations, even for highly ill-conditioned matrices A. However, performing each iteration is challenging from either a computational or a memory point of view.
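
The measurement model y = A x + w can be simulated with a few lines of standard-library Python. This is only a minimal sketch of the problem setup, not of VAMP or of the authors' method. The function name `simulate_measurements`, the Gaussian signal prior, and the 1/sqrt(N) scaling are assumptions made for illustration. An i.i.d. Gaussian matrix is used because it is one simple example of a right-orthogonally invariant ensemble; it is not ill-conditioned.

```python
import random


def simulate_measurements(n, m, noise_std, seed=0):
    """Draw (A, x, w, y) with y = A x + w (illustrative setup only).

    A is M x N with i.i.d. Gaussian entries, one simple example of a
    right-orthogonally invariant matrix. x and w are Gaussian too,
    which is an assumption of this sketch.
    """
    rng = random.Random(seed)
    # Signal of size N.
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Measurement matrix of size M x N, scaled so rows have roughly unit norm.
    a = [[rng.gauss(0.0, 1.0 / n ** 0.5) for _ in range(n)] for _ in range(m)]
    # Additive noise of size M.
    w = [rng.gauss(0.0, noise_std) for _ in range(m)]
    # Compressed, noisy measurements: y = A x + w.
    y = [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + w_i
         for row, w_i in zip(a, w)]
    return a, x, w, y
```

Calling `simulate_measurements(n=200, m=100, noise_std=0.1)` gives a compressed setting (M < N) in which a recovery algorithm would have to estimate `x` from `y` and `a` alone.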

ICASSP_presentation_v2_pdf.pdf

Categories:: Image/Video Processing

Video-Driven Speech Reconstruction - Show & Tell Demo

This demo will showcase our video-to-audio model which attempts to reconstruct speech from short videos of spoken statements. Our model does so in a completely end-to-end manner where raw audio is generated based on the input video. This approach bypasses the need for separate lip-reading and text-to-speech models. The advantage of such an approach is that it does not require large transcribed datasets and it is not based on intermediate representations like text which remove any intonation and emotional content from the speech.

Video-driven Speech Reconstruction using Generative Adversarial Networks Show & Tell Demo.pdf

Categories:: Image/Video Processing
Speech Retrieval (SLP-IR)

Multispectral Fusion of RGB and NIR Images Using Weighted Least Squares and Alternating Guidance

In low light conditions, color (RGB) images captured by a camera contain heavy noise and lose detail and color. In contrast, near infrared (NIR) images are robust to noise and have clear textures, but carry no color. In this paper, we propose multispectral fusion of RGB and NIR images using weighted least squares (WLS) and alternating guidance. Low light RGB images provide coarse image structure and color, while NIR images offer clear textures at short distances. Since the two are complementary, we adopt alternating guidance for the fusion of RGB and NIR images based on WLS.
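
The idea of guided WLS smoothing can be illustrated in one dimension. The sketch below is not the authors' fusion pipeline and does not implement alternating guidance. It shows a single generic WLS step in which a noisy signal (standing in for a low light RGB channel) is smoothed while edges present in a guide signal (standing in for NIR) are preserved. The function name `wls_smooth_1d` and the parameters `lam`, `alpha` and `eps` are assumptions chosen for illustration.

```python
def wls_smooth_1d(signal, guide, lam=1.0, alpha=1.2, eps=1e-4):
    """Guided weighted-least-squares smoothing of a 1-D signal.

    Minimizes  sum_i (u[i] - signal[i])**2
             + lam * sum_i w[i] * (u[i+1] - u[i])**2,
    where w[i] is small wherever the guide has a strong edge, so
    smoothing is suppressed across guide edges.
    """
    n = len(signal)
    if n != len(guide):
        raise ValueError("signal and guide must have the same length")
    if n == 1:
        return [float(signal[0])]

    # Smoothness weight between neighbours i and i+1, taken from the guide.
    w = [1.0 / (abs(guide[i + 1] - guide[i]) ** alpha + eps)
         for i in range(n - 1)]

    # Setting the gradient to zero gives the tridiagonal system
    # (I + lam * L) u = signal, with L the weighted graph Laplacian.
    lower = [0.0] + [-lam * w[i - 1] for i in range(1, n)]
    upper = [-lam * w[i] for i in range(n - 1)] + [0.0]
    diag = [1.0 - lower[i] - upper[i] for i in range(n)]

    # Thomas algorithm: forward elimination ...
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = signal[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (signal[i] - lower[i] * d[i - 1]) / denom

    # ... then back substitution.
    u = [0.0] * n
    u[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        u[i] = d[i] - c[i] * u[i + 1]
    return u
```

For example, smoothing the noisy step `[0.1, -0.1, 0.1, -0.1, 1.1, 0.9, 1.1, 0.9]` with the clean guide `[0, 0, 0, 0, 1, 1, 1, 1]` flattens the noise on each side while keeping the jump at the guide edge. In two dimensions the same objective leads to a sparse linear system rather than a tridiagonal one.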

ICASSP2020_Multispectral Fusion of RGB and NIR Images Using Weighted Least Squares and Alternating Guidance.pdf

Categories:: Image/Video Processing
