
ICIP 2021 - The International Conference on Image Processing (ICIP), sponsored by the IEEE Signal Processing Society, is the premier forum for the presentation of technological advances and research results in the fields of theoretical, experimental, and applied image and video processing. Held annually since 1994, ICIP brings together leading engineers and scientists in image and video processing from around the world.

Several computer vision applications, such as person search and online fashion, rely on human description. Instance-level human parsing (HP) is therefore relevant, since it localizes semantic attributes and body parts within a person. But how can these attributes be characterized? To our knowledge, only a few single-HP datasets describe attributes with color, size, and/or pattern characteristics, and there is a lack of in-the-wild multi-HP datasets with such characteristics.


In this paper, we propose a framework for 3D human pose estimation with a single 360° camera mounted on the user's wrist. Perceiving a 3D human pose from such a simple setup has remarkable potential for various applications (e.g., daily-living activity monitoring, motion analysis for sports enhancement). However, no existing work has tackled this task, due to the difficulty of estimating a human pose from a single camera image in which only part of the body is captured, and due to the lack of training data.


Over the past decade, nonlinear image compression techniques based on neural networks have been rapidly developed to achieve more efficient storage and transmission of images than conventional linear techniques. A typical nonlinear technique is implemented as a neural network trained on a vast set of images, and the latent representation of a target image is transmitted. In contrast to previous nonlinear techniques, we propose a new image compression method in which a neural network model is trained exclusively on a single target image, rather than on a set of images.
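The core idea of overfitting a model to one image can be sketched in a few lines. The following is a minimal illustration, not the paper's method: a tiny linear autoencoder (an assumption standing in for a real network) is trained by gradient descent on a single flattened "image", and only the latent code would be transmitted alongside the decoder.

```python
import numpy as np

# Minimal sketch of single-image overfitting for compression.
# A linear autoencoder is an assumption here, standing in for the
# neural network described in the abstract.
rng = np.random.default_rng(0)
image = rng.random(64)                       # toy stand-in for one flattened image
k = 8                                        # latent (transmitted) dimensionality

enc = rng.normal(scale=0.1, size=(k, 64))    # encoder weights
dec = rng.normal(scale=0.1, size=(64, k))    # decoder weights
lr = 0.01

for _ in range(5000):
    z = enc @ image                          # latent code for this one image
    recon = dec @ z
    err = recon - image                      # reconstruction error
    # gradients of 0.5 * ||recon - image||^2 w.r.t. dec and enc
    dec -= lr * np.outer(err, z)
    enc -= lr * np.outer(dec.T @ err, image)

z = enc @ image                              # final latent representation
mse = np.mean((dec @ z - image) ** 2)        # near-zero after overfitting
```

Because the model sees only this one image, it can drive the reconstruction error essentially to zero with a very small latent code; the trade-off, as with any such scheme, is that the decoder itself becomes part of the payload.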


Optical Coherence Tomography (OCT) is a powerful technique for non-invasive 3D imaging of biological tissues at high resolution that has revolutionized retinal imaging. A major challenge in OCT imaging is the motion artifacts introduced by involuntary eye movements. In this paper, we propose a convolutional neural network that learns to correct axial motion in OCT based on a single volumetric scan. The proposed method is able to correct large motion, while preserving the overall curvature of the retina.
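For context, axial motion correction is classically done without learning, by registering adjacent A-scans. The sketch below is that classical baseline, not the proposed CNN; the synthetic "retinal layer" and all names are assumptions. Each column of a B-scan is cross-correlated with its neighbour to estimate the relative axial shift, and the accumulated shifts are undone.

```python
import numpy as np

# Classical baseline for axial motion correction (NOT the paper's CNN):
# estimate per-A-scan axial shifts by FFT cross-correlation of
# neighbouring columns, then undo the accumulated drift.
rng = np.random.default_rng(1)
depth, width = 128, 32
template = np.exp(-0.5 * ((np.arange(depth) - 60) / 4.0) ** 2)  # synthetic layer

true_shifts = np.cumsum(rng.integers(-2, 3, size=width))        # drifting eye motion
bscan = np.stack([np.roll(template, s) for s in true_shifts], axis=1)

est = np.zeros(width, dtype=int)
for i in range(1, width):
    # Circular cross-correlation via FFT; the peak gives the relative shift.
    xc = np.fft.ifft(np.fft.fft(bscan[:, i]) * np.conj(np.fft.fft(bscan[:, i - 1]))).real
    rel = int(np.argmax(xc))
    if rel > depth // 2:
        rel -= depth                                            # map to signed shift
    est[i] = est[i - 1] + rel

corrected = np.stack([np.roll(bscan[:, i], -est[i]) for i in range(width)], axis=1)
```

Pairwise registration like this accumulates error over long scans and, unlike the learned approach described above, tends to flatten genuine retinal curvature along with the motion, which is precisely the failure mode the abstract's method aims to avoid.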


High dynamic range (HDR) image formation from low dynamic range (LDR) images of different exposures has been a well-researched topic over the past two decades.
However, most of the developed techniques consider differently exposed LDR images acquired from the same camera viewpoint, which assumes that the scene remains static long enough to capture multiple images.
In this paper, we propose to address the problem of HDR imaging from differently exposed LDR stereo images using an encoder-decoder based convolutional neural network (CNN).
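The single-viewpoint setting the paper moves beyond can be illustrated with a classical weighted merge. This sketch is that baseline, not the proposed stereo CNN; a linear camera response and the triangle weighting are assumptions. Each exposure is normalised by its exposure time and averaged with weights that reject clipped pixels.

```python
import numpy as np

# Classical single-viewpoint HDR merge (NOT the proposed stereo CNN):
# normalise each LDR frame by exposure time and average with a triangle
# weight that downweights near-saturated samples.
rng = np.random.default_rng(2)
radiance = rng.random((16, 16)) * 4.0           # synthetic scene radiance
exposures = np.array([0.25, 1.0, 4.0])          # relative exposure times

# Simulate LDR captures with a linear response clipped to [0, 1].
ldrs = np.stack([np.clip(radiance * t, 0.0, 1.0) for t in exposures])

def weight(v):
    # Triangle weight: trust mid-tones, distrust shadows/highlights.
    return 1.0 - np.abs(2.0 * v - 1.0)

w = weight(ldrs)
w = np.where((ldrs <= 0.0) | (ldrs >= 1.0), 0.0, w)   # reject clipped samples
num = (w * ldrs / exposures[:, None, None]).sum(axis=0)
den = w.sum(axis=0)
hdr = np.where(den > 0, num / np.maximum(den, 1e-8), ldrs[0] / exposures[0])
```

This merge assumes perfect pixel-wise alignment across exposures; with stereo inputs the two views are displaced by disparity, which is why a learned encoder-decoder is needed rather than a direct per-pixel average.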


A novel method based on a time-lag-aware multi-modal variational autoencoder for prediction of important scenes (Tl-MVAE-PIS) from baseball videos and tweets posted on Twitter is presented in this paper. This paper makes the following two technical contributions. First, to effectively use heterogeneous data for the prediction of important scenes, we transform textual, visual, and audio features obtained from tweets and videos into latent features. Tl-MVAE-PIS can then flexibly express the relationships between them in the constructed latent space.
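The general idea of projecting heterogeneous features into one latent space can be sketched as follows. All dimensions, weights, and names here are illustrative assumptions, not the Tl-MVAE-PIS architecture: each modality gets its own Gaussian encoder, and samples are drawn with the standard VAE reparameterisation trick so that text, visual, and audio features become comparable in a shared space.

```python
import numpy as np

# Illustrative sketch (NOT the paper's Tl-MVAE-PIS): per-modality
# Gaussian encoders mapping heterogeneous features into one shared
# latent space via the VAE reparameterisation trick.
rng = np.random.default_rng(3)
dims = {"text": 300, "visual": 512, "audio": 128}   # assumed feature sizes
latent_dim = 16

encoders = {
    m: {"W_mu": rng.normal(scale=0.02, size=(latent_dim, d)),
        "W_logvar": rng.normal(scale=0.02, size=(latent_dim, d))}
    for m, d in dims.items()
}

def encode(modality, x):
    """Project a raw feature vector to a latent Gaussian and sample from it."""
    p = encoders[modality]
    mu = p["W_mu"] @ x
    logvar = p["W_logvar"] @ x
    eps = rng.standard_normal(latent_dim)
    return mu + np.exp(0.5 * logvar) * eps          # reparameterisation trick

latents = {m: encode(m, rng.random(d)) for m, d in dims.items()}
```

Once all modalities live in the same latent space, relationships between them (such as the time lag between a scene and the tweets reacting to it) can be modelled over vectors of a common dimensionality.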


