
The International Conference on Image Processing (ICIP), sponsored by the IEEE Signal Processing Society, is the premier forum for the presentation of technological advances and research results in the fields of theoretical, experimental, and applied image and video processing. Held annually since 1994, ICIP brings together leading engineers and scientists in image and video processing from around the world.

One faces several challenges when tracking objects and events simultaneously in multi-camera environments — especially if the events associated with the object require precise knowledge of the pose of the object at each instant of time. To illustrate the challenges involved, we consider the problem of tracking bins and their contents at airport security checkpoints. The pose of each bin must be tracked with precision in order to minimize the errors associated with the detection of the various items that the passengers may place in the bins and/or take out of them.


We propose a novel keypoint detector for 3D RGB Point Clouds (PCs). The proposed keypoint detector exploits both the 3D structure and the RGB information of the PC data. Keypoint candidates are generated by computing the eigenvalues of the covariance matrix of the PC structure information. Additionally, from the RGB information, we estimate the salient points by an efficient adaptive difference of Gaussian-based operator. Finally, we fuse the resulting two sets of salient points to improve the repeatability of the 3D keypoint detector.
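The structural part of the candidate generation can be sketched as follows. This is a minimal numpy illustration, not the paper's implementation: the abstract says only that candidates come from the eigenvalues of the covariance matrix of the local point-cloud structure, so the specific saliency criterion used here (the smallest eigenvalue, i.e. variation along the local surface normal) is an assumption.

```python
import numpy as np

def covariance_eigen_saliency(points, k=16):
    """Score each 3D point by the eigenvalues of the covariance matrix of
    its k nearest neighbours. Using the smallest eigenvalue as the score
    is an assumed criterion, chosen only for illustration."""
    points = np.asarray(points, dtype=float)
    scores = np.empty(len(points))
    for i, p in enumerate(points):
        dists = np.linalg.norm(points - p, axis=1)
        nbrs = points[np.argsort(dists)[:k]]        # k nearest neighbours
        evals = np.linalg.eigvalsh(np.cov(nbrs.T))  # ascending eigenvalues
        scores[i] = evals[0]                        # variation along the normal
    return scores
```

Candidates would then be taken as local maxima of this score, to be fused with the RGB-based difference-of-Gaussian responses described above.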


In recent years, it has become a trend for people to manipulate their own portraits before posting them on a social networking service. However, it is difficult to get a desired portrait after manipulation without sufficient experience or skill. To obtain a simpler and more effective portrait manipulation technique, we consider an automated portrait manipulation method based on five impression words: clear, sweet, elegant, modern, and dynamic.


This paper proposes BODYFITR, a fully automatic method to fit a human body model to static 3D scans with complex poses. Automatic and reliable 3D human body fitting is necessary for many applications related to healthcare, digital ergonomics, avatar creation and security, especially in industrial contexts for large-scale product design. Existing works either make prior assumptions on the pose, require manual annotation of the data or have difficulty handling complex poses.


In video surveillance applications, person search is a challenging task consisting of detecting people and extracting features from their silhouettes for re-identification (re-ID) purposes. We propose a new end-to-end model that jointly computes the detection and feature extraction steps through a single deep Convolutional Neural Network architecture. Sharing feature maps between the two tasks to jointly describe people's commonalities and specificities allows faster runtime, which is valuable in real-world applications.
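The shared-feature-map idea can be illustrated with a toy forward pass: the backbone is evaluated once per image, and both the detection head and the re-ID head read the same features. All layer shapes and the tanh/linear stand-ins here are hypothetical; the abstract specifies only the shared-backbone, two-task structure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for learned weights (shapes chosen for illustration only)
W_shared = rng.normal(size=(64, 32))  # shared "backbone"
W_det = rng.normal(size=(32, 4))      # detection head -> box parameters
W_reid = rng.normal(size=(32, 16))    # re-ID head -> identity embedding

def forward(x):
    f = np.tanh(x @ W_shared)             # shared feature map, computed once
    boxes = f @ W_det                     # detection branch
    emb = f @ W_reid                      # re-ID branch
    emb = emb / (np.linalg.norm(emb) + 1e-8)  # L2-normalised descriptor
    return boxes, emb
```

Because the backbone runs once and only the light heads diverge, the joint model avoids recomputing features for each task, which is the source of the runtime gain the abstract claims.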


We address the problem of style transfer between two photos and propose a new way to preserve photorealism. Using the single pair of photos available as input, we train a pair of deep convolution networks (convnets), each of which transfers the style of one photo to the other. To enforce photorealism, we introduce a content preserving mechanism by combining a cycle-consistency loss with a self-consistency loss. Experimental results show that this method does not suffer from typical artifacts observed in methods working in the same settings.
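The content-preserving objective can be sketched as the sum of the two terms named above. The cycle term follows the standard definition (each network should invert the other); the exact form of the self-consistency term is not given in the abstract, so the version below (each network should leave an image of its own target style unchanged) is an assumed reading, included only to show how the two losses combine.

```python
import numpy as np

def combined_loss(x, y, f, g, lam=1.0):
    """Content-preserving objective for a pair of style-transfer networks:
    f maps photo x toward y's style, g maps y toward x's style.
    The self-consistency form below is an assumption, not the paper's
    exact definition."""
    # Cycle-consistency: round-tripping through both networks reconstructs the input
    cycle = np.abs(g(f(x)) - x).mean() + np.abs(f(g(y)) - y).mean()
    # Assumed self-consistency: a network applied to its own target style is a no-op
    self_c = np.abs(f(y) - y).mean() + np.abs(g(x) - x).mean()
    return cycle + lam * self_c
```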


The prevalent audio-based Voice Activity Detection (VAD) systems are challenged by the presence of ambient noise and are sensitive to variations in the type of the noise. The use of information from the visual modality, when available, can help overcome some of the problems of audio-based VAD. Existing visual-VAD systems however do not operate directly on the whole image but require intermediate face detection, face landmark detection and subsequent facial feature extraction from the lip region.


Visual Question Answering (VQA) involves complex relations across two modalities, including relations between words and between image regions. Encoding these relations is therefore important for accurate VQA. In this paper, we propose two modules to encode the two types of relations respectively. The language relation encoding module encodes multi-scale relations between words via a novel masked self-attention. The visual relation encoding module encodes the relations between image regions.
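A generic masked self-attention layer can be sketched as below; the mask restricts which word pairs may attend to each other, and varying the mask (e.g. the width of a local window) is one way to obtain relations at different scales. The paper's specific multi-scale masking scheme is not given in the abstract, so this is a generic sketch, not its implementation.

```python
import numpy as np

def masked_self_attention(X, mask):
    """Scaled dot-product self-attention over word embeddings X of shape
    (n, d). mask has shape (n, n): 0 where attention is allowed and
    -np.inf where it is blocked."""
    d = X.shape[1]
    scores = X @ X.T / np.sqrt(d) + mask
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=1, keepdims=True)            # softmax over allowed positions
    return w @ X
```

For example, a banded mask that only opens a word's immediate neighbours captures short-range relations, while an all-zeros mask recovers ordinary global self-attention.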


Despite the success of state-of-the-art single-image super-resolution algorithms based on deep convolutional neural networks in terms of both reconstruction accuracy and speed of execution, most proposed models rely on minimizing the mean-square reconstruction error. More recently, inspired by transfer learning, Mean Square Error (MSE)-based content loss estimation has been replaced with losses calculated on the feature maps of pre-trained networks, e.g. the VGG network used for ImageNet classification.
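The contrast between the two losses can be made concrete: pixel MSE compares raw intensities, while the perceptual variant compares feature maps produced by a fixed pre-trained network. The `features` callable below is a stand-in for a chosen layer of such a network (e.g. a VGG layer); it is a placeholder, not a real VGG interface.

```python
import numpy as np

def pixel_mse(sr, hr):
    """Plain per-pixel mean-square error between the super-resolved
    image sr and the ground-truth high-resolution image hr."""
    return np.mean((sr - hr) ** 2)

def perceptual_loss(sr, hr, features):
    """Content loss in feature space: `features` stands in for a layer of
    a pre-trained network; the loss compares feature maps, not pixels."""
    return np.mean((features(sr) - features(hr)) ** 2)
```

Training against `perceptual_loss` rewards reconstructions that match the reference in the network's feature space, which tends to favour perceptually plausible textures over the overly smooth results typical of pure MSE minimization.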

