
ICIP 2021 - The International Conference on Image Processing (ICIP), sponsored by the IEEE Signal Processing Society, is the premier forum for the presentation of technological advances and research results in the fields of theoretical, experimental, and applied image and video processing. Held annually since 1994, ICIP brings together leading engineers and scientists in image and video processing from around the world.

Unsupervised learning of disentangled representations is a core task for discovering interpretable factors of variation in an image dataset. We propose a novel method that can learn disentangled representations with semantic explanations on natural image datasets. In our method, we guide the representation learning of a variational autoencoder (VAE) via reconstruction in a visual-semantic embedding (VSE) space to leverage the semantic information of image data and explain the learned latent representations in an unsupervised manner.


Various protocols have been developed to improve the success rate of In Vitro Fertilization (IVF). Earlier protocols were based on embryonic cell quality on the third day of embryo development. Newer protocols rely on blastocyst quality (day-5 embryo).
Artificial intelligence (AI) systems for automatic human embryo quality assessment appear to be a natural step toward improving IVF outcomes. AI systems can potentially reveal hidden relationships between the various attributes of embryos. To date, most AI systems assess single blastocyst images.


We address 3D human pose and shape estimation from multi-view images. We use the SMPL body model and regress the model parameters that best fit the shape and pose. To solve for the parameters, we first compute 3D joint positions from 2D joint estimates in the images using linear algebraic triangulation. Then, we fit the 3D parametric body model to the 3D joints while imposing a bone orientation constraint between the 3D model and the corresponding body parts detected in the images.
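
The triangulation step can be illustrated with a minimal sketch of the standard direct linear transform (DLT), which is one common form of linear algebraic triangulation. The abstract does not say which formulation the authors use, so this is an assumption. The function names, the two camera matrices, and the test joint are all hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

def triangulate_point(projections, points_2d):
    """Triangulate one 3D point from two or more views with the DLT.

    projections: list of 3x4 camera projection matrices (assumed known).
    points_2d:   list of (u, v) pixel observations, one per camera.
    """
    rows = []
    for P, (u, v) in zip(projections, points_2d):
        # Each view contributes two linear constraints on the homogeneous point.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]

def project(P, X):
    """Project a 3D point to pixel coordinates (helper for the demo only)."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Hypothetical two-camera rig: shared intrinsics, second camera shifted along x.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

joint = np.array([0.2, -0.1, 4.0])  # made-up ground-truth joint position
observations = [project(P1, joint), project(P2, joint)]
estimate = triangulate_point([P1, P2], observations)
```

With noise-free observations the estimate matches the input joint up to floating-point error. With noisy 2D detections the same code returns an algebraic least-squares estimate.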


This work explores facial expression bias as a security vulnerability of face recognition systems. Despite the great performance achieved by state-of-the-art face recognition systems, the algorithms are still sensitive to a large range of covariates. We present a comprehensive analysis of how facial expression bias impacts the performance of face recognition technologies. Our study analyzes: i) facial expression biases in the most popular face recognition databases; and ii) the impact of facial expression on face recognition performance.


A learning algorithm referred to as Maximum Margin (MM) is proposed to address the class-imbalanced learning problem, in which the deep model tends to predict the majority classes rather than the minority ones. For better generalization on the minority classes, the proposed MM loss function is designed by minimizing a margin-based generalization bound through the shifting decision bound. In prior work, the theoretically principled label-distribution-aware margin (LDAM) loss has been successfully applied with classical strategies such as re-weighting or re-sampling.
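
For context, the prior LDAM loss mentioned above can be sketched as follows. This illustrates LDAM only, not the proposed MM loss, whose form the abstract does not give. It assumes the usual LDAM margin, proportional to n_j^(-1/4) for a class with n_j training samples and rescaled so the rarest class receives `max_margin`. The function names, the value 0.5, and the class counts are illustrative assumptions, and the logit scaling factor often used in practice is omitted.

```python
import math

def ldam_margins(class_counts, max_margin=0.5):
    """Per-class margins proportional to n_j ** (-1/4); rarer classes get larger margins."""
    raw = [n ** -0.25 for n in class_counts]
    scale = max_margin / max(raw)
    return [scale * r for r in raw]

def ldam_loss(logits, label, margins):
    """Cross-entropy after subtracting the class margin from the true-class logit."""
    shifted = list(logits)
    shifted[label] -= margins[label]
    # Numerically stable log-sum-exp.
    m = max(shifted)
    log_sum = m + math.log(sum(math.exp(z - m) for z in shifted))
    return log_sum - shifted[label]

# Hypothetical long-tailed dataset with three classes.
margins = ldam_margins([5000, 500, 50])
loss_with_margin = ldam_loss([2.0, 1.0, 0.5], 2, margins)
loss_plain = ldam_loss([2.0, 1.0, 0.5], 2, [0.0, 0.0, 0.0])
```

Subtracting the margin from the true-class logit makes the loss larger than plain cross-entropy for the same logits, which pushes the model toward a wider margin on minority classes.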


The recent trend in regularization methods for inverse problems is to replace handcrafted sparsifying operators with data-driven approaches. Although using such machine learning techniques often improves image reconstruction methods, the results can depend significantly on the learning methodology. This paper compares two supervised learning methods. First, the paper considers a transform learning approach and, to learn the transform, introduces a variant of the Procrustes method for wide matrices with orthogonal rows. Second, the paper considers a bilevel convolutional filter learning approach.
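
As background for the Procrustes step, here is a minimal sketch of the classical orthogonal Procrustes problem for a square matrix: find the orthogonal W minimizing the Frobenius norm of W X - Z. It is not the paper's variant for wide matrices with orthogonal rows, which the abstract does not describe. The function name and the synthetic data are assumptions, and NumPy is assumed to be available.

```python
import numpy as np

def orthogonal_procrustes(X, Z):
    """Return the orthogonal W minimizing ||W @ X - Z||_F (classical square case)."""
    # Closed-form solution: W = U @ Vt, where U, S, Vt is the SVD of Z @ X.T.
    U, _, Vt = np.linalg.svd(Z @ X.T)
    return U @ Vt

# Synthetic check: recover a known orthogonal transform from exact data.
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 50))                       # 50 training signals of dimension 4
Q_true, _ = np.linalg.qr(rng.standard_normal((4, 4)))  # random orthogonal matrix
Z = Q_true @ X                                         # targets generated by the true transform
W = orthogonal_procrustes(X, Z)
```

Because the synthetic targets come from an exactly orthogonal transform and X has full row rank, the recovered W equals Q_true up to floating-point error.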


In this paper, we address the problem of jointly retrieving a 3D dynamic shape, camera motion, and deformation grouping from partial 2D point trajectories in a monocular video. To this end, we introduce a union of piecewise Bézier subspaces with enforced continuity to model 3D motion. We show that formulating the problem in terms of piecewise curves allows for a better physical interpretation of the resulting priors and a more accurate representation of the motion.
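
To make the piecewise Bézier idea concrete, the sketch below evaluates a 3D trajectory made of Bézier segments joined with positional (C0) continuity. It illustrates the curve model only, not the paper's subspace formulation or its optimization. The function names and control points are hypothetical.

```python
from math import comb

def bezier_point(control_points, t):
    """Evaluate a Bezier curve at t in [0, 1] using the Bernstein basis."""
    n = len(control_points) - 1
    dim = len(control_points[0])
    point = [0.0] * dim
    for i, cp in enumerate(control_points):
        weight = comb(n, i) * (1 - t) ** (n - i) * t ** i
        for d in range(dim):
            point[d] += weight * cp[d]
    return point

def piecewise_bezier(segments, t):
    """Evaluate a chain of segments; t runs over [0, len(segments)]."""
    k = min(int(t), len(segments) - 1)  # index of the active segment
    return bezier_point(segments[k], t - k)

# Two hypothetical cubic segments; the second starts where the first ends (C0).
seg_a = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0), (2.0, 2.0, 1.0), (3.0, 0.0, 1.0)]
seg_b = [(3.0, 0.0, 1.0), (4.0, -2.0, 1.0), (5.0, -2.0, 2.0), (6.0, 0.0, 2.0)]
trajectory = [piecewise_bezier([seg_a, seg_b], 0.1 * s) for s in range(21)]
```

Sharing the endpoint between consecutive segments gives positional continuity. Smoother joins would add constraints on the neighbouring control points.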