ICIP 2021 - The International Conference on Image Processing (ICIP), sponsored by the IEEE Signal Processing Society, is the premier forum for the presentation of technological advances and research results in the fields of theoretical, experimental, and applied image and video processing. ICIP, held annually since 1994, brings together leading engineers and scientists in image and video processing from around the world.

Binary shapes, or silhouettes, are essential in human communication. They include, for example, all fonts and many logos. They can be extracted from images in raster form but require vectorization for resolution-independent editing. In this paper, we propose a mathematically founded silhouette vectorization algorithm, which converts a raster 2D shape to Scalable Vector Graphics (SVG) format with control points that are geometrically stable under affine transformations. The proposed method can also be used as a reliable feature point detector for silhouettes.


Depression is a common mental disorder that affects patients’ daily life. Most existing depression detection methods consume substantial medical resources and are at risk of subjective judgment. Therefore, we propose an objective and convenient experimental paradigm. First, we select emotional images as stimuli and record the subjects’ eye movement data. Second, we establish a connection between image processing and the analysis of subjects’ psychological conditions.


A few-shot personalized saliency prediction method using person similarity based on collaborative multi-output Gaussian process regression is presented in this paper. In contrast to the prediction of general saliency maps, the prediction of personalized saliency maps (PSMs), which attracts attention because of the heterogeneity among individuals, is a challenging problem: the amount of training gaze data is limited because collecting it places a burden on new persons. Thus, the proposed method focuses on the similarity of gaze tendency between persons.


This paper presents a correlation-aware attention branch network (CorABN) using multi-modal data for deterioration level estimation of infrastructures. CorABN can collaboratively use visual features from distress images and text features from text data recorded at the inspection to improve the estimation accuracy of deterioration levels. Specifically, by maximizing correlation between the visual and text features that provide useful information for the deterioration level estimation, a correlation-aware attention map can be generated.


Fine-grained human action recognition is a core research topic in computer vision. Inspired by the recently proposed hierarchy representation of fine-grained actions in FineGym and by the SlowFast network for action recognition, we propose a novel multi-task network which exploits the FineGym hierarchy representation to achieve effective joint learning and prediction for fine-grained human action recognition.


Compression of the sign information of discrete cosine transform coefficients is an intractable problem in image compression schemes due to the equiprobable occurrence of the sign bits. To overcome this difficulty, we propose an efficient compression method for such sign information based on phase retrieval, which is a classical signal restoration problem attempting to find the phase information of discrete Fourier transform coefficients from their magnitudes.
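To make the role of the sign bits concrete, the following minimal sketch separates the DCT coefficients of a short 1-D signal into magnitudes and sign bits, then shows that decoding is exact only when the signs are available. It does not implement the phase-retrieval method of the paper; the sample values, the 1-D orthonormal DCT-II, and all function names are assumptions chosen for illustration.

```python
import math

def dct2(x):
    """Orthonormal DCT-II of a 1-D sequence (pure Python, O(n^2))."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def idct2(c):
    """Inverse of dct2 (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(
            c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
            for k in range(1, n)
        )
        out.append(s)
    return out

def split_sign_magnitude(coeffs):
    """Return (sign bits, magnitudes); a sign bit of 1 marks a negative value."""
    signs = [1 if c < 0 else 0 for c in coeffs]
    mags = [abs(c) for c in coeffs]
    return signs, mags

def merge_sign_magnitude(signs, mags):
    """Reattach sign bits to magnitudes."""
    return [-m if s else m for s, m in zip(signs, mags)]

# Hypothetical 8-sample row of pixel intensities (illustrative values only).
signal = [52.0, 55.0, 61.0, 66.0, 70.0, 61.0, 64.0, 73.0]

coeffs = dct2(signal)
signs, mags = split_sign_magnitude(coeffs)

# With the sign bits, the signal is recovered up to floating-point error.
restored = idct2(merge_sign_magnitude(signs, mags))
max_error = max(abs(a - b) for a, b in zip(signal, restored))

# Without them (all signs assumed positive), the reconstruction is wrong.
magnitude_only = idct2(mags)
sign_loss_error = max(abs(a - b) for a, b in zip(signal, magnitude_only))

print("sign bits:", signs)
print("max error with signs:", max_error)
print("max error without signs:", sign_loss_error)
```

Each coefficient costs one sign bit in this naive split. Since the abstract describes the sign bits as equiprobable, they are hard to shrink with ordinary entropy coding, which is the motivation for inferring them from the magnitudes.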


Deep-learning-based image inpainting algorithms have shown great performance via powerful priors learned from numerous external natural images. However, they produce unsatisfactory results for test images whose distributions are far from those of the training images, because their models are biased toward the training images. In this paper, we propose a simple image inpainting algorithm with test-time adaptation named AdaFill. Given a single out-of-distribution test image, our goal is to complete the hole region more naturally than pre-trained inpainting models do.


Generative Adversarial Networks (GANs) have been used recently for anomaly detection from images, where the anomaly scores are obtained by comparing the global difference between the input and generated image. However, the anomalies often appear in local areas of an image scene, and ignoring such information can lead to unreliable detection of anomalies.
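The difference between a global and a local comparison can be illustrated with a toy example. The sketch below is not the method of the paper: it assumes a reconstruction is already available as a plain 2-D list, uses the mean absolute difference as a stand-in for a global anomaly score, and uses the largest patch-wise mean difference as a hypothetical local score. All names and values are illustrative.

```python
def abs_diff(a, b):
    """Pixel-wise absolute difference of two equally sized 2-D lists."""
    return [[abs(p - q) for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def global_score(diff):
    """Mean absolute difference over the whole image."""
    total = sum(sum(row) for row in diff)
    return total / (len(diff) * len(diff[0]))

def local_score(diff, patch=2):
    """Largest mean difference over non-overlapping patch x patch tiles."""
    height, width = len(diff), len(diff[0])
    best = 0.0
    for top in range(0, height - patch + 1, patch):
        for left in range(0, width - patch + 1, patch):
            tile_sum = sum(
                diff[r][c]
                for r in range(top, top + patch)
                for c in range(left, left + patch)
            )
            best = max(best, tile_sum / (patch * patch))
    return best

size = 8
# Stand-in for the generator output: a flat gray image.
generated = [[0.5] * size for _ in range(size)]

# Input A: a small but strong 2x2 defect, everything else matches.
local_defect = [row[:] for row in generated]
for r in range(2, 4):
    for c in range(4, 6):
        local_defect[r][c] = 1.0

# Input B: a mild brightness shift over the whole image, no defect.
mild_shift = [[0.55] * size for _ in range(size)]

diff_defect = abs_diff(local_defect, generated)
diff_shift = abs_diff(mild_shift, generated)

print("global:", global_score(diff_defect), global_score(diff_shift))
print("local: ", local_score(diff_defect), local_score(diff_shift))
```

In this toy setup the global score rates the harmless brightness shift as more anomalous than the defect, while the patch-based score ranks them the other way round. This is the kind of failure the abstract attributes to ignoring local information.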