- ENACT: Entropy-based Clustering of Attention Input for Reducing the Computational Resources of Object Detection Transformers - Supplementary Material
Transformers achieve competitive precision on vision-based object detection. However, they require considerable computational resources because the size of the attention weights grows quadratically with the number of input tokens.
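To make the quadratic growth concrete, here is a minimal sketch of generic scaled dot-product attention weights in plain Python. It is not the ENACT method; the function name `attention_weights`, the random token vectors, and the sizes used are all assumptions chosen for illustration.

```python
import math
import random


def attention_weights(queries, keys):
    """Return softmax attention weights for one head.

    queries and keys are lists of n vectors; the result has n rows of n weights.
    """
    scale = math.sqrt(len(queries[0]))
    weights = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


random.seed(0)
dim = 8  # hypothetical embedding size
entry_counts = {}
for n_tokens in (16, 32, 64):
    tokens = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_tokens)]
    w = attention_weights(tokens, tokens)
    # Count every stored weight: n_tokens rows of n_tokens entries each.
    entry_counts[n_tokens] = sum(len(row) for row in w)
```

Doubling the number of tokens quadruples the number of stored weights, which is why reducing the number of attention inputs can lower the cost.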
- In2Out: Fine-Tuning Video Inpainting Model for Video Outpainting using Hierarchical Discriminator
Video outpainting presents the unique challenge of extending the borders of a video while maintaining consistency with the given content. In this paper, we propose using video inpainting models, which excel at object flow learning and reconstruction, for outpainting, rather than solely generating the background as existing methods do. However, directly applying or fine-tuning inpainting models for outpainting has proven ineffective, often leading to blurry results.
- Supplementary FMG-Det: Foundation Model Guided Robust Object Detection
Collecting high quality data for object detection tasks is challenging due to the inherent subjectivity in labeling the boundaries of an object. This makes it difficult to not only collect consistent annotations across a dataset but also to validate them, as no two annotators are likely to label the same object using the exact same coordinates. These challenges are further compounded when object boundaries are partially visible or blurred, which can be the case in many domains.
- HFSVQ - Supplementary Materials
This document serves as the 'supplementary materials' for HFSVQ, which has been submitted to ICIP 2025.
- ICIP2025_Supplementary_ESCANet
While deep-learning-based solutions, including CNNs and transformer-based architectures, have demonstrated promising results for image super-resolution (SR) tasks, their substantial depth and parameter counts hinder deployment on AI-enabled edge computing devices. To address this issue, we propose a lightweight single image super-resolution (SISR) model named Efficient Spatial and Channel Attentive Network (ESCANet), comprising a Spatial Enhancement Module (SEM) and a Channel-wise Enhancement Module (CEM).
Previous works on image inpainting mainly focus on inpainting background or partially missing objects, while the problem of inpainting an entire missing object remains unexplored.
This work studies a new image inpainting problem, i.e., shape-guided object inpainting. Given an incomplete input image, the goal is to fill in the hole by generating an object based on the context and the implicit guidance provided by the hole shape.
We propose a new data preparation method and a novel Contextual Object Generator for the object inpainting task.
- ICIP2025_3D_360ExtremelySparseViews
Novel view synthesis in 360° scenes from extremely sparse input views is essential for applications like virtual reality and augmented reality. This paper presents a novel framework for novel view synthesis in extremely sparse-view cases. As typical structure-from-motion methods are unable to estimate camera poses in extremely sparse-view cases, we apply DUSt3R to estimate camera poses and generate a dense point cloud.
- Adaptive Adversarial Cross-Entropy Loss for Sharpness-Aware Minimization
Recent advancements in learning algorithms have demonstrated that the sharpness of the loss surface is an effective measure for reducing the generalization gap. Building upon this concept, Sharpness-Aware Minimization (SAM) was proposed to enhance model generalization and achieved state-of-the-art performance. SAM consists of two main steps: the weight perturbation step and the weight updating step. However, the perturbation in SAM is determined only by the gradient of the training loss, i.e., the cross-entropy loss.
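The two steps of standard SAM can be sketched on a toy problem as follows. This is a minimal illustration of the baseline SAM update only, not the adaptive adversarial cross-entropy loss this entry proposes; the quadratic loss, the function names `loss`, `loss_grad` and `sam_step`, and the values of `lr` and `rho` are assumptions made for the example.

```python
import math


def loss(w):
    """Toy loss L(w) = 0.5 * (w0**2 + 10 * w1**2), standing in for a training loss."""
    return 0.5 * (w[0] ** 2 + 10.0 * w[1] ** 2)


def loss_grad(w):
    """Gradient of the toy loss above."""
    return [w[0], 10.0 * w[1]]


def sam_step(w, lr=0.05, rho=0.1):
    # Step 1 (weight perturbation): move to w + eps, where eps points along
    # the normalised gradient of the training loss and has norm rho.
    g = loss_grad(w)
    norm = math.sqrt(sum(gi * gi for gi in g)) + 1e-12
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # Step 2 (weight update): apply the gradient taken at the perturbed point
    # to the original weights.
    g_adv = loss_grad(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]


w = [1.0, 1.0]
start = loss(w)
for _ in range(50):
    w = sam_step(w)
end = loss(w)
```

Note that the perturbation direction in step 1 comes entirely from `loss_grad`, the gradient of the training loss, which is the limitation the abstract points out.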
- Rethinking temporal self-similarity for repetitive action counting
Counting repetitive actions in long untrimmed videos is a challenging task that has many applications such as rehabilitation. State-of-the-art methods predict action counts by first generating a temporal self-similarity matrix (TSM) from the sampled frames and then feeding the matrix to a predictor network. The self-similarity matrix, however, is not an optimal input to a network since it discards too much information from the frame-wise embeddings.
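As a rough illustration of what a temporal self-similarity matrix is, the sketch below builds one from a handful of frame embeddings. Cosine similarity is assumed here as the similarity measure, and the embeddings are hypothetical hand-made vectors; the methods referred to above may use a different similarity function and learned embeddings.

```python
import math


def self_similarity(embeddings):
    """Cosine similarity between every pair of frame embeddings (T x T)."""
    norms = [math.sqrt(sum(v * v for v in e)) for e in embeddings]
    return [
        [sum(a * b for a, b in zip(ei, ej)) / (ni * nj)
         for ej, nj in zip(embeddings, norms)]
        for ei, ni in zip(embeddings, norms)
    ]


# Hypothetical embeddings for 6 frames of an action that repeats every 3 frames.
cycle = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
frames = cycle + cycle
tsm = self_similarity(frames)
```

Frames that are one period apart (for example frames 0 and 3) get a similarity of 1, which is the pattern a predictor network can count. The matrix keeps only pairwise relations between frames, so the content of each embedding is no longer available downstream.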
- Fast Unsupervised Tensor Restoration via Low-rank Deconvolution
Low-rank Deconvolution (LRD) has emerged as a new multi-dimensional representation model that offers important efficiency and flexibility properties. In this work, we ask whether this analytical model can compete against Deep Learning (DL) frameworks like Deep Image Prior (DIP) or Blind-Spot Networks (BSN), as well as other classical methods, in the task of signal restoration. More specifically, we propose to extend LRD with differential regularization.