We propose Gumbel-NeRF, a mixture-of-experts (MoE) neural radiance fields (NeRF) model with a hindsight expert selection mechanism for synthesizing novel views of unseen objects. Previous studies have shown that the MoE structure provides high-quality representations of a given large-scale scene consisting of many objects. However, we observe that such an MoE NeRF model often produces low-quality representations near the experts' boundaries when applied to novel view synthesis of an unseen object from one-shot or few-shot input.

The introduction of diverse text-to-image generation models has sparked significant interest across various sectors. While these models provide the groundbreaking capability to convert textual descriptions into visual data, their widespread use has raised concerns over the misuse of realistic synthesized images. Despite the pressing need, research on detecting such synthetic images remains limited. This paper aims to bridge this gap by evaluating how well several existing detectors identify synthesized images produced by text-to-image generation models.

Emergency response missions depend on the fast relay of visual information, a task to which unmanned aerial vehicles are well adapted. However, the effective use of unmanned aerial vehicles is often compromised by bandwidth limitations that impede fast data transmission, thereby delaying the quick decision-making necessary in emergency situations. To address these challenges, this paper presents a streamlined hybrid annotation framework that utilizes the JPEG 2000 compression algorithm to facilitate object detection under limited bandwidth.

We revisit language bottleneck models as an approach to ensuring the explainability of deep learning models for image classification. Because information is inevitably lost when images are converted into language, the accuracy of language bottleneck models is considered inferior to that of standard black-box models. Recent image captioners based on large-scale vision-and-language foundation models, however, can describe images in verbal detail with an accuracy that was previously believed to be unrealistic.

Face super-resolution (FSR) is a powerful technique for restoring high-resolution face images from captured low-resolution ones with the assistance of prior information. Existing FSR methods based on explicit or implicit covariance matrices struggle to reveal complex nonlinear relationships between features, because conventional covariance computation is essentially a linear operation. In addition, the limited number of training samples and noise disturbance cause sample covariance matrices to deviate.
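
The point about covariance being a linear measure can be illustrated with a small sketch. It is not taken from the paper: the helper `sample_covariance` and the toy data are assumptions chosen for illustration. Two features linked by a purely nonlinear rule (y = x²) end up with zero sample covariance, even though one fully determines the other.

```python
# Illustrative sketch only; names and data are hypothetical, not the paper's method.

def sample_covariance(xs, ys):
    """Unbiased sample covariance of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

# Feature x is symmetric around zero; feature y depends on x nonlinearly.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [x * x for x in xs]

cov_xy = sample_covariance(xs, ys)  # covariance between x and y = x^2
var_x = sample_covariance(xs, xs)   # variance of x, for comparison

print(cov_xy, var_x)
```

Here the covariance between the two features vanishes while the variance of x does not, so a second-order statistic alone gives no hint of the dependence.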

Counting repetitive actions in long untrimmed videos is a challenging task that has many applications such as rehabilitation.

Transformers have achieved remarkable success in low-level visual tasks, including image super-resolution (SR), owing to their ability to establish global dependencies through the self-attention mechanism. However, existing methods overlook the mutual influence and promotion between the channel and spatial dimensions. The feed-forward network (FFN) in the transformer architecture introduces redundant information along the channel dimension during feature extraction, which hinders feature representation and neglects spatial information modeling.

This paper addresses the problem of camera calibration and shape recovery using a single image of a reflectively symmetric object. Unlike existing methods that require knowledge of 3D points or two images, this paper proposes to calibrate camera parameters using one image with known point distance ratios on the 3D object. Specifically, we first recover the vanishing point of the symmetry plane normal. Then a set of candidate focal lengths is uniformly selected as initial values, from which the pan and yaw angles of the camera can be obtained.

Attention mechanisms, especially spatial self-attention, are widely adopted in existing scene parsing methods because of their excellent performance. However, spatial self-attention suffers from high computational complexity, which limits the practical application of scene parsing methods on mobile devices with limited resources. In view of this, we propose a simple yet effective spatial attention module, namely the Content-Aware Attention Module (CAAM).
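
To give a rough sense of the complexity concern, the sketch below counts the entries in the affinity map of standard spatial self-attention, where every position attends to every other position. It is an illustration under that assumption and says nothing about how CAAM works; the function name `attention_map_entries` and the feature-map sizes are hypothetical.

```python
# Illustrative sketch only; not the CAAM module described in the abstract.

def attention_map_entries(height, width):
    """Number of pairwise affinities for an H x W feature map.

    Standard spatial self-attention compares each of the N = H * W
    positions with all N positions, giving an N x N affinity map.
    """
    num_positions = height * width
    return num_positions * num_positions

small = attention_map_entries(64, 64)    # 4096 positions
large = attention_map_entries(128, 128)  # 16384 positions

print(small, large, large // small)
```

Doubling each side of the feature map multiplies the affinity map size by 16, which is why the cost grows quickly with resolution.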
