
Image enhancement is one of the most important image processing techniques and is used to improve the quality of images captured in various situations. In this study, we propose a novel hue-preserving contrast enhancement method that adjusts chroma while suppressing over-enhancement. In the proposed method, the histogram of each RGB component of the original image is first smoothed by a Gaussian filter. Histogram specification is then performed using the smoothed histogram of each RGB component as the target, which spreads the pixel distribution in RGB color space.
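The two steps named in this abstract (Gaussian smoothing of each RGB histogram, then histogram specification toward the smoothed histogram) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names, the `sigma` value, and the kernel radius are assumptions, and the hue-preservation and chroma-adjustment stages are not covered because the abstract does not describe them.

```python
import numpy as np

def gaussian_kernel(sigma, radius=None):
    """1-D Gaussian kernel normalized to sum to 1 (radius defaults to 3*sigma, an assumption)."""
    if radius is None:
        radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def specify_channel(channel, sigma=5.0):
    """Map one uint8 channel so its histogram approaches its Gaussian-smoothed version."""
    # Histogram of the original channel (256 intensity levels).
    hist = np.bincount(channel.ravel(), minlength=256).astype(np.float64)
    # Smoothed histogram, used here as the target distribution.
    target = np.convolve(hist, gaussian_kernel(sigma), mode="same")
    # Cumulative distributions of the source and the target.
    src_cdf = np.cumsum(hist) / hist.sum()
    tgt_cdf = np.cumsum(target) / target.sum()
    # Histogram specification: for each source level, pick the first
    # target level whose CDF reaches the source CDF value.
    lut = np.searchsorted(tgt_cdf, src_cdf, side="left")
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[channel]

def enhance_rgb(image, sigma=5.0):
    """Apply the per-channel mapping to an H x W x 3 uint8 image."""
    channels = [specify_channel(image[..., c], sigma) for c in range(3)]
    return np.stack(channels, axis=-1)

# Usage with a synthetic low-contrast image (values confined to 100..150).
rng = np.random.default_rng(0)
demo = rng.integers(100, 151, size=(32, 32, 3), dtype=np.uint8)
result = enhance_rgb(demo, sigma=5.0)
```

Because the lookup table is built from two cumulative distributions, the mapping is monotonic, so the brightness ordering of pixels within each channel is preserved. Processing the channels independently, as done here, can shift hue, which is presumably why the proposed method adds a separate hue-preserving step.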


Human-Object Interaction (HOI) detection, which aims to identify humans and objects with interactive behaviors in images and predict the behaviors between them, is of great significance for semantic understanding. Existing works primarily focus on exploring the fine-grained semantic features of humans and objects, as well as the spatial relationships between them. However, these methods do not leverage the contextual information within the interaction area, which could be valuable for predicting interaction behavior.


In recent years, Multi-Camera Multiple Object Tracking (MCMT) has gained significant attention as a crucial computer vision application. Research focuses on data association and track detection. However, accurately selecting datasets from raw vision data remains challenging due to real-world complexities such as object types, varying speeds, and unknown directions. To address these problems, this paper proposes the Object Tracking Model (OTM), which captures the features of the target area with a Camera Monitoring Network (CMN) based on a Graph Convolutional Network (GCN).


We propose Gumbel-NeRF, a mixture-of-experts (MoE) neural radiance fields (NeRF) model with a hindsight expert selection mechanism for synthesizing novel views of unseen objects. Previous studies have shown that the MoE structure provides high-quality representations of a given large-scale scene consisting of many objects. However, we observe that such an MoE NeRF model often produces low-quality representations in the vicinity of experts' boundaries when applied to novel view synthesis of an unseen object from one-shot or few-shot input.


The introduction of diverse text-to-image generation models has sparked significant interest across various sectors. While these models provide the groundbreaking capability to convert textual descriptions into visual data, their widespread use has raised concerns over the misuse of realistic synthesized images. Despite the pressing need, research on detecting such synthetic images remains limited. This paper aims to bridge this gap by evaluating how well several existing detectors identify synthesized images produced by text-to-image generation models.


Emergency response missions depend on the fast relay of visual information, a task to which unmanned aerial vehicles are well adapted. However, the effective use of unmanned aerial vehicles is often compromised by bandwidth limitations that impede fast data transmission, thereby delaying the quick decision-making necessary in emergency situations. To address these challenges, this paper presents a streamlined hybrid annotation framework that utilizes the JPEG 2000 compression algorithm to facilitate object detection under limited bandwidth.


We revisit language bottleneck models as an approach to ensuring the explainability of deep learning models for image classification. Because of the inevitable information loss incurred when converting images into language, the accuracy of language bottleneck models is considered to be inferior to that of standard black-box models. Recent image captioners based on large-scale vision-and-language foundation models, however, can describe images in verbal detail with an accuracy that was previously believed not to be realistically possible.


Face super-resolution (FSR) is a powerful technique for restoring high-resolution face images from captured low-resolution ones with the assistance of prior information. Existing FSR methods based on explicit or implicit covariance matrices struggle to reveal complex nonlinear relationships between features, because conventional covariance computation is essentially a linear operation. In addition, the limited number of training samples and noise disturbance cause the sample covariance matrices to deviate.


Counting repetitive actions in long untrimmed videos is a challenging task that has many applications such as rehabilitation.

