Video Camouflaged Object Detection (VCOD) presents unique challenges in computer vision due to texture similarities between target objects and their surroundings, as well as irregular motion patterns caused by both object and camera movement. In this paper, we introduce TokenMotion (TMNet), a transformer-based model that enhances VCOD by extracting motion-guided features using learnable token selection. Evaluated on the challenging MoCA-Mask dataset, TMNet achieves state-of-the-art performance in VCOD.

Neural Radiance Fields (NeRF) have revolutionized 3D scene modeling and rendering. However, their performance dips when handling images with diverse exposure levels, mainly due to the intricate luminance dynamics. Addressing this, we present an innovative method that proficiently models and renders images across a spectrum of exposure conditions. Our approach utilizes an unsupervised classifier-generator structure for HDR fusion, significantly enhancing NeRF's ability to comprehend and adjust to light variations, leading to the generation of images with appropriate brightness.

High Dynamic Range (HDR) imaging seeks to enhance image quality by combining multiple Low Dynamic Range (LDR) images captured at varying exposure levels. Traditional deep learning approaches often employ a reconstruction loss, but this can lead to ambiguities in the feature space during training. To address this issue, we present a new loss function, termed Gravitated Latent Space (GLS) loss, that leverages a metric tensor to introduce a form of virtual gravity within the latent space. This helps the model overcome saddle points more effectively.

Panoptic Scene Graph Generation (PSG) involves the detection of objects and the prediction of their corresponding relationships (predicates). However, the presence of biased predicate annotations poses a significant challenge for PSG models, as it hinders their ability to establish a clear decision boundary among different predicates. This issue substantially impedes the practical utility and real-world applicability of PSG models.

3D object detection plays a crucial role in intelligent vision systems. Detection in the open world inevitably encounters various adverse scenes, in which most existing methods fail. To address this issue, this paper proposes a monocular 3D detection model, termed AEAM3D, which effectively mitigates the degradation of detection performance in various harsh environments. Additionally, we assemble a new adverse 3D object detection dataset encompassing some challenging scenes, including rainy, foggy, and low light

Language-guided video summarization lets users summarize lengthy videos with natural language queries, producing concise and relevant summaries that cater to their information needs and are easier to access and digest. However, most previous works rely on large amounts of expensive annotated video and complex designs to align different modalities at the feature level.

We present a generative model that learns to synthesize human motion from limited training sequences. In contrast to existing methods, our framework provides stylistic control across multiple temporal resolutions. The model captures human motion patterns by integrating skeletal convolution layers and a multi-scale architecture. Our framework contains a set of generative and adversarial networks, along with style embedding modules, each tailored for generating motions at specific frame rates while exerting control over their style.

This is the supplementary material for the BMT-BENCH dataset for video generation. The submission includes links to the dataset and the baseline system.

The LIVE-Viasat Real-World Satellite QoE Database is an innovative and comprehensive resource designed to address the critical challenges faced by Internet Service Providers (ISPs), particularly in the domain of satellite streaming services.

To evaluate the generalization of referring image segmentation (RIS) in the context of human-robot interaction, we generate referring expressions for a subset of images from GraspNet using Shikra.
