Supplemental Material

This document contains the supplementary material for the ICIP_2050 paper with ID #2494 and title "An End-to-End Class-Aware and Attention-Guided Model for Object State Classification".

Supplementary materials for the paper "Pose-free 3D Gaussian Splatting via Shape-Ray Estimation".

The emergence of text-to-image diffusion models marked a revolutionary breakthrough in generative AI. However, training a text-to-image model to consistently reproduce the same subject remains a challenging task. Existing methods often require costly setups and lengthy fine-tuning processes, and they struggle to generate diverse, text-aligned images. Moreover, the growing size of diffusion models exposes a scalability problem for previous fine-tuning methods, since tuning these large models is even more costly. To address these limitations, we present Stencil.

In the last few years, vision transformers have increasingly been adopted for medical image classification and other applications because they achieve higher accuracy than other deep learning models. However, due to their size and the complex interactions introduced by the self-attention mechanism, they are not well understood. In particular, it is unclear whether the representations produced by such models are semantically meaningful.

Fine-grained action localization in untrimmed sports videos is a challenging task, as motion transitions are subtle and occur within short time spans. Traditional supervised and weakly supervised methods require extensive labeled data, making them less scalable and generalizable. To address these challenges, we propose an unsupervised skeleton-based action localization pipeline that detects fine-grained action boundaries using spatio-temporal graph embeddings.

Handshape recognition is a fundamental component of Sign Language Recognition (SLR). However, most existing approaches are language-dependent and require extensive training data, which limits their scalability. To address this limitation, we explore the use of SignWriting as a standardized, language-agnostic representation for handshapes. Our method employs MediaPipe for hand landmark extraction, followed by normalization and data augmentation to enhance robustness.

This paper presents COT-AD, a comprehensive dataset designed to enhance cotton crop analysis through computer vision. Comprising over 25,000 images captured throughout the cotton growth cycle, of which 5,000 are annotated, COT-AD includes aerial imagery for field-scale detection and segmentation and high-resolution DSLR images documenting key diseases. The annotations cover pest and disease recognition, vegetation, and weed analysis, addressing a critical gap in cotton-specific agricultural datasets.

Recently, generative priors have significantly improved unsupervised image restoration. This study explores the incorporation of multiple loss functions that capture various perceptual and structural aspects of image quality. Our proposed method improves robustness across multiple tasks, including denoising, upsampling, inpainting, and deartifacting, by utilizing a comprehensive loss function based on Learned Perceptual Image Patch Similarity (LPIPS), Multi-Scale Structural Similarity Index Measure (MS-SSIM), Consistency, Feature, and Gradient losses.
