
Supplementary Material
Multi-class part parsing is a dense prediction task that segments objects into semantic components with multi-level abstractions. Despite its significance, this task remains challenging due to ambiguities at both part and class levels. In this paper, we propose a network that incorporates multi-class boundaries to precisely identify and emphasize the spatial boundaries of part classes, thereby improving segmentation quality. Additionally, we employ a weighted multi-label cross-entropy loss function to ensure balanced and effective learning from all parts.
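The weighted multi-label cross-entropy loss is the most concrete piece of the method. Below is a minimal PyTorch sketch of one such loss; the per-class binary formulation, the inverse-frequency weights, and all tensor shapes are assumptions for illustration, not the paper's exact formulation.

```python
# Minimal sketch of a weighted multi-label cross-entropy loss in PyTorch.
# The per-class binary formulation and inverse-frequency weighting are
# illustrative assumptions; the paper's exact loss may differ.
import torch
import torch.nn.functional as F

def weighted_multilabel_ce(logits, targets, class_weights):
    """logits, targets: (B, C, H, W); class_weights: (C,)."""
    # Treat each part class as an independent binary label, so a pixel
    # can carry overlapping part- and class-level labels.
    per_pixel = F.binary_cross_entropy_with_logits(
        logits, targets.float(), reduction="none"
    )
    # Re-weight each class channel so rare parts contribute as much
    # to the gradient as frequent ones.
    return (per_pixel * class_weights.view(1, -1, 1, 1)).mean()

# Hypothetical inverse-frequency weights for four part classes.
freq = torch.tensor([0.50, 0.30, 0.15, 0.05])  # per-class pixel frequency
weights = freq.numel() * (1.0 / freq) / (1.0 / freq).sum()

logits = torch.randn(2, 4, 8, 8)
targets = (torch.rand(2, 4, 8, 8) > 0.5).float()
loss = weighted_multilabel_ce(logits, targets, weights)
```

Weighting each class channel by inverse pixel frequency is one common way to keep small, rare parts from being drowned out by large, frequent ones during training.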

Supplementary Material for Deep Features based on Contrastive Fusion of Transformer and CNN for Semantic Segmentation

Supplementary Material: CHUG - Crowdsourced User-Generated HDR Video Quality
High Dynamic Range (HDR) videos enhance visual experiences with superior brightness, contrast, and color depth. The surge of User-Generated Content (UGC) on platforms like YouTube and TikTok introduces unique challenges for HDR video quality assessment (VQA) due to diverse capture conditions, editing artifacts, and compression distortions. Existing HDR-VQA datasets primarily focus on professionally generated content (PGC), leaving a gap in understanding real-world UGC-HDR degradations.

ExDF: Supplementary Material

Supplementary Material
Although many deepfake detection methods have been proposed to combat the severe misuse of generative AI, none provide detailed, human-interpretable explanations beyond a simple real/fake verdict. This limitation makes it difficult for humans to assess the accuracy of detection results, especially when the models encounter unseen deepfakes. To address this issue, we propose a novel deepfake detector based on a large Vision-Language Model (VLM) that can explain which facial regions were manipulated.

Supplementary Material
In nighttime conditions, high noise levels and bright illumination sources degrade image quality, making low-light image enhancement challenging. Thermal images provide complementary information, offering richer textures and structural details. We propose RT-X Net, a cross-attention network that fuses RGB and thermal images for nighttime image enhancement. We leverage self-attention networks for feature extraction and a cross-attention mechanism for fusion to effectively integrate information from both modalities.
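The cross-attention fusion step admits a compact sketch. The snippet below shows one plausible reading of it in PyTorch, with RGB features as queries over thermal keys and values; the layer sizes, residual placement, and the query/key-value assignment are illustrative assumptions rather than RT-X Net's published design.

```python
# Sketch of cross-attention fusion between RGB and thermal feature maps.
# Dimensions and the RGB-as-query choice are assumptions for illustration.
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, rgb_feat, thermal_feat):
        # rgb_feat, thermal_feat: (B, C, H, W) -> (B, H*W, C) token sequences.
        B, C, H, W = rgb_feat.shape
        q = rgb_feat.flatten(2).transpose(1, 2)
        kv = thermal_feat.flatten(2).transpose(1, 2)
        # RGB tokens query the thermal tokens, pulling in texture and
        # structure cues that survive low-light noise.
        fused, _ = self.attn(q, kv, kv)
        fused = self.norm(fused + q)  # residual connection to the RGB stream
        return fused.transpose(1, 2).reshape(B, C, H, W)

fusion = CrossAttentionFusion(dim=64, heads=4)
out = fusion(torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32))
```

Querying with the RGB tokens keeps the fused output aligned with the RGB spatial layout while letting each location borrow thermal detail from wherever it appears in the thermal map.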

Cross-Domain Video Object Detection via Augmented-Shot FineTuning
This document contains supplementary material for the ICIP paper.

Supplementary Material for LeMoRe
Lightweight semantic segmentation is essential for many downstream vision tasks. Unfortunately, existing methods often struggle to balance efficiency and performance because of the complexity of feature modeling. Many are constrained by rigid architectures and implicit representation learning, with parameter-heavy designs and a reliance on computationally intensive Vision Transformer-based frameworks.
