
- DETECTS: Deep Clustering of Temporal Skeletons for Graph-based Segmentation
Unsupervised Temporal Action Localization (UTAL) aims to segment untrimmed videos into semantically coherent actions without using temporal annotations. Existing UTAL methods rely on contrastive pretext tasks or shallow clustering pipelines that decouple representation learning from segmentation, limiting their ability to capture fine-grained temporal transitions. In this work, we propose a unified deep clustering framework for skeleton-based UTAL that formulates motion segmentation as a spatio-temporal graph separation problem in the embedding space.

- Shuffle Patchmix Augmentation with Confidence-Margin Weighted Pseudo-Labels for Enhanced Source-Free Domain Adaptation Poster
This work investigates Source-Free Domain Adaptation (SFDA), where a model adapts to a target domain without access to source data. A new augmentation technique, Shuffle PatchMix (SPM), and a novel reweighting strategy are introduced to enhance performance. SPM shuffles and blends image patches to generate diverse and challenging augmentations, while the reweighting strategy prioritizes reliable pseudo-labels to mitigate label noise. These techniques are particularly effective on smaller datasets like PACS, where overfitting and pseudo-label noise pose greater risks.
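
The abstract gives no implementation details, so the following is only a minimal sketch of how patch shuffling with blending and pseudo-label reweighting could look. The grid size, the blending coefficient `alpha`, the definition of the margin as the gap between the two largest class probabilities, and the function names `shuffle_patchmix` and `confidence_margin_weights` are all assumptions made for illustration, not the authors' method.

```python
import numpy as np


def shuffle_patchmix(image, grid=4, alpha=0.5, rng=None):
    """Shuffle the grid patches of an image and blend the result with the original.

    image: float array of shape (H, W, C), with H and W divisible by `grid`.
    alpha: weight of the original image in the blend (assumed parameter).
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w, c = image.shape
    if h % grid or w % grid:
        raise ValueError("image height and width must be divisible by grid")
    ph, pw = h // grid, w // grid

    # Cut the image into grid*grid patches of shape (ph, pw, c).
    patches = (
        image.reshape(grid, ph, grid, pw, c)
        .transpose(0, 2, 1, 3, 4)
        .reshape(grid * grid, ph, pw, c)
    )
    # Randomly permute the patch order.
    shuffled = patches[rng.permutation(grid * grid)]
    # Reassemble the permuted patches into a full image.
    mixed = (
        shuffled.reshape(grid, grid, ph, pw, c)
        .transpose(0, 2, 1, 3, 4)
        .reshape(h, w, c)
    )
    # Blend the shuffled image with the original.
    return alpha * image + (1.0 - alpha) * mixed


def confidence_margin_weights(probs):
    """Per-sample weight: top-1 probability minus top-2 probability (assumed margin)."""
    top2 = np.sort(probs, axis=1)[:, -2:]
    return top2[:, 1] - top2[:, 0]
```

In this sketch a pseudo-label with a large margin receives a weight near 1 and an ambiguous one a weight near 0, so a per-sample loss multiplied by these weights would downweight likely-noisy labels.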

- VChain Demo Video
The demo video is available below.
VChain.pdf


- Prompt Based Scoring
Prompt for scoring generations.

- dataset_generation_prompt
Gemini prompt for generating a caption from an image.

- UTAL-GNN: Unsupervised Temporal Action Localization using Graph Neural Networks
Fine-grained action localization in untrimmed sports videos presents a significant challenge due to rapid and subtle motion transitions over short durations. Existing supervised and weakly supervised solutions often rely on extensive annotated datasets and high-capacity models, making them computationally intensive and less adaptable to real-world scenarios. In this work, we introduce a lightweight and unsupervised skeleton-based action localization pipeline that leverages spatio-temporal graph neural representations.

- GIVE: A Multi-Agent Framework for Generating Immersive Multi-Modal Virtual Environments for 3D Games - Supplementary Material
In this work, we present a novel multi-agent framework for generating immersive 3D virtual environments from high-level semantic inputs, powered by large language and vision-language models (LLMs/VLMs). Unlike prior work that focuses primarily on visual output, data-intensive training pipelines, and code generation, our system coordinates a team of specialized agents, each assigned a role such as manager, planner, or expert in visual, audio, or spatial domains, to decompose and execute environment construction tasks within a game engine.

- (Appendix) Rethinking the Backbone in Class Imbalanced Federated Source Free Domain Adaptation: The Utility of Vision Foundation Models
Appendix of our paper "Rethinking the Backbone in Class Imbalanced Federated Source Free Domain Adaptation: The Utility of Vision Foundation Models", accepted at the IEEE ICIP 2025 workshop "Edge Intelligence: Smart, Efficient, and Scalable Solutions for IoT, Wearables, and Embedded Devices (SEEDS)".

- ICIP2025 supplementary material camera ready
We address the challenges of local feature matching under large scale and rotation changes by focusing on keypoint positions.
First, we propose a novel module called similarity normalization (SN).
This module normalizes keypoint positions to remove translation, rotation and scale differences between image pairs.
By performing positional encoding on these normalized positions, a network incorporating SN can avoid encoding widely differing positions into the descriptors of the two images.
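
The abstract does not say how SN computes the normalization, so the following is only a rough sketch of one way to remove translation, scale, and rotation from 2D keypoint positions. It assumes centering on the centroid, scaling by the RMS distance to the centroid, and undoing a per-image dominant orientation `angle` that is supplied from elsewhere. The function name `similarity_normalize` and the `angle` input are hypothetical.

```python
import numpy as np


def similarity_normalize(xy, angle=0.0):
    """Normalize 2D keypoint positions up to a similarity transform.

    xy:    array of shape (N, 2) with keypoint coordinates.
    angle: assumed dominant orientation of the image in radians; how it is
           estimated is outside the scope of this sketch.
    """
    xy = np.asarray(xy, dtype=float)
    # Remove translation: move the centroid to the origin.
    centered = xy - xy.mean(axis=0)
    # Remove scale: divide by the RMS distance to the centroid.
    scale = np.sqrt((centered ** 2).sum(axis=1).mean())
    if scale == 0:
        raise ValueError("keypoints are all identical; scale is undefined")
    # Remove rotation: rotate by -angle (points are row vectors).
    c, s = np.cos(-angle), np.sin(-angle)
    rot = np.array([[c, -s], [s, c]])
    return (centered / scale) @ rot.T
```

Under these assumptions, two keypoint sets that differ only by a similarity transform map to the same normalized positions, so a positional encoding applied afterwards sees comparable inputs for both images.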

- ICIP 2025 Supplementary
This supplementary material accompanies our paper titled "Texturing Endoscopic 3D Stomach via Neural Radiance Field under Uneven Lighting."