Computer Vision and Pattern Recognition

DETECTS: Deep Clustering of Temporal Skeletons for Graph-based Segmentation

Unsupervised Temporal Action Localization (UTAL) aims to segment untrimmed videos into semantically coherent actions without using temporal annotations. Existing UTAL methods rely on contrastive pretext tasks or shallow clustering pipelines that decouple representation learning from segmentation, limiting their ability to capture fine-grained temporal transitions. In this work, we propose a unified deep clustering framework for skeleton-based UTAL that formulates motion segmentation as a spatio-temporal graph separation problem in the embedding space.

ICASSP_2026 (Supplementary Material).pdf

Detailed Methodology and Theoretical Justification (60)

Categories: Other

UTAL-GNN: Unsupervised Temporal Action Localization using Graph Neural Networks

Fine-grained action localization in untrimmed sports videos presents a significant challenge due to rapid and subtle motion transitions over short durations. Existing supervised and weakly supervised solutions often rely on extensive annotated datasets and high-capacity models, making them computationally intensive and less adaptable to real-world scenarios. In this work, we introduce a lightweight and unsupervised skeleton-based action localization pipeline that leverages spatio-temporal graph neural representations.

ICIP(SW)_SUPPLEMENTARY.pdf

Additional Ablation Study and Performance Evaluation Results (115)

Categories: Other