Prompt for scoring generations.

Categories:
4 Views

Gemini prompt for generating a caption from a given image.

Categories:
6 Views

Fine-grained action localization in untrimmed sports videos presents a significant challenge due to rapid and subtle motion transitions over short durations. Existing supervised and weakly supervised solutions often rely on extensive annotated datasets and high-capacity models, making them computationally intensive and less adaptable to real-world scenarios. In this work, we introduce a lightweight and unsupervised skeleton-based action localization pipeline that leverages spatio-temporal graph neural representations.

Categories:
20 Views

In this work, we present a novel multi-agent framework for generating immersive 3D virtual environments from high-level semantic inputs, powered by large language and vision-language models (LLMs/VLMs). Unlike prior work that focuses primarily on visual output, data-intensive training pipelines, and code generation, our system coordinates a team of specialized agents, each assigned a role such as manager, planner, or expert in visual, audio, or spatial domains, to decompose and execute environment construction tasks within a game engine.

Categories:
61 Views

Appendix to our paper "Rethinking the Backbone in Class Imbalanced Federated Source Free Domain Adaptation: The Utility of Vision Foundation Models," accepted at the IEEE ICIP 2025 workshop "Edge Intelligence: Smart, Efficient, and Scalable Solutions for IoT, Wearables, and Embedded Devices" (SEEDS).

Categories:
66 Views

We address the challenges of local feature matching under large scale and rotation changes by focusing on keypoint positions.
First, we propose a novel module called similarity normalization (SN).
This module normalizes keypoint positions to remove translation, rotation and scale differences between image pairs.
By performing positional encoding on these normalized positions, a network incorporating SN avoids encoding widely differing positions into the descriptors of the two images.
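
To illustrate the idea of normalizing keypoint positions, here is a minimal sketch, not the authors' SN module. It assumes the similarity transform between the two images (scale, rotation angle, translation) is already known or estimated; the abstract does not say how that estimate is obtained. The function names `apply_similarity` and `normalize_positions` are hypothetical.

```python
import math


def apply_similarity(points, scale, angle, translation):
    """Map 2D points through x' = scale * R(angle) * x + translation."""
    c, s = math.cos(angle), math.sin(angle)
    tx, ty = translation
    return [(scale * (c * x - s * y) + tx, scale * (s * x + c * y) + ty)
            for x, y in points]


def normalize_positions(points, scale, angle, translation):
    """Undo an assumed similarity transform so that keypoints from the
    second image land in the first image's coordinate frame."""
    c, s = math.cos(angle), math.sin(angle)
    tx, ty = translation
    normalized = []
    for x, y in points:
        dx, dy = x - tx, y - ty  # remove translation
        # Rotate by -angle, then divide by scale
        normalized.append(((c * dx + s * dy) / scale,
                           (-s * dx + c * dy) / scale))
    return normalized


# Keypoints in image A, and the same keypoints as seen in image B
keypoints_a = [(0.0, 0.0), (1.0, 2.0), (-3.0, 0.5)]
keypoints_b = apply_similarity(keypoints_a, 2.5, 1.2, (10.0, -4.0))

# After normalization, corresponding keypoints share (nearly) the same
# coordinates, so a positional encoding would receive similar inputs.
keypoints_b_norm = normalize_positions(keypoints_b, 2.5, 1.2, (10.0, -4.0))
```

With positions aligned this way, any positional encoding applied afterwards sees matching inputs for corresponding keypoints, which is the property the abstract describes.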

Categories:
19 Views

This supplementary material accompanies our paper titled "Texturing Endoscopic 3D Stomach via Neural Radiance Field under Uneven Lighting."

Categories:
36 Views

O1-mini prompt to expose the seven components of an agricultural disease management evaluation framework.

Categories:
29 Views