In this work, we present a novel multi-agent framework for generating immersive 3D virtual environments from high-level semantic inputs, powered by large language and vision-language models (LLMs/VLMs). Unlike prior work that focuses primarily on visual output, data-intensive training pipelines, and code generation, our system coordinates a team of specialized agents, each assigned a role such as manager, planner, or expert in visual, audio, or spatial domains, to decompose and execute environment construction tasks within a game engine.

We address the challenges of local feature matching under large scale and rotation changes by focusing on keypoint positions.
First, we propose a novel module called similarity normalization (SN).
This module normalizes keypoint positions to remove translation, rotation and scale differences between image pairs.
By applying positional encoding to these normalized positions, a network incorporating SN avoids encoding widely differing positions into the descriptors of the two images.
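
The abstract does not say how SN computes the normalization, so the following is only a minimal sketch of the general idea, not the authors' module. It assumes one image's keypoints are given as an (N, 2) array, and it makes three illustrative choices: translation is removed by centering on the centroid, scale by dividing by the RMS distance to the centroid, and rotation by aligning the principal axis of the point cloud with the x-axis. The function name `normalize_keypoints` is hypothetical.

```python
import numpy as np


def normalize_keypoints(xy):
    """Map an (N, 2) array of keypoint positions to a canonical frame.

    Illustrative assumption only: the source does not specify this procedure.
    """
    xy = np.asarray(xy, dtype=float)

    # Remove translation: express positions relative to the centroid.
    centered = xy - xy.mean(axis=0)

    # Remove scale: divide by the RMS distance to the centroid.
    # (Undefined if all keypoints coincide, since the scale is then zero.)
    scale = np.sqrt((centered ** 2).sum(axis=1).mean())
    centered = centered / scale

    # Remove rotation: rotate so the principal axis lies along the x-axis.
    # eigh returns eigenvalues in ascending order, so the last eigenvector
    # is the direction of largest spread.
    _, eigvecs = np.linalg.eigh(centered.T @ centered)
    axis = eigvecs[:, -1]
    angle = np.arctan2(axis[1], axis[0])
    c, s = np.cos(-angle), np.sin(-angle)
    rot = np.array([[c, -s], [s, c]])
    return centered @ rot.T
```

Applying this to the same keypoints before and after a similarity transform gives matching coordinates up to a 180-degree flip, because a principal axis has no preferred sign. A practical version would need an extra rule to resolve that ambiguity, and would also have to cope with the two images containing different sets of keypoints.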

This supplementary material accompanies our paper titled "Texturing Endoscopic 3D Stomach via Neural Radiance Field under Uneven Lighting."

O1-mini prompt to expose the seven components of the agricultural disease management evaluation framework

This supplementary material presents detailed transformer model architectures, training parameters, and comprehensive evaluation metrics to complement our comparison of RNN and transformer models for Indonesian news classification. Our analysis provides deeper insights into why transformer models outperform RNN approaches despite their larger parameter counts.

Supplemental Material
