
ESCT3D

DOI:
10.60864/89xx-s272
Citation Author(s):
Huiqi Wu, Li Yao
Submitted by:
Huiqi Wu
Last updated:
21 January 2025 - 9:40am
Document Type:
Supplementary materials for experimental results.
Document Year:
2025
Presenters:
Huiqi Wu

Recent advances in text-driven 3D content generation highlight several challenges. Surveys show that users often provide simple text inputs while expecting high-quality results, yet generating optimal 3D content from minimal prompts is difficult because text-to-3D models depend strongly on input quality. Moreover, the generation process is highly variable, often requiring many attempts before the output meets user expectations, which reduces efficiency. To address this, we employ GPT-4V for self-optimization, improving generation efficiency and enabling satisfactory results in a single attempt. Furthermore, the controllability of text-to-3D methods remains underexplored. Our approach allows users to specify not only textual descriptions but also conditions such as style, edges, scribbles, poses, or combinations thereof, enabling finer control over the generated 3D content. Additionally, we integrate multi-view information, including depth, masks, features, and images, to mitigate the Janus problem in 3D generation. Experiments show that our method generalizes robustly, enabling efficient and controllable high-quality 3D content generation.
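The self-optimization idea described above can be pictured as a critique-and-refine loop: a vision-language evaluator scores the current 3D result and rewrites the prompt until the result is acceptable. The sketch below is only an illustration of that loop structure, not the authors' implementation; `toy_generate` and `toy_critique` are hypothetical stand-ins for a real text-to-3D model and a GPT-4V-style evaluator.

```python
# Illustrative sketch of a prompt self-optimization loop (assumed structure,
# not the paper's actual code). A critic scores the generated asset and
# proposes a refined prompt; generation repeats until the score passes a
# threshold or the round budget is exhausted.

def self_optimize(prompt, generate, critique, max_rounds=3, threshold=0.8):
    """Iteratively refine a text-to-3D prompt using critic feedback."""
    asset = generate(prompt)
    for _ in range(max_rounds):
        score, refined_prompt = critique(prompt, asset)
        if score >= threshold:
            break  # result judged satisfactory; stop refining
        prompt = refined_prompt
        asset = generate(prompt)
    return prompt, asset

# Toy stand-ins: a real system would call a text-to-3D generator and a
# GPT-4V-style multimodal critic here. Quality rises as the prompt gains
# descriptive detail (modeled crudely by counting added clauses).
def toy_generate(prompt):
    return {"prompt": prompt, "quality": min(1.0, 0.3 + 0.25 * prompt.count(","))}

def toy_critique(prompt, asset):
    return asset["quality"], prompt + ", highly detailed"

final_prompt, final_asset = self_optimize("a ceramic teapot", toy_generate, toy_critique)
```

In this toy run the critic rejects the bare prompt twice, enriching it each round until the simulated quality reaches the threshold, which mirrors the abstract's claim of converging to a satisfactory result without manual retries.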
