ART-DECO: Arbitrary Text Guidance for 3D Detailizer Construction

Qimin Chen1 Yuezhi Yang2 Wang Yifan3 Vladimir G. Kim3 Siddhartha Chaudhuri3 Hao Zhang1 Zhiqin Chen3
1Simon Fraser University 2The University of Texas at Austin 3Adobe Research
[Paper (SIGGRAPH Asia 2025 Conference)] [Code (Coming soon)]

Our 3D detailizer is trained using a text prompt, which defines the shape class and guides the stylization and detailization of any number of coarse 3D shapes with varied structures. Once trained, our detailizer can instantaneously (in <1s) transform a coarse proxy into a detailed 3D shape whose overall structure respects the input proxy and whose detail appearance and style follow the prompt.

Our interactive modeling interface allows users to iteratively edit a coarse voxel grid, select a text prompt, and visualize the resulting detailed and textured 3D shape in real time.

- Abstract -

We introduce a 3D detailizer, a neural model that can instantaneously (in <1s) transform a coarse 3D shape proxy into a high-quality asset with detailed geometry and texture, as guided by an input text prompt. Our model is trained using the text prompt, which defines the shape class and characterizes the appearance and fine-grained style of the generated details. The coarse 3D proxy, which can be easily varied and adjusted (e.g., via user editing), provides structure control over the final shape. Importantly, our detailizer is not optimized for a single shape; it is the result of distilling a generative model, so it can be reused, without retraining, to generate any number of shapes, with varied structures, whose local details all share a consistent style and appearance. To train our detailizer, we distill the foundational knowledge in a pretrained, text-conditioned multi-view image diffusion model via Score Distillation Sampling (SDS). To improve SDS and enable our detailizer architecture to learn generalizable features over complex structures, we train our model in two stages to generate shapes with increasing structural complexity. Through extensive experiments, we show that our method generates shapes of superior quality and details compared to existing text-to-3D models under varied structure control. Our detailizer can refine a coarse shape in less than a second, making it possible to interactively author and adjust 3D shapes. Furthermore, the user-imposed structure control can lead to creative, and hence out-of-distribution, 3D asset generations that are beyond the current capabilities of leading text-to-3D generative models. We demonstrate an interactive 3D modeling workflow that our method enables, as well as its strong generalizability over styles, structures, and object categories.
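For reference, Score Distillation Sampling backpropagates the score of a frozen diffusion model through a differentiable renderer. The standard SDS gradient with respect to the detailizer parameters \(\theta\), as introduced in prior text-to-3D work, is (our exact weighting and conditioning follow the multi-view diffusion prior and may differ):

\[
\nabla_\theta \mathcal{L}_{SDS} = \mathbb{E}_{t,\epsilon}\!\left[\, w(t)\,\big(\hat{\epsilon}_\phi(x_t;\, y,\, t) - \epsilon\big)\,\frac{\partial x}{\partial \theta} \right],
\]

where \(x\) is a rendered image, \(x_t\) its noised version at diffusion timestep \(t\), \(y\) the text prompt, \(\hat{\epsilon}_\phi\) the frozen diffusion model's noise prediction, and \(w(t)\) a timestep-dependent weight.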

- Method -

Overview of the training of our detailizer. Given a coarse voxel grid and a text prompt that describes a style, two 3D convolutional networks upsample the coarse voxels into high-resolution density and albedo fields, respectively. Multi-view images are then rendered from the density and albedo fields, and a pretrained multi-view diffusion model conditioned on the text prompt serves as a prior for Score Distillation Sampling \(\mathcal{L}_{SDS}\). The regularization loss \(\mathcal{L}_{reg}\) measures the similarity between masks rendered from the generated shape and those rendered from the input coarse voxel grid, thereby enforcing that the structure of the generated shape stays consistent with the input coarse voxels.
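To make the training loop concrete, below is a minimal PyTorch sketch of one training step under the setup described above. The layer counts, the render_views function, the diffusion_prior.sds_loss call, and the loss weighting are illustrative stand-ins for the paper's differentiable renderer, multi-view diffusion prior, and hyperparameters, not the actual implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class Upsampler3D(nn.Module):
        """3D convolutional network that upsamples a coarse voxel grid into a
        high-resolution field (density or albedo); depths/widths are illustrative."""
        def __init__(self, out_channels):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv3d(1, 32, 3, padding=1), nn.ReLU(),
                nn.ConvTranspose3d(32, 32, 4, stride=2, padding=1), nn.ReLU(),  # 2x
                nn.ConvTranspose3d(32, 32, 4, stride=2, padding=1), nn.ReLU(),  # 4x
                nn.Conv3d(32, out_channels, 3, padding=1),
            )

        def forward(self, coarse_voxels):  # (B, 1, D, H, W)
            return self.net(coarse_voxels)

    density_net = Upsampler3D(out_channels=1)  # geometry
    albedo_net = Upsampler3D(out_channels=3)   # color

    def training_step(coarse_voxels, coarse_masks, text_embed,
                      render_views, diffusion_prior, optimizer, reg_weight=1.0):
        density = F.softplus(density_net(coarse_voxels))   # non-negative density
        albedo = torch.sigmoid(albedo_net(coarse_voxels))  # RGB in [0, 1]

        # Differentiably render multi-view images and silhouette masks.
        images, masks = render_views(density, albedo)

        # L_SDS: the frozen, text-conditioned multi-view diffusion model
        # provides the gradient signal that sculpts geometry and texture.
        loss_sds = diffusion_prior.sds_loss(images, text_embed)

        # L_reg: rendered masks should match masks rendered from the input
        # coarse voxels, tying the output structure to the proxy.
        loss_reg = F.mse_loss(masks, coarse_masks)

        loss = loss_sds + reg_weight * loss_reg
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()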

- Results -

Results of text-guided detailization with input coarse voxels control. We show the input coarse voxels on the left and the text prompts on the top.

- Procedural Editing -

Example of procedural editing. After training the model with the text prompt “an office chair with wheels and thick padding”, the detailization of each edit takes less than one second. Our method demonstrates strong robustness to minor modifications and local edits.
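A hypothetical sketch of the event loop behind such an editing session, assuming a trained detailizer and a viewer object with wait_for_edit/show methods (both names are ours, not from the paper):

    import torch

    def interactive_session(detailizer, viewer):
        """Reapply a trained detailizer after every user edit; no retraining."""
        voxels = viewer.initial_voxels()
        while True:
            edit = viewer.wait_for_edit()   # blocks until the user edits the grid
            if edit is None:                # user closed the session
                break
            voxels = edit.apply(voxels)     # e.g., add or remove voxels
            with torch.no_grad():
                shape = detailizer(voxels)  # single forward pass, < 1 s
            viewer.show(shape)              # display the detailed, textured shape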

- Citation -


@inproceedings{chen2025artdeco,
  author    = {Chen, Qimin and Yang, Yuezhi and Wang, Yifan and Kim, Vladimir G. and Chaudhuri, Siddhartha and Zhang, Hao and Chen, Zhiqin},
  title     = {ART-DECO: Arbitrary Text Guidance for 3D Detailizer Construction},
  booktitle = {SIGGRAPH Asia 2025 Conference Papers},
  year      = {2025},
}