DECOLLAGE: 3D Detailization by Controllable, Localized, and Learned Geometry Enhancement

Qimin Chen1, 2 Zhiqin Chen2 Vladimir G. Kim2 Noam Aigerman3 Hao Zhang1, 4 Siddhartha Chaudhuri2
1Simon Fraser University 2Adobe Research 3University of Montreal 4Amazon
[Paper (ECCV 2024)] [Poster] [Code]

Décollage is an art form created by cutting or removing pieces of an original image. When a style exemplar carrying geometric details is “painted” over a region of a coarse shape, the coarse surfaces are removed to unveil a detailized version that mimics the exemplar. We show an out-of-distribution chair-like shape detailized via style mixing, where five exemplars “décollaged” the coarse voxels.

- Abstract -

We present a 3D modeling method that enables end-users to refine or detailize 3D shapes using machine learning, expanding the capabilities of AI-assisted 3D content creation. Given a coarse voxel shape (e.g., one produced with a simple box-extrusion tool or via generative modeling), a user can directly “paint” desired target styles, representing compelling geometric details from input exemplar shapes, over different regions of the coarse shape. These regions are then upsampled into high-resolution geometries that adhere to the painted styles. To achieve such controllable and localized 3D detailization, we build on top of a Pyramid GAN by making it masking-aware. We devise novel structural losses and priors to ensure that our method preserves both desired coarse structures and fine-grained features, even when the painted styles are borrowed from diverse sources, e.g., different semantic parts or even different shape categories. Through extensive experiments, we show that our ability to localize details enables novel interactive creative workflows and applications. Our experiments further demonstrate that, in comparison to prior techniques built on global detailization, our method generates structure-preserving, high-resolution stylized geometries with more coherent shape details and style transitions.
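To make the “painting” idea concrete, the following is a minimal sketch (not the authors' code) of how the inputs to localized detailization could be represented: a coarse occupancy grid plus a per-voxel style-label mask that records which exemplar style the user painted onto each region. All names, shapes, and the default-style convention here are assumptions for illustration.

```python
import numpy as np

res = 32                                   # coarse voxel resolution (assumed)
coarse = np.zeros((res, res, res), bool)   # coarse occupancy grid
coarse[8:24, 8:24, 4:28] = True            # e.g. a simple box-extruded part

num_styles = 5                             # exemplar styles the user can paint
style_mask = np.full((res, res, res), -1)  # -1 = unpainted voxel
style_mask[coarse] = 0                     # default: first exemplar style
style_mask[8:24, 8:24, 20:28] = 2          # paint a different style on the top region

# One-hot style codes per painted voxel; a masking-aware generator would be
# conditioned on (coarse, style_mask) and upsample each region so its local
# patches match the geometric details of the chosen exemplar.
style_codes = np.eye(num_styles)[np.clip(style_mask, 0, None)]
style_codes[style_mask < 0] = 0.0
```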

- Method -

Network architecture. Conditioned on a set of style codes associated with each segmented part, the network upsamples the coarse content voxel with part labels into detailed geometries at multiple resolutions. For each upsampling level \(j\), the discriminator enforces that the local patches of each part in the upsampled geometry are plausible with respect to the styles they are conditioned on. The structure-preserving losses \(\mathcal{L}_{down}^{j}\) and \(\mathcal{L}_{up}^{j}\) enforce consistency between the structure of the output and that of the input.
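The section does not spell out the loss definitions, so below is a plausible PyTorch sketch of structure-preserving losses at an upsampling level \(j\); the paper's exact formulation may differ. The idea: the output at the finer resolution, pooled back to the input resolution, should reproduce the coarse content voxel (\(\mathcal{L}_{down}^{j}\)), and the output should not place occupancy outside a nearest-neighbor upsampling of the input (\(\mathcal{L}_{up}^{j}\)).

```python
import torch
import torch.nn.functional as F

def structure_losses(output_j, coarse, scale):
    """output_j: (B,1,D*s,H*s,W*s) predicted occupancy at level j,
    coarse:   (B,1,D,H,W) input coarse voxels,
    scale:    integer upsampling factor s at this level (all assumed shapes)."""
    # L_down: average-pool the output back to the coarse resolution and
    # compare it against the input occupancy.
    down = F.avg_pool3d(output_j, kernel_size=scale)
    loss_down = F.mse_loss(down, coarse)

    # L_up: nearest-neighbor upsample the input and penalize any predicted
    # occupancy that falls outside the upsampled coarse shape.
    up = F.interpolate(coarse, scale_factor=scale, mode="nearest")
    loss_up = (output_j * (1.0 - up)).mean()
    return loss_down, loss_up
```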

- Citation -

TBA