We present Assembler, a scalable and generalizable framework for 3D part assembly that reconstructs complete objects from input part meshes and a reference image. Unlike prior approaches that mostly rely on deterministic part pose prediction and category-specific training, Assembler is designed to handle diverse, in-the-wild objects with varying part counts, geometries, and structures. It addresses the core challenges of scaling to general 3D part assembly through innovations in task formulation, representation, and data. First, Assembler casts part assembly as a generative problem and employs diffusion models to sample plausible configurations, effectively capturing the ambiguities arising from symmetry, repeated parts, and multiple valid assemblies. Second, we introduce a novel shape-centric representation based on sparse anchor point clouds, enabling scalable generation in Euclidean space rather than SE(3) pose prediction. Third, we construct a large-scale dataset of over 320K diverse part-object assemblies using a synthesis and filtering pipeline built on existing 3D shape repositories. Assembler achieves state-of-the-art performance on PartNet and is the first to demonstrate high-quality assembly for complex, real-world objects. Building on Assembler, we further present a part-aware 3D modeling system that generates high-resolution, editable objects from images, demonstrating potential for interactive and compositional design.
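To make the generative formulation concrete, the sketch below shows a minimal DDPM-style reverse-sampling loop over anchor-point coordinates conditioned on per-part shape features. It is an illustrative sketch only: the `denoiser` signature, tensor shapes, number of steps, and noise schedule are assumptions, not the paper's actual architecture or training setup.

```python
import torch

# Hypothetical interface: `denoiser(x, shape_feats, t)` is any network that
# predicts the noise added to the anchor-point tokens at timestep t.
def sample_assembled_anchors(denoiser, shape_feats, num_points, steps=50):
    """Minimal DDPM-style reverse sampling over anchor-point coordinates.

    shape_feats : (B, T, C) per-part shape feature tokens (conditioning)
    returns     : (B, num_points, 3) assembled anchor-point positions
    """
    B = shape_feats.shape[0]
    betas = torch.linspace(1e-4, 0.02, steps)        # assumed linear schedule
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    x = torch.randn(B, num_points, 3)                # start from Gaussian noise
    for t in reversed(range(steps)):
        eps = denoiser(x, shape_feats, t)            # predicted noise
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise      # ancestral sampling step
    return x
```

The key point this illustrates is that the generative target lives directly in Euclidean coordinate space (the anchor points), rather than in per-part SE(3) pose parameters.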
Pipeline. Overview of Assembler (Left) and the part-aware 3D generation pipeline (Right). (Left) Input part meshes are sampled into a sparse anchor-point representation, and DoRA is used to extract shape features. These shape features are concatenated with noised point tokens, and a diffusion model is trained to generate the assembled anchor points. A simple least-squares fitting then computes part poses from the generated and input anchor points, which are used to assemble the input meshes into the final object. (Right) The input image is first fed to VLMs to identify the parts and generate reference images for each part. An image-to-3D generator is then applied to produce the part meshes. Finally, Assembler produces a complete, high-resolution, part-aware 3D model by assembling the part meshes.
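The caption states only that part poses are recovered by "simple least-squares fitting." A natural realization is the standard Kabsch/Procrustes solution over corresponding input and generated anchor points; the sketch below, with the hypothetical helper name `fit_part_pose`, shows one such implementation for a single part.

```python
import numpy as np

def fit_part_pose(src_pts, tgt_pts):
    """Least-squares rigid fit (Kabsch): find rotation R and translation t
    such that R @ src + t best matches tgt.
    src_pts : (N, 3) anchor points sampled from the input part mesh
    tgt_pts : (N, 3) corresponding generated (assembled) anchor points
    """
    src_c = src_pts.mean(axis=0)
    tgt_c = tgt_pts.mean(axis=0)
    # Cross-covariance of the centered correspondences.
    H = (src_pts - src_c).T @ (tgt_pts - tgt_c)
    U, _, Vt = np.linalg.svd(H)
    # Reflection correction keeps R a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = tgt_c - R @ src_c
    return R, t

# Usage: apply the recovered pose to the part mesh vertices.
# posed_vertices = (R @ part_vertices.T).T + t
```

Because each part's anchor points are in known correspondence with its generated counterparts, this closed-form fit per part suffices to place all input meshes into the final assembled object.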