    Dimension Defied: Microsoft Unveils TRELLIS.2, the New Standard in Generative 3D

    How Structured LATent (SLAT) architecture and Flow Transformers are reshaping the future of 3D asset creation.

    • A Massive Leap in Fidelity: TRELLIS.2 arrives as a 4-billion-parameter flow-matching transformer capable of converting single images into textured 3D meshes at resolutions up to 1536³.
    • The SLAT Advantage: The model is built on a unified “Structured LATent” representation, allowing it to seamlessly decode assets into multiple formats, including Radiance Fields, 3D Gaussians, and meshes.
    • Open and Versatile: Microsoft is releasing the code and weights under an MIT license, opening up high-quality 3D generation and local editing, along with a training dataset of 500,000 objects.

    The barrier between two-dimensional imagination and three-dimensional reality just got significantly thinner. Microsoft has officially introduced TRELLIS.2, a major step forward in 3D asset generation. Combining a 4-billion-parameter model with a novel architectural approach, TRELLIS.2 turns a single image into a high-fidelity, textured 3D mesh. Compared with previous methods, it offers not only higher resolution, up to 1536³, but also a degree of versatility and editability that has historically eluded generative 3D models.
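    To see why a sparse representation matters at this resolution, here is a quick back-of-the-envelope calculation. It is only a sketch of the arithmetic: it assumes a hypothetical grid stored densely at 1536³ and says nothing about how TRELLIS.2 actually stores its data. The function name `dense_grid_cost` is illustrative.

```python
def dense_grid_cost(resolution: int) -> dict:
    """Storage needed if every cell of a resolution^3 grid were stored densely.

    Hypothetical illustration: real sparse representations store only
    the cells near the object's surface.
    """
    cells = resolution ** 3
    return {
        "cells": cells,
        # One occupancy bit per cell, expressed in MiB.
        "bitmask_mib": cells / 8 / 2**20,
        # One float32 value (4 bytes) per cell, expressed in GiB.
        "float32_gib": cells * 4 / 2**30,
    }


if __name__ == "__main__":
    print(dense_grid_cost(1536))
```

    For a 1536³ grid this comes to 3,623,878,656 cells: 432 MiB for a bare occupancy bitmask and 13.5 GiB for just one float32 channel per cell, before any features are stored. Keeping only surface-adjacent cells avoids paying that cubic cost.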

    At the heart of this breakthrough is a unified representation known as Structured LATent (SLAT). Generative models have often struggled to balance an object’s overall structure against its surface detail. SLAT addresses this by pairing a sparsely populated 3D grid with dense multiview visual features. Concretely, the model defines “local latents” on the active voxels, the grid cells that intersect the object’s surface. Each latent is encoded by fusing image features that pre-trained vision foundation models extract from many rendered views of the asset. This hybrid captures the best of both worlds: the coarse geometry given by the active voxels and the fine visual detail carried by the vision-encoder features.
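    The toy sketch below illustrates the idea of “active voxels plus fused multiview features”. It is not the TRELLIS encoder: it simply marks the voxels that sampled surface points fall into and averages per-view feature vectors within each voxel. The sampled sphere, the `view_features` input, and every function name here are hypothetical; a real pipeline would project voxels into rendered images, read features from a vision foundation model, handle visibility, and pass the result through a learned encoder.

```python
import math
from collections import defaultdict


def fibonacci_sphere(n, radius=0.4):
    """Roughly uniform points on a sphere centred inside the cube [-0.5, 0.5]^3."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        y = 1.0 - 2.0 * (i + 0.5) / n  # runs from just below 1 to just above -1
        ring = math.sqrt(1.0 - y * y)
        theta = golden_angle * i
        points.append((radius * ring * math.cos(theta), radius * y, radius * ring * math.sin(theta)))
    return points


def voxel_index(p, resolution):
    """Integer coordinates of the voxel containing point p (clamped to the grid)."""
    return tuple(min(resolution - 1, max(0, math.floor((c + 0.5) * resolution))) for c in p)


def build_sparse_latents(points, view_features, resolution):
    """Average per-view features of surface points into the voxels they fall in.

    view_features[k][i] is a (hypothetical) feature vector that view k assigns
    to points[i]. Visibility and the learned encoder are omitted.
    Returns {voxel: mean feature}, containing only active (surface) voxels.
    """
    sums = {}
    counts = defaultdict(int)
    for feats in view_features:
        for p, f in zip(points, feats):
            v = voxel_index(p, resolution)
            acc = sums.setdefault(v, [0.0] * len(f))
            for j, value in enumerate(f):
                acc[j] += value
            counts[v] += 1
    return {v: [s / counts[v] for s in acc] for v, acc in sums.items()}


if __name__ == "__main__":
    res = 32
    pts = fibonacci_sphere(4000)
    # Two toy "views" whose per-point features are simple functions of position.
    views = [[[x, y, z] for x, y, z in pts], [[2 * x, 2 * y, 2 * z] for x, y, z in pts]]
    latents = build_sparse_latents(pts, views, res)
    print(f"{len(latents)} active voxels out of {res ** 3}")
```

    The returned dictionary only has entries for voxels the surface touches, which is what keeps the representation sparse; everything else in the grid is implicitly empty.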

    Generation is driven by rectified flow transformers tailored specifically to the SLAT representation. The process runs in two stages: the first generates the sparse structure of the SLAT (which grid cells are occupied), and the second generates the latent vectors for those non-empty cells. Trained on a carefully collected dataset of 500,000 diverse objects, the model produces results with both detailed geometry and vivid texture. The original TRELLIS architecture was validated at 2 billion parameters; TRELLIS.2 pushes this to 4 billion, and Microsoft positions it as significantly surpassing existing methods in both scale and output quality.
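    Rectified flow itself fits in a few lines. The sketch below uses one common convention, x_t = (1 - t) * x0 + t * eps, in which the network regresses the constant velocity eps - x0, and samples with a plain Euler integrator from noise (t = 1) to data (t = 0). The one-point "oracle" velocity stands in for a trained transformer and is purely hypothetical. In a two-stage design like the one described above, a sampler of this kind would run once to produce the sparse structure and again to produce the per-voxel latents.

```python
import random


def rf_training_pair(x0, eps, t):
    """Rectified-flow training example under x_t = (1 - t) * x0 + t * eps.

    Returns the noisy input x_t and the velocity regression target
    eps - x0, which is constant along the straight path.
    """
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, eps)]
    target = [b - a for a, b in zip(x0, eps)]
    return x_t, target


def euler_sample(velocity, x_noise, steps):
    """Integrate dx/dt = velocity(x, t) from t = 1 (noise) down to t = 0 (data)."""
    x = list(x_noise)
    dt = 1.0 / steps
    for k in range(steps):
        t = 1.0 - k * dt  # never reaches 0 inside the loop
        v = velocity(x, t)
        x = [xi - dt * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    data = [0.3, -1.2, 2.0]  # a single hypothetical "clean" latent
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in data]
    # Oracle velocity for a one-point dataset: the straight path back to `data`.
    oracle = lambda x, t: [(xi - di) / t for xi, di in zip(x, data)]
    print(euler_sample(oracle, noise, steps=4))  # very close to `data`
```

    Because the oracle's trajectory is exactly straight, even four Euler steps land on the data point up to floating-point rounding. A trained model's paths are only approximately straight, but straightness is the property that makes rectified flow attractive for sampling in relatively few steps.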

    What truly sets TRELLIS apart is its output versatility. In the current 3D landscape, different pipelines require different formats. TRELLIS addresses this by acting as a universal translator of sorts; because of the unified nature of SLAT, the model can decode generated assets into whatever format the user requires. Whether a developer needs Radiance Fields (NeRFs) for view synthesis, 3D Gaussians for real-time rendering, or standard meshes for game engines, TRELLIS can accommodate the request without losing fidelity.
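    The "one representation, many outputs" design can be pictured as a dispatch table over decoders that all consume the same sparse structure. In TRELLIS the decoders are learned neural networks, one per format; the converters below are deliberately simple, hypothetical stand-ins that use only voxel occupancy and ignore the latents. They emit one Gaussian per voxel, a binary density function as a density-only stand-in for a radiance field (no view-dependent color), and a blocky mesh built by keeping only voxel faces that border empty space.

```python
import math
from dataclasses import dataclass

NEIGHBOR_OFFSETS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


@dataclass
class Gaussian:
    center: tuple
    scale: float
    opacity: float


def voxel_center(v, resolution):
    """Centre of voxel v inside the cube [-0.5, 0.5]^3."""
    return tuple((c + 0.5) / resolution - 0.5 for c in v)


def to_gaussians(voxels, resolution):
    # One isotropic Gaussian per occupied voxel, sized to half a voxel.
    return [Gaussian(voxel_center(v, resolution), 0.5 / resolution, 1.0) for v in sorted(voxels)]


def to_density_field(voxels, resolution):
    # A binary density function a volume renderer could query at any point.
    occupied = frozenset(voxels)

    def density(x, y, z):
        idx = tuple(math.floor((c + 0.5) * resolution) for c in (x, y, z))
        return 1.0 if idx in occupied else 0.0

    return density


def to_blocky_mesh(voxels, resolution):
    # Keep faces between an occupied voxel and an empty neighbour.
    # Each face is reported as (voxel, outward normal direction).
    occupied = set(voxels)
    faces = []
    for v in sorted(occupied):
        for d in NEIGHBOR_OFFSETS:
            if (v[0] + d[0], v[1] + d[1], v[2] + d[2]) not in occupied:
                faces.append((v, d))
    return faces


DECODERS = {
    "gaussians": to_gaussians,
    "radiance_field": to_density_field,
    "mesh": to_blocky_mesh,
}


def decode(voxels, resolution, fmt):
    """Dispatch one shared sparse structure to the requested output format."""
    return DECODERS[fmt](voxels, resolution)


if __name__ == "__main__":
    vox = {(0, 0, 0), (1, 0, 0)}
    for fmt in DECODERS:
        print(fmt, decode(vox, 4, fmt))
```

    The point of the design is that adding a new output format means adding one decoder, not retraining or re-running the generator.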

    Finally, TRELLIS introduces a workflow feature that creators have long wanted: flexible editing. Earlier text-to-3D and image-to-3D models tended to produce “static” results; if the output wasn’t right, the user had to start over. TRELLIS supports local 3D editing and the generation of variants, so a designer can tweak specific parts of a 3D asset or ask the model for variations of the same object, streamlining the iterative design process. With the model, code, and data released under the MIT license, Microsoft is effectively handing the community the keys to the next generation of 3D creation.
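    The article does not say how TRELLIS implements editing, so the following is only a hypothetical sketch of the general idea on top of a sparse latent structure: keep the latents outside a chosen region fixed and resample the ones inside it, or keep the whole structure and resample every latent to obtain variants. `sample_latent` is a random stand-in for a trained generator, and `LATENT_DIM`, `local_edit`, and `make_variants` are illustrative names.

```python
import random

LATENT_DIM = 4  # hypothetical latent width for this toy example


def sample_latent(rng):
    # Stand-in for the stage-2 generator; a real model would condition on
    # the surrounding latents and the input image.
    return [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]


def local_edit(latents, in_region, seed):
    """Resample latents only for voxels where in_region(voxel) is True."""
    rng = random.Random(seed)
    return {v: (sample_latent(rng) if in_region(v) else z) for v, z in latents.items()}


def make_variants(latents, count, seed=0):
    """Keep the sparse structure fixed and resample every latent, one seed per variant."""
    return [local_edit(latents, lambda v: True, seed + i) for i in range(count)]


if __name__ == "__main__":
    rng = random.Random(42)
    base = {(x, 0, 0): sample_latent(rng) for x in range(6)}
    # Regenerate only the half of the toy object with x >= 3.
    edited = local_edit(base, lambda v: v[0] >= 3, seed=7)
    changed = [v for v in base if edited[v] != base[v]]
    print("changed voxels:", changed)
```

    Because untouched voxels keep their exact latents, the rest of the asset is preserved bit-for-bit, which is what makes local edits cheap to iterate on.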
