Innovative Framework Enhances Texture Quality and Consistency for 3D Meshes
Seamless Textures: TexGen eliminates prominent seams and excessive smoothing in 3D textures using a multi-view sampling...
Exploring Tora’s Potential in Motion-Controllable Video Creation
Innovative Framework: Tora integrates text, image, and trajectory inputs for precise motion-controlled video generation.
High Fidelity: Achieves high-quality video output with...
Multilinguality, Coding, Reasoning, and Tool Usage in a New Set of AI Foundation Models
Llama 3's Capabilities: The Llama 3 models support multilinguality, coding, reasoning, and...
Advancing 3D Content Creation through a Generation-Reconstruction Cycle
Cycle3D combines 2D diffusion-based generation with 3D reconstruction for superior image-to-3D conversion.
The framework enhances the quality and...
Enhancing Robot Learning with Rich Visual Representations
Theia leverages multiple vision foundation models to improve robot learning.
The model outperforms previous approaches with less training data...
Advancing Realistic Human Insertion in Diverse Backgrounds
Text2Place generates realistic human placements in various scenes using text guidance.
The method utilizes semantic masks and subject-conditioned inpainting...
Advancing 3D Scene Generation with Holistic Text-to-Image Models
HoloDreamer generates highly consistent 3D panoramic scenes from text descriptions.
The framework combines multiple diffusion models with 3D...
Solving Object Repetition in High-Resolution Image Generation
AccDiffusion addresses the issue of object repetition in patch-wise higher-resolution image generation.
The method uses patch-content-aware prompts and dilated...
Innovating Object Addition in Images with Text Guidance Alone
Diffree enables seamless text-guided object addition without compromising background consistency.
The model leverages the OABench dataset, enhancing...
Tailoring AI-Generated Images to Individual Tastes
ViPer personalizes image generation by capturing and applying individual visual preferences.
The system uses user comments to infer visual likes...
A New Benchmark for Realistic Human Image Animation and Camera Control
HumanVid introduces the first large-scale, high-quality dataset tailored for human image animation, combining real-world...
Introducing the First Comprehensive Benchmark for Complex Video Generation from Text Prompts
T2V-CompBench offers the first benchmark tailored for compositional text-to-video generation.
The benchmark includes diverse...