AI Papers

Puppet-Master: Revolutionizing Interactive Video Generation for Detailed Motion Dynamics

Leveraging advanced AI to bring part-level animation to life with unprecedented realism. Innovative Motion Prior for Part-Level Dynamics: Puppet-Master introduces a new way to generate...

Achieving Human-Level Competitive Robot Table Tennis

Google DeepMind's Robot Reaches New Heights in Sports Robotics. Breakthrough in Robot Table Tennis: Google DeepMind's robot achieves amateur human-level performance in competitive table tennis,...

Generating 3D Objects with 64×64 Pixels: A New Era in 3D Modeling

New Approach Converts 3D Models into 2D Images for Simplified Generation. The new method encapsulates 3D geometry and appearance into a 64×64 pixel image, simplifying the...

IPAdapter-Instruct: Enhancing Image Generation Control with Instruction Prompts

Resolving Ambiguity in Image-Based Conditioning with Instruct Prompts. IPAdapter-Instruct combines natural-image conditioning with instruct prompts to clarify user intent in image generation. This new approach maintains...

VidGen-1M: Elevating Text-to-Video Generation with a Superior Dataset

Introducing VidGen-1M, a breakthrough dataset designed to enhance text-to-video generation models. VidGen-1M addresses the shortcomings of existing video-text datasets. It ensures high video quality, detailed captions,...

TexGen: 3D Texture Generation with Multi-view Sampling

Innovative Framework Enhances Texture Quality and Consistency for 3D Meshes. Seamless Textures: TexGen eliminates prominent seams and excessive smoothing in 3D textures using a multi-view sampling...

Tora: Video Generation with Trajectory-Oriented Diffusion Transformers

Exploring Tora’s Potential in Motion-Controllable Video Creation. Innovative Framework: Tora integrates text, image, and trajectory inputs for precise motion-controlled video generation. High Fidelity: Achieves high-quality video output with...

The Llama 3 Herd of Models

Multilinguality, Coding, Reasoning, and Tool Usage in a New Set of AI Foundation Models. Llama 3's Capabilities: The Llama 3 models support multilinguality, coding, reasoning, and...

Cycle3D: High-quality and Consistent Image-to-3D Generation

Advancing 3D Content Creation through a Generation-Reconstruction Cycle. Cycle3D combines 2D diffusion-based generation with 3D reconstruction for superior image-to-3D conversion. The framework enhances the quality and...

Theia: Distilling Diverse Vision Foundation Models for Robot Learning

Enhancing Robot Learning with Rich Visual Representations. Theia leverages multiple vision foundation models to improve robot learning. The model outperforms previous approaches with less training data...

Text2Place: Affordance Aware Human Guided Placement

Advancing Realistic Human Insertion in Diverse Backgrounds. Text2Place generates realistic human placements in various scenes using text guidance. The method utilizes semantic masks and subject-conditioned inpainting...

HoloDreamer: Transforming Text into 3D Panoramic Worlds

Advancing 3D Scene Generation with Holistic Text-to-Image Models. HoloDreamer generates highly consistent 3D panoramic scenes from text descriptions. The framework combines multiple diffusion models with 3D...