    AI Papers

    Unlocking AI’s Hidden Layers: Gemma Scope Opens Doors to Advanced Model Interpretability

    Google’s New Suite of Sparse Autoencoders Enhances AI Safety and Research
    Gemma Scope introduces an open suite of Sparse Autoencoders (SAEs) designed to improve interpretability...

    Puppet-Master: Revolutionizing Interactive Video Generation for Detailed Motion Dynamics

    Leveraging advanced AI to bring part-level animation to life with unprecedented realism
    Innovative Motion Prior for Part-Level Dynamics: Puppet-Master introduces a new way to generate...

    Achieving Human-Level Competitive Robot Table Tennis

    Google DeepMind's Robot Reaches New Heights in Sports Robotics
    Breakthrough in Robot Table Tennis: Google DeepMind's robot achieves amateur human-level performance in competitive table tennis,...

    Generating 3D Objects with 64×64 Pixels: A New Era in 3D Modeling

    New Approach Converts 3D Models into 2D Images for Simplified Generation
    New method encapsulates 3D geometry and appearance into a 64×64 pixel image, simplifying the...

    IPAdapter-Instruct: Enhancing Image Generation Control with Instruction Prompts

    Resolving Ambiguity in Image-based Conditioning with Instruct Prompts
    IPAdapter-Instruct combines natural-image conditioning with instruct prompts to clarify user intent in image generation. This new approach maintains...

    VidGen-1M: Elevating Text-to-Video Generation with a Superior Dataset

    Introducing VidGen-1M, a breakthrough dataset designed to enhance text-to-video generation models
    VidGen-1M addresses the shortcomings of existing video-text datasets. It ensures high video quality, detailed captions,...

    TexGen: 3D Texture Generation with Multi-view Sampling

    Innovative Framework Enhances Texture Quality and Consistency for 3D Meshes
    Seamless Textures: TexGen eliminates prominent seams and excessive smoothing in 3D textures using a multi-view sampling...

    Tora: Video Generation with Trajectory-Oriented Diffusion Transformers

    Exploring Tora’s Potential in Motion-Controllable Video Creation
    Innovative Framework: Tora integrates text, image, and trajectory inputs for precise motion-controlled video generation.
    High Fidelity: Achieves high-quality video output with...

    The Llama 3 Herd of Models

    Multilinguality, Coding, Reasoning, and Tool Usage in a New Set of AI Foundation Models
    Llama 3's Capabilities: The Llama 3 models support multilinguality, coding, reasoning, and...

    Cycle3D: High-quality and Consistent Image-to-3D Generation

    Advancing 3D Content Creation through a Generation-Reconstruction Cycle
    Cycle3D combines 2D diffusion-based generation with 3D reconstruction for superior image-to-3D conversion. The framework enhances the quality and...

    Theia: Distilling Diverse Vision Foundation Models for Robot Learning

    Enhancing Robot Learning with Rich Visual Representations
    Theia leverages multiple vision foundation models to improve robot learning. The model outperforms previous approaches with less training data...

    Text2Place: Affordance Aware Human Guided Placement

    Advancing Realistic Human Insertion in Diverse Backgrounds
    Text2Place generates realistic human placements in various scenes using text guidance. The method utilizes semantic masks and subject-conditioned inpainting...