
    AI Papers

    3DitScene: Redefining Scene Editing with Language-Guided Disentangled Gaussian Splatting

    A New Era of Scene Image Editing with Enhanced Control and Precision
    Unified 2D to 3D Editing: 3DitScene introduces a seamless framework for editing scenes from...

    VeLoRA from Huawei: Efficient Memory Usage for Large Language Model Training

    A New Approach to Reducing Memory Consumption in Training Large Language Models
    VeLoRA introduces rank-1 sub-token projections to significantly reduce memory requirements during model training. The...

    Part123: Part-aware 3D Reconstruction from a Single-view Image

    Enhancing 3D Models with Structural Detail from Single-view Images
    Innovative Multiview Diffusion Technique: Uses diffusion models to create multiview images for accurate 3D reconstruction.
    Part-aware Segmentation:...

    Human4DiT: Free-view Human Video Generation with 4D Diffusion Transformer

    Revolutionizing Human Video Generation for Virtual Reality and Animation
    Innovative 4D Transformer Architecture: Efficient modeling of spatio-temporal correlations across viewpoints and time.
    Precise Conditioning Mechanism: Utilizes...

    iVideoGPT: Pioneering Interactive Video World Models

    Transforming Video Generation for Enhanced AI Interactivity
    Scalable Autoregressive Transformer: iVideoGPT integrates multimodal signals into a sequence of tokens for interactive AI experiences.
    Compressive Tokenization Technique:...

    Meteor: Mamba-based Traversal for Enhancing Large Language and Vision Models

    Leveraging Multifaceted Rationales for Superior Performance
    Unified Transformer Model: Meteor leverages the Mamba architecture to efficiently embed multifaceted rationales.
    Enhanced Performance: Significant improvements in vision-language tasks...

    Sony’s Visual Echoes: A Unified Transformer for Audio-Visual Generation

    Exploring a Lightweight Approach to Bridging Visual and Audio Generation
    Unified Transformer Model: Visual Echoes uses a simple generative transformer for both audio-visual generation and...

    OmniGlue from Google: Enhancing Image Matching with Foundation Model Guidance

    New AI Technique Promises Better Cross-Domain Generalization in Image Matching
    Foundation Model Guidance: OmniGlue uses a vision foundation model to improve feature matching across different...

    Grounding DINO 1.5 Advances Open-Set Object Detection

    IDEA Research Introduces High-Performance and Efficient Models for Enhanced Object Detection
    Two Advanced Models: Grounding DINO 1.5 Pro and Grounding DINO 1.5 Edge offer high-performance...

    CAT3D Revolutionizes 3D Content Creation with Multi-View Diffusion Models

    New Method Generates 3D Scenes Quickly and Efficiently from Minimal Inputs
    Efficient 3D Generation: CAT3D uses multi-view diffusion models to generate consistent 3D scenes from...

    Understanding Transformer Performance with Associative Memory

    Huawei's Framework Offers New Insights Beyond Traditional Scaling Laws
    Associative Memory Modeling: Transformers are modeled using associative memories, explaining the attention mechanism through Hopfield networks.
    Energy...

    Coin3D Introduces Controllable and Interactive 3D Assets Generation

    New Framework Allows Users to Control and Edit 3D Models with Ease
    Interactive Generation Workflow: Coin3D enables users to control 3D generation using coarse geometry...