
    AI Papers

    AI-Generated Fake News Threatens Future Elections

    The rise of AI-generated misinformation poses a significant risk to democratic integrity. Convincing Misinformation: AI models like GPT-3 generate fake news stories that many people find...

    Knee Kinematics Reconstruction with Smartphone Video and IMU Sensors

    Integrating Wearable Sensors and Video for Advanced Clinical Assessment. Fusion of Technologies: Combining uncalibrated IMUs and handheld smartphone video enhances the accuracy of knee kinematics reconstruction. Clinical...

    3DitScene: Redefining Scene Editing with Language-Guided Disentangled Gaussian Splatting

    A New Era of Scene Image Editing with Enhanced Control and Precision. Unified 2D to 3D Editing: 3DitScene introduces a seamless framework for editing scenes from...

    VeLoRA from Huawei: Efficient Memory Usage for Large Language Model Training

    A New Approach to Reducing Memory Consumption in Training Large Language Models. VeLoRA introduces rank-1 sub-token projections to significantly reduce memory requirements during model training. The...

    Part123: Part-aware 3D Reconstruction from a Single-view Image

    Enhancing 3D Models with Structural Detail from Single-view Images. Innovative Multiview Diffusion Technique: Uses diffusion models to create multiview images for accurate 3D reconstruction. Part-aware Segmentation:...

    Human4DiT: Free-view Human Video Generation with 4D Diffusion Transformer

    Revolutionizing Human Video Generation for Virtual Reality and Animation. Innovative 4D Transformer Architecture: Efficient modeling of spatio-temporal correlations across viewpoints and time. Precise Conditioning Mechanism: Utilizes...

    iVideoGPT: Pioneering Interactive Video World Models

    Transforming Video Generation for Enhanced AI Interactivity. Scalable Autoregressive Transformer: iVideoGPT integrates multimodal signals into a sequence of tokens for interactive AI experiences. Compressive Tokenization Technique:...

    Meteor: Mamba-based Traversal for Enhancing Large Language and Vision Models

    Leveraging Multifaceted Rationales for Superior Performance. Unified Transformer Model: Meteor leverages the Mamba architecture to efficiently embed multifaceted rationales. Enhanced Performance: Significant improvements in vision-language tasks...

    Sony’s Visual Echoes: A Unified Transformer for Audio-Visual Generation

    Exploring a Lightweight Approach to Bridging Visual and Audio Generation. Unified Transformer Model: Visual Echoes uses a simple generative transformer for both audio-visual generation and...

    OmniGlue from Google: Enhancing Image Matching with Foundation Model Guidance

    New AI Technique Promises Better Cross-Domain Generalization in Image Matching. Foundation Model Guidance: OmniGlue uses a vision foundation model to improve feature matching across different...

    Grounding DINO 1.5 Advances Open-Set Object Detection

    IDEA Research Introduces High-Performance and Efficient Models for Enhanced Object Detection. Two Advanced Models: Grounding DINO 1.5 Pro and Grounding DINO 1.5 Edge offer high-performance...

    CAT3D Revolutionizes 3D Content Creation with Multi-View Diffusion Models

    New Method Generates 3D Scenes Quickly and Efficiently from Minimal Inputs. Efficient 3D Generation: CAT3D uses multi-view diffusion models to generate consistent 3D scenes from...