    AI Papers

    PixArt-Σ Redefines High-Resolution AI Art

    New Diffusion Transformer Model Sets Benchmark for 4K Text-to-Image Generation High-Quality Training Regimen: PixArt-Σ employs a 'weak-to-strong training' strategy, utilizing superior-quality data to enhance fidelity...

    CTRL-Adapter Unlocks New Efficiencies in Controlled Image and Video Generation

    Enhancing Pretrained ControlNets for Seamless Integration with Diffusion Models Efficiency and Versatility: CTRL-Adapter enhances existing ControlNets to work with any diffusion model without the need...

    PhyScene: Embodied AI with Interactive 3D Scene Synthesis

    Bridging the Gap Between Digital Creation and Physical Interactivity Advanced Scene Synthesis: PhyScene introduces a conditional diffusion model designed to generate physically interactable 3D scenes,...

    Tango 2: Advancing Audio Generation with Preference-Driven Diffusion Models

    Enhancing Text-to-Audio Translations via Direct Preference Optimization Introduction of Preference Optimization: Tango 2 utilizes a novel approach in the realm of text-to-audio generation by employing...

    Imagine Colorization: Image Colorization with AI

    A Novel Framework for Interactive and Editable AI-Driven Colorization Innovative Imagination Module: The core feature of the Imagine Colorization framework is its ability to generate...

    Exploring 3D Awareness in Visual Foundation Models: A New Study by Google

    Probing the Depth and Multiview Consistency of AI-Driven Visual Perception 3D Structural Encoding: The study investigates whether visual foundation models not only manage 2D object...

    Navigating the Dual Edges of Generative AI: Insights from German Federal Office’s Latest Report

    Comprehensive Analysis of Risks and Opportunities in AI Use by Industry and Authorities Data Quality and Security Concerns: The report highlights significant risks associated with...

    Ferret-v2 Unveiled: Apple’s Enhanced Model for Advanced Image Understanding

    Refining Visual Processing in Large Language Models Enhanced Resolution Handling: Ferret-v2 introduces 'any resolution grounding and referring,' allowing for superior processing of high-resolution images, significantly...

    Rho-1 Unveiled: Microsoft’s New Model Prioritizes Efficiency in Language Training

    A Paradigm Shift in AI Language Learning with Selective Language Modeling Introduction of Selective Language Modeling (SLM): Rho-1, Microsoft's latest language model, uses a novel...

    RealmDreamer: Advancing 3D Scene Generation with Innovative Text-Driven Technology

    A New Frontier in 3D Visualization Combining Inpainting and Depth Diffusion Independent of Scene-Specific Datasets: RealmDreamer uniquely generates 3D scenes without the need for training...

    Urban Architect: Pioneering 3D Urban Scene Generation with Textual Insights

    Bridging Text and Urban Scale 3D Modeling through Innovative AI Techniques Introduction of Compositional 3D Layouts: Urban Architect integrates a novel 3D layout representation into...

    Champ Unveils New Era in Human Image Animation with 3D Parametric Model Integration

    Revolutionary Method Enhances Motion Capture and Animation Realism through Advanced 3D Modeling Innovative Integration of 3D Modeling: Champ leverages the SMPL 3D parametric model within...