    AI Papers

    SuperEdit: Image Editing with Smarter Supervision

    Transforming Instruction-Based Editing Through Rectified Guidance and Contrastive Learning
    SuperEdit introduces a groundbreaking approach to instruction-based image editing by rectifying editing instructions and aligning them...

    Apple’s Bold Step: How Length Skews Uncertainty Quantification

    Exploring the Hidden Flaws in UQ Evaluation and the Promise of LM-as-a-Judge
    Uncertainty Quantification (UQ) in Language Models (LMs) is vital for safety and reliability,...

    Values in the Wild: Uncovering the Hidden Judgments of AI

    How Language Models Like Claude Reveal Their Values in Real-World Conversations
    AI models like Claude, developed by Anthropic, are trained to reflect specific values such...

    WORLDMEM: Virtual Worlds with Lasting Memory

    How a Memory-Based Framework Ensures Long-Term Consistency in World Simulation
    WORLDMEM introduces a groundbreaking framework for world simulation, utilizing a memory bank to store past...

    BitNet b1.58 2B4T: Redefining Efficiency in Large Language Models

    The First Open-Source, Native 1-Bit LLM at Scale
    BitNet b1.58 2B4T is the first open-source, native 1-bit Large Language Model (LLM) at the 2-billion parameter...

    Transforming 3D Perception: NormalCrafter Pioneers Video Normal Estimation

    Harnessing Video Diffusion for Temporally Consistent Surface Normals
    NormalCrafter introduces a groundbreaking approach to surface normal estimation in videos, leveraging video diffusion priors to ensure...

    EmoAgent: Guarding Minds in the Age of AI Conversation

    Human-AI Interaction for Mental Health Safety
    The rise of LLM-driven AI characters, like those on platforms such as Character.AI, has created new opportunities for emotional...

    FlexIP: Mastering Image Generation with Precision and Creativity

    Balancing Identity Preservation and Personalized Editing in 2D Generative Models
    FlexIP introduces a groundbreaking framework that decouples identity preservation and stylistic manipulation in 2D image...

    Realistic Talking Portraits: The FantasyTalking Approach

    Unleashing the Power of Coherent Motion Synthesis in Avatar Animation
    FantasyTalking introduces a novel framework that leverages a pretrained video diffusion transformer model to generate...

    LIVEVQA: Can AI Keep Up with the Fast-Paced World of Visual News?

    A New Benchmark Tests AI’s Ability to Answer Real-Time Visual Questions
    Introducing LIVEVQA – A groundbreaking dataset of 3,602 visual questions sourced from live news, designed...

    GeometryCrafter: Revolutionizing 3D Reconstruction from Open-World Videos

    Unleashing the Power of Diffusion Priors for Consistent Geometry Estimation
    GeometryCrafter introduces a novel framework that recovers high-fidelity point map sequences with temporal coherence from...

    Unleashing Instant 3D Creation: The Power of Progressive Rendering Distillation

    Transforming Text into Meshes in Seconds with Stable Diffusion
    PRD enables the adaptation of SD into a native 3D generator, eliminating the need for 3D...