    AI Papers

    InstantMesh Debuts: Transforming Single Images into 3D Meshes in Seconds

    Bridging the Gap Between 2D Images and 3D Models with Advanced AI Techniques
    Rapid and Efficient 3D Mesh Generation: InstantMesh combines a multiview diffusion model...

    ALOHA 2 from Google: Advancing Bimanual Teleoperation with Enhanced Low-Cost Robotics

    Open-Sourcing Hardware Designs for Improved Robotic Dexterity and Robustness
    Enhanced Design and Performance: ALOHA 2 introduces significant improvements in robotic components such as grippers and...

    Magic Clothing: Pioneering Garment-Driven Image Synthesis

    Blending Fashion and Technology to Tailor Customized Digital Apparel
    Innovative Network Architecture: Magic Clothing utilizes a latent diffusion model-based network to create images of characters...

    HQ-Edit: Revolutionizing Instruction-Based Image Editing with AI

    Leveraging AI to Synthesize a New Dataset for Enhanced Image Editing Models
    Innovative Dataset Creation: HQ-Edit introduces a new way of building image editing datasets...

    Infinite Context: How Google’s Infini-attention Could Revolutionize Large Language Models

    Expanding the Horizons of AI Comprehension and Memory
    Innovative Memory Management: Infini-attention introduces a compressive memory technique that allows LLMs to retain and access information...

    Red Dead Redemption II – Cradle Framework Unveils Next-Gen Agent for Video Game Autonomy

    Introducing Multimodal Interaction for Universal Computer Control
    Multimodal Interaction: Cradle integrates visual inputs and keyboard/mouse outputs to operate within complex digital environments like video games,...

    PixArt-Σ Redefines High-Resolution AI Art

    New Diffusion Transformer Model Sets Benchmark for 4K Text-to-Image Generation
    High-Quality Training Regimen: PixArt-Σ employs a 'weak-to-strong training' strategy, utilizing superior-quality data to enhance fidelity...

    CTRL-Adapter Unlocks New Efficiencies in Controlled Image and Video Generation

    Enhancing Pretrained ControlNets for Seamless Integration with Diffusion Models
    Efficiency and Versatility: CTRL-Adapter enhances existing ControlNets to work with any diffusion model without the need...

    PhyScene: Embodied AI with Interactive 3D Scene Synthesis

    Bridging the Gap Between Digital Creation and Physical Interactivity
    Advanced Scene Synthesis: PhyScene introduces a conditional diffusion model designed to generate physically interactable 3D scenes,...

    Tango 2: Advancing Audio Generation with Preference-Driven Diffusion Models

    Enhancing Text-to-Audio Translations via Direct Preference Optimization
    Introduction of Preference Optimization: Tango 2 utilizes a novel approach in the realm of text-to-audio generation by employing...

    Imagine Colorization: Image Colorization with AI

    A Novel Framework for Interactive and Editable AI-Driven Colorization
    Innovative Imagination Module: The core feature of the Imagine Colorization framework is its ability to generate...

    Exploring 3D Awareness in Visual Foundation Models: A New Study by Google

    Probing the Depth and Multiview Consistency of AI-Driven Visual Perception
    3D Structural Encoding: The study investigates whether visual foundation models not only manage 2D object...