    AI Papers

    Sony’s Visual Echoes: A Unified Transformer for Audio-Visual Generation

    Exploring a Lightweight Approach to Bridging Visual and Audio Generation. Unified Transformer Model: Visual Echoes uses a simple generative transformer for both audio-visual generation and...

    OmniGlue from Google: Enhancing Image Matching with Foundation Model Guidance

    New AI Technique Promises Better Cross-Domain Generalization in Image Matching. Foundation Model Guidance: OmniGlue uses a vision foundation model to improve feature matching across different...

    Grounding DINO 1.5 Advances Open-Set Object Detection

    IDEA Research Introduces High-Performance and Efficient Models for Enhanced Object Detection. Two Advanced Models: Grounding DINO 1.5 Pro and Grounding DINO 1.5 Edge offer high-performance...

    CAT3D Revolutionizes 3D Content Creation with Multi-View Diffusion Models

    New Method Generates 3D Scenes Quickly and Efficiently from Minimal Inputs. Efficient 3D Generation: CAT3D uses multi-view diffusion models to generate consistent 3D scenes from...

    Understanding Transformer Performance with Associative Memory

    Huawei's Framework Offers New Insights Beyond Traditional Scaling Laws. Associative Memory Modeling: Transformers are modeled using associative memories, explaining the attention mechanism through Hopfield networks. Energy...

    Coin3D Introduces Controllable and Interactive 3D Assets Generation

    New Framework Allows Users to Control and Edit 3D Models with Ease. Interactive Generation Workflow: Coin3D enables users to control 3D generation using coarse geometry...

    IBM: Large Language Models as Planning Domain Generators

    Automating AI Planning with LLMs: Exploring the Potential and Future Directions. Framework for Evaluation: Introducing an automated evaluation framework for LLM-generated planning domains. Empirical Analysis: Analysis...

    Automated Logo Animation with Adobe’s LogoMotion

    A Closer Look at Visually Grounded Code Generation for Dynamic Brand Representations. Content-Aware Animation: LogoMotion utilizes large language models (LLMs) to generate animation code specifically...

    Apple Presenting ‘Automatic Creative Selection’ for Enhanced App Discoverability

    Enhancing App Searchability Through Advanced Image-Text Matching. Novel Matching Approach: Apple introduces a new fine-tuning approach for pre-trained cross-modal models, significantly enhancing the matching of...

    InstantFamily: A Leap in Multi-ID Image Synthesis

    Enhancing Zero-shot Personalized Image Generation with Masked Cross-Attention. Innovative Masked Cross-Attention Mechanism: InstantFamily introduces a novel masked cross-attention mechanism that integrates with a multimodal embedding...

    Google introduced Object Tracking: STT Integrates Transformers in Autonomous Driving

    Enhancing Safety and Precision in Autonomous Vehicles through Advanced Stateful Tracking Technology. Unified Model for Tracking and State Estimation: The newly introduced STT model employs...

    LEGENT: Embodied Agents with Open-Source AI Platform

    Enhancing Real-World Applications Through Advanced Language and Multimodal Models Integration. Comprehensive Development Environment: LEGENT provides a robust platform combining a 3D interactive environment with a...