    AI Papers

    IBM Large Language Models as Planning Domain Generators

    Automating AI Planning with LLMs: Exploring the Potential and Future Directions. Framework for Evaluation: Introducing an automated evaluation framework for LLM-generated planning domains. Empirical Analysis: Analysis...

    Automated Logo Animation with Adobe’s LogoMotion

    A Closer Look at Visually Grounded Code Generation for Dynamic Brand Representations. Content-Aware Animation: LogoMotion utilizes large language models (LLMs) to generate animation code specifically...

    Apple Presents ‘Automatic Creative Selection’ for Enhanced App Discoverability

    Enhancing App Searchability Through Advanced Image-Text Matching. Novel Matching Approach: Apple introduces a new fine-tuning approach for pre-trained cross-modal models, significantly enhancing the matching of...

    InstantFamily: A Leap in Multi-ID Image Synthesis

    Enhancing Zero-shot Personalized Image Generation with Masked Cross-Attention. Innovative Masked Cross-Attention Mechanism: InstantFamily introduces a novel masked cross-attention mechanism that integrates with a multimodal embedding...

    Google Introduces Object Tracking: STT Integrates Transformers in Autonomous Driving

    Enhancing Safety and Precision in Autonomous Vehicles through Advanced Stateful Tracking Technology. Unified Model for Tracking and State Estimation: The newly introduced STT model employs...

    LEGENT: Embodied Agents with Open-Source AI Platform

    Enhancing Real-World Applications Through Advanced Language and Multimodal Models Integration. Comprehensive Development Environment: LEGENT provides a robust platform combining a 3D interactive environment with a...

    Bridging the Gap: Advancements in Open-Source Multimodal AI Models

    InternVL 1.5 Challenges Proprietary Giants with Enhanced Multimodal Capabilities. Enhanced Vision Encoder: InternVL 1.5 incorporates a robust vision foundation model, InternViT-6B, improved through continuous learning...

    Reframing Visual Creativity with Adobe: Editable Image Elements in Diffusion Models

    Enhancing User Control in Image Synthesis with Innovative Editing Capabilities. Introduction of Editable Image Elements: This new approach allows for spatial editing of images using...

    Google’s Gecko Evaluation Revolutionizes Text-to-Image Analysis

    New Metrics, Nuanced Human Ratings, and Diverse Model Assessments. Introduction of Gecko2K Benchmark: Google's new Gecko2K benchmark categorizes prompts into sub-skills, providing a granular assessment...

    Apple Unveils OpenELM: A Leap Forward for Open-Source Language Models

    Expanding Access and Enhancing Efficiency in AI Language Training. Innovative Efficiency: OpenELM introduces a novel layer-wise scaling strategy in its transformer architecture, optimizing parameter allocation...

    in2IN Unveils Advanced AI for Generating Human Interactions

    A Leap Forward in Human Motion Generation with Enhanced Personalization. Enhanced Individualization in Motion: in2IN introduces a novel diffusion model that conditions human-human motion generation...

    NVIDIA: AI Artistry with Advanced Diffusion Model Sampling Techniques

    Innovating Sampling Efficiency for Enhanced Visual Generation. Innovative Sampling Optimization: Introducing 'Align Your Steps,' a novel approach that optimizes sampling schedules in diffusion models to...