    AI Papers

    How Real Is Your Real-Time Speech-to-Text Translation?

    Unveiling the Challenges and Pathways in Simultaneous Speech Translation Research. Research Gaps Identified: Current Simultaneous Speech-to-Text Translation (SimulST) research overly focuses on pre-segmented speech, neglecting real-world...

    AniDoc: Transforming 2D Animation with AI-Powered Solutions

    How Generative AI is Simplifying 2D Animation Workflows. AniDoc introduces AI-driven tools to streamline 2D animation processes, including coloring and in-betweening. The technology reduces labor costs...

    GenEx: Unlocking the Future of AI-Driven Exploration

    Harnessing Generative Imagination to Create and Navigate Immersive Worlds. Revolutionizing AI Exploration: GenEx generates interactive, immersive 3D worlds from minimal input, empowering AI agents to explore...

    Meta: Moving Beyond Token Prediction to Semantic Understanding

    Meta’s Large Concept Models (LCMs): A New Paradigm in AI Language Modeling. Semantic-Level Modeling: LCMs operate on high-level "concepts" rather than token-by-token processing, enabling a...

    ILLUME: Huawei’s Breakthrough in Multimodal AI for Unified Vision

    Efficient Training, Self-Enhancing Alignment, and Versatile Applications for the Next Generation of MLLMs. Unified Multimodal Framework: ILLUME integrates understanding and generation capabilities through a next-token prediction...

    Faster, Sharper, and Smarter: Infinity Outpaces Diffusion Models in Quality and Speed

    Infinity: Redefining High-Resolution Text-to-Image Synthesis with Bitwise AutoRegressive Modeling. Innovative Framework: Infinity introduces bitwise token modeling, infinite-vocabulary tokenization, and self-correction mechanisms to overcome traditional AutoRegressive model...

    Game On: DeepMind’s MAV Model Brings Grandmaster-Level AI to Chess and Beyond

    From Hallucination-Free Play to Grandmaster Elo Ratings, MAV Redefines AI Strategy and Planning. Integrated Decision-Making: The Multi-Action-Value (MAV) model combines state tracking, planning, and action evaluation...

    SNOOPI: Setting a New Benchmark for One-Step Diffusion Models

    Introducing Dynamic Guidance and Negative Prompt Integration for Superior Image Generation. Enhanced Stability: SNOOPI introduces Proper Guidance - SwiftBrush (PG-SB) to stabilize training by dynamically adjusting...

    Google’s Advanced AI Model Delivers Faster, More Accurate Weather Forecasts

    GenCast: Revolutionizing Weather Forecasting with AI Precision. State-of-the-Art Forecasting: GenCast, Google’s new AI weather model, predicts weather conditions and risks with unprecedented accuracy up to 15...

    AnchorCrafter: Transforming Product Promotion with Human-Object Interactive Videos

    A New Era of Automation for Anchor-Style Advertising and Consumer Engagement. Revolutionizing Product Promotion Videos: AnchorCrafter brings a new level of automation to anchor-style advertising by...

    CAT4D: Bringing Dynamic 3D Scenes to Life from Monocular Videos

    Revolutionizing 4D Scene Generation with Multi-View Video Diffusion Models. Reimagining the World in 4D: CAT4D transforms standard monocular videos into dynamic 3D scenes, offering unprecedented realism...

    Breaking the Puzzle from Nvidia: LLM Efficiency for Real-World Applications

    How NVIDIA’s Puzzle Framework Redefines Language Model Optimization for Scalable AI. Cost-Effective AI Scalability: NVIDIA’s Puzzle framework tackles the growing issue of high inference costs in...