The First Open-Source, Native 1-Bit LLM at Scale
BitNet b1.58 2B4T is the first open-source, native 1-bit Large Language Model (LLM) at the 2-billion parameter...
Harnessing Video Diffusion for Temporally Consistent Surface Normals
NormalCrafter introduces a groundbreaking approach to surface normal estimation in videos, leveraging video diffusion priors to ensure...
Human-AI Interaction for Mental Health Safety
The rise of LLM-driven AI characters, like those on platforms such as Character.AI, has created new opportunities for emotional...
Balancing Identity Preservation and Personalized Editing in 2D Generative Models
FlexIP introduces a groundbreaking framework that decouples identity preservation and stylistic manipulation in 2D image...
OpenAI Unleashes GPT-4.1: A New Era of Multimodal AI Power
OpenAI has launched GPT-4.1, a cutting-edge multimodal AI model surpassing GPT-4o with a massive one-million-token...
Smaller Models, Smarter Features, and a Race Against Capacity Challenges
OpenAI is set to launch GPT-4.1, an upgraded version of its flagship GPT-4o model, alongside smaller variants like GPT-4.1...
Unleashing the Power of Coherent Motion Synthesis in Avatar Animation
FantasyTalking introduces a novel framework that leverages a pretrained video diffusion transformer model to generate...
James Cameron's Vision for a Faster, More Creative Film Industry
James Cameron believes AI can save big-budget movies by making visual effects cheaper and faster...
A New Benchmark Tests AI’s Ability to Answer Real-Time Visual Questions
Introducing LIVEVQA: a groundbreaking dataset of 3,602 visual questions sourced from live news, designed...