AI Papers

The Future of Film is Fake (But You Won’t Know It): Meet InsertAnywhere

How a new AI framework is bridging the gap between 4D geometry and realistic video editing. The Challenge: Video object insertion (VOI) has historically failed...

The Hidden Loop: How Vision Transformers Are Secretly Recurrent Systems

Unveiling the "Block-Recurrent Hypothesis" and the emergence of dynamical simplicity in deep learning. The Block-Recurrent Hypothesis (BRH): Deep Vision Transformers (ViTs) often operate like recurrent systems,...

Beyond the Attention Matrix: Unlocking Sequence Modeling with Grassmann Flows

Why geometric evolution on manifolds might be the linear-complexity, interpretable alternative to the Transformer's quadratic dominance. Challenging the Status Quo: The article questions the assumption that...

GLM-4.7: The Intelligent Evolution of Your Coding Partner

Mastering complex agents, visual design, and adaptive reasoning with a new level of control. Agentic Powerhouse: GLM-4.7 delivers massive performance leaps in agentic coding and terminal...

ReCo: Precision Editing for the Next Generation of AI Video

Mastering the balance between creative transformation and background stability through Region-Constraint In-Context Generation. Solving the Stability Crisis: ReCo addresses the critical flaw in AI video editing...

The New Heavyweight in Formal Math: How Seed-Prover 1.5 Bridges the Gap

Mastering undergraduate and graduate-level theorems through experience-based learning and efficient scaling. Unmatched Efficiency and Accuracy: Seed-Prover 1.5 outperforms state-of-the-art models with a fraction of the compute...

Infinite Worlds, Instant Feedback: The Leap Forward in Real-Time AI World Modeling

Introducing WorldPlay: The streaming diffusion model that finally balances 24 FPS speed with long-term geometric memory. Breaking the Trade-off: WorldPlay solves the persistent conflict between real-time...

The Blueprint of Life: MIT’s New AI Predicts Embryo Development Minute by Minute

A breakthrough deep-learning model tracks 5,000 fruit fly cells with 90% accuracy, paving the way for early disease detection in human tissues. A "Dual-Graph" Innovation: MIT...

ByteDance’s Dolphin-v2 Revolutionizes Document Parsing

A new 3B parameter model uses a novel "analyze-then-parse" approach to master complex layouts with pixel-level precision. Universal Understanding: Dolphin-v2 is a lightweight (3B parameter) model...

Through Their Eyes: How EgoX Turns Third-Person Video into Immersive First-Person Reality

Unlocking the power of immersive storytelling, robotics, and AR by synthesizing realistic egocentric perspectives from standard footage. Immersive Transformation: EgoX is a groundbreaking framework that generates...

The Creativity Paradox: Unlocking AI Diversity by Bypassing Human Bias

How a simple prompting trick called Verbalized Sampling overcomes the "Typicality Bias" that makes LLMs predictable. The Root Cause: Research identifies "Typicality Bias," a cognitive-psychological tendency...

Beyond Perception: GLM-4.6V Bridges the Gap Between Visual Understanding and Executable Action

Introducing a new era of open-source multimodal models featuring native tool use, massive context windows, and real-world agentic capabilities. A Dual-Model Release: The GLM-4.6V series launches...