GPT-5.4 Revolution: The Dawn of the Ultimate AI Colleague

March 6, 2026

Blending elite reasoning, native computer use, and unmatched coding to redefine professional workflows.

The Ultimate Knowledge Worker: GPT-5.4 merges top-tier reasoning with industry-leading coding (from GPT-5.3-Codex), setting new state-of-the-art benchmarks across 44 professional occupations and excelling in spreadsheets, presentations, and legal analysis.

Native Computer Use & Vision: For the first time, developers can build agents that natively navigate desktop environments, click, type, and perceive full-fidelity images of up to 10.24M pixels, surpassing human baselines on OS navigation tasks.

Unprecedented Steerability and Efficiency: Introducing an experimental 1M token context window in Codex, dynamic “tool search” to slash token bloat, and mid-response course correction, allowing users to guide the model’s “thinking” in real time before it finalizes an output.

The landscape of artificial intelligence is shifting rapidly from chatbots that merely converse to autonomous agents that actually do. Today marks a massive leap forward in that transition with the launch of GPT-5.4—a model purpose-built for the rigors of professional work. Rolling out across ChatGPT (as GPT-5.4 Thinking), the API, and Codex, this highly efficient frontier model is designed to get complex, multi-step work done with less back-and-forth.

For those pushing the absolute limits of computational problem-solving, OpenAI is also debuting GPT-5.4 Pro, an uncompromised powerhouse available in ChatGPT and the API for the most demanding, complex tasks.

By combining the latest advances in general reasoning, coding, and agentic workflows, GPT-5.4 isn’t just a smarter model; it’s a fundamentally more capable digital employee.

Elevating Professional Knowledge Work

GPT-5.4 builds upon the foundation of GPT-5.2 but delivers significantly more polished, consistent results for real-world business applications. On the GDPval benchmark—which tests across 44 distinct occupations—GPT-5.4 matches or beats industry professionals in 83.0% of comparisons (up from 70.9% with GPT-5.2).

The model has been fine-tuned for the software that professionals use daily. On internal spreadsheet modeling tasks akin to junior investment banking work, GPT-5.4 scored an impressive 87.3%. It also generates visually stunning, highly varied presentations that human raters preferred 68.0% of the time over those of its predecessor. Enterprise customers can bring these capabilities into their workflow with the newly launched ChatGPT for Excel add-in.

Crucially, GPT-5.4 is the most factual model yet. In rigorous testing, its individual claims were 33% less likely to be false, and complete responses were 18% less likely to contain errors, making it a highly reliable partner for document-heavy sectors like law and finance.
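The factuality figures above are relative reductions, not absolute error rates. As a minimal sketch of what that means in practice, the snippet below applies the quoted 33% reduction to a made-up baseline. The 6% baseline rate and the function name are assumptions for illustration only; the announcement does not state the underlying error rates.

```python
def apply_relative_reduction(baseline_rate: float, reduction: float) -> float:
    """Return the new rate after a relative reduction.

    baseline_rate: the earlier error rate as a fraction (hypothetical input).
    reduction: the relative reduction as a fraction, e.g. 0.33 for "33% less likely".
    """
    if not 0 <= baseline_rate <= 1:
        raise ValueError("baseline_rate must be between 0 and 1")
    if not 0 <= reduction <= 1:
        raise ValueError("reduction must be between 0 and 1")
    return baseline_rate * (1 - reduction)


# Hypothetical baseline: 6 false claims per 100. This number is NOT from the article.
hypothetical_baseline = 0.06
new_rate = apply_relative_reduction(hypothetical_baseline, 0.33)
print(f"{hypothetical_baseline:.2%} -> {new_rate:.2%}")
```

Under that assumed baseline, a 33% relative reduction moves the rate by about two percentage points, which is why the starting rate matters when reading such claims.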

Breaking the Screen Barrier: Native Computer Use

Perhaps the most groundbreaking addition to GPT-5.4 is its native computer-use capabilities. It is the first general-purpose model engineered to operate computers exactly like a human would—issuing mouse and keyboard commands in response to what it “sees” on screen.
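The observe-then-act cycle described here can be sketched as a simple loop. This is an illustrative outline only, not the actual API: `FakeDesktop`, `Action`, `run_agent`, and `scripted_policy` are hypothetical names, and the scripted policy stands in for the model call that a real agent would make with each screenshot.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    kind: str          # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


class FakeDesktop:
    """Stand-in for a real desktop; it only records what the agent did."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def screenshot(self) -> str:
        # A real environment would return image bytes; a string keeps this runnable.
        return f"screen after {len(self.log)} actions"

    def apply(self, action: Action) -> None:
        if action.kind == "click":
            self.log.append(f"click({action.x},{action.y})")
        elif action.kind == "type":
            self.log.append(f"type({action.text!r})")
        else:
            raise ValueError(f"unsupported action: {action.kind}")


def run_agent(desktop: FakeDesktop,
              policy: Callable[[str], Action],
              max_steps: int = 10) -> int:
    """Observe, ask the policy for one action, execute it; repeat until done."""
    for step in range(max_steps):
        action = policy(desktop.screenshot())
        if action.kind == "done":
            return step
        desktop.apply(action)
    return max_steps


def scripted_policy(observation: str) -> Action:
    # Hypothetical placeholder for the model: maps each observation to one action.
    script = {
        "screen after 0 actions": Action("click", x=120, y=48),
        "screen after 1 actions": Action("type", text="quarterly report"),
    }
    return script.get(observation, Action("done"))


desktop = FakeDesktop()
steps_taken = run_agent(desktop, scripted_policy)
print(steps_taken, desktop.log)
```

The `max_steps` cap is a design choice worth keeping in any real loop of this kind, since an agent that never reports completion would otherwise run indefinitely.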

The benchmark results are staggering:

OSWorld-Verified: Navigating desktop environments via screenshots, GPT-5.4 hit a 75.0% success rate, crushing GPT-5.2 (47.3%) and even surpassing the human baseline of 72.4%.
Online-Mind2Web: It achieved a 92.8% success rate using only screenshot-based observations.

This is powered by dramatically enhanced visual perception. The new “original” image input detail level supports full-fidelity perception of images up to 10.24M pixels, allowing the model to parse incredibly dense, high-resolution documents and interfaces without losing the finer details.
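To give a sense of scale for the 10.24M-pixel figure, here is a small sketch that checks whether an image fits that budget and, if not, computes a uniformly downscaled size that does. The article does not say what happens to larger images, so the client-side downscaling shown here is an assumption, and `fit_to_budget` is a hypothetical helper rather than part of any SDK.

```python
import math
from typing import Tuple

MAX_PIXELS = 10_240_000  # the 10.24M-pixel figure quoted in the text


def fit_to_budget(width: int, height: int,
                  max_pixels: int = MAX_PIXELS) -> Tuple[int, int]:
    """Return (width, height) unchanged if within budget, else scaled down.

    Scaling is uniform, so the aspect ratio is preserved up to rounding.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    pixels = width * height
    if pixels <= max_pixels:
        return width, height
    scale = math.sqrt(max_pixels / pixels)
    # Floor both sides so the product cannot exceed the budget.
    return max(1, math.floor(width * scale)), max(1, math.floor(height * scale))


# A 4K screenshot (3840 x 2160 = 8,294,400 pixels) fits without resizing.
print(fit_to_budget(3840, 2160))
# A 48-megapixel scan (8000 x 6000) would need to be reduced under this assumption.
print(fit_to_budget(8000, 6000))
```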

Faster Coding and Dynamic Tool Search

Developers will find GPT-5.4 to be a revelation. It absorbs all the strengths of GPT-5.3-Codex but operates with lower latency across reasoning efforts. In Codex, a new /fast mode delivers up to 1.5x faster token velocity, allowing engineers to stay in their flow state while debugging complex frontend tasks. A new experimental skill, “Playwright (Interactive),” even allows Codex to visually debug web apps as it builds them.

To make agentic workflows viable at scale, GPT-5.4 introduces tool search in the API. Previously, passing thousands of tool definitions to a model bloated the context window and sent costs skyrocketing. Now, GPT-5.4 receives a lightweight directory and actively “looks up” only the tools it needs. In tests using Scale’s MCP Atlas benchmark, this reduced total token usage by 47% with zero loss in accuracy.
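The idea of a lightweight directory plus on-demand lookup can be sketched as follows. This is a toy illustration of the pattern, not the API's actual mechanism: `Tool`, `ToolRegistry`, the keyword-overlap search, and the demo tools are all hypothetical, and character counts serve only as a crude proxy for tokens.

```python
import json
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Tool:
    name: str
    summary: str  # one-line description listed in the directory
    schema: str   # full definition, sent only once the tool is selected


class ToolRegistry:
    def __init__(self, tools: List[Tool]) -> None:
        self._tools: Dict[str, Tool] = {t.name: t for t in tools}

    def directory(self) -> List[str]:
        """The lightweight listing: names and summaries, no schemas."""
        return [f"{t.name}: {t.summary}" for t in self._tools.values()]

    def search(self, query: str, limit: int = 3) -> List[Tool]:
        """Rank tools by how many query words appear in their summary."""
        words = set(query.lower().split())
        scored = []
        for tool in self._tools.values():
            score = len(words & set(tool.summary.lower().split()))
            if score:
                scored.append((score, tool.name))
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [self._tools[name] for _, name in scored[:limit]]

    def full_size(self) -> int:
        """Characters needed to send every schema up front."""
        return sum(len(t.schema) for t in self._tools.values())

    def lazy_size(self, selected: List[Tool]) -> int:
        """Characters needed for the directory plus only the selected schemas."""
        return (sum(len(line) for line in self.directory())
                + sum(len(t.schema) for t in selected))


def _schema(name: str, fields: List[str]) -> str:
    params = {f: {"type": "string", "description": f"The {f} value"} for f in fields}
    return json.dumps({"name": name, "parameters": params})


def build_demo_registry() -> ToolRegistry:
    # Made-up tools for illustration only.
    return ToolRegistry([
        Tool("get_weather", "look up the weather forecast for a city",
             _schema("get_weather", ["city", "units", "days", "language"])),
        Tool("send_email", "send an email message to a recipient",
             _schema("send_email", ["recipient", "subject", "body", "cc"])),
        Tool("create_invoice", "create an invoice for a customer",
             _schema("create_invoice", ["customer", "amount", "currency", "due"])),
    ])


registry = build_demo_registry()
selected = registry.search("weather forecast in Paris")
print([t.name for t in selected])
print(registry.lazy_size(selected), "vs", registry.full_size())
```

With only three tools the saving is modest; the gap widens as the registry grows, because the directory adds one short line per tool while the up-front approach adds a full schema per tool.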

Coupled with a massive 1M token context window in Codex (experimental), agents can now plan, execute, and verify complex tasks over incredibly long horizons.

Steerability: Course-Correcting Mid-Thought

We’ve all experienced the frustration of watching an AI generate a long response, only to realize in the first few seconds that it misunderstood the prompt. GPT-5.4 Thinking in ChatGPT solves this.

For complex queries, the model now provides an upfront preamble of its thought process. You can review this plan and adjust its direction mid-response. It also features greatly improved web research capabilities, expertly navigating multiple rounds of searching to find “needle-in-a-haystack” information without losing the thread of your original request.

Safety and Availability

Because GPT-5.4 can natively operate systems and write high-level code, it is treated as having High cyber capability under the Preparedness Framework. Alongside robust safety monitoring and Zero Data Retention (ZDR) protections, the model exhibits low “Chain-of-Thought (CoT) controllability,” meaning it cannot easily obfuscate or hide its reasoning to evade safety monitors.

How to get it:

ChatGPT: GPT-5.4 Thinking replaces GPT-5.2 Thinking today for Plus, Team, and Pro users. (GPT-5.2 will be retired on June 5, 2026.) GPT-5.4 Pro is available for Pro and Enterprise plans.
API: Available now as gpt-5.4 and gpt-5.4-pro. While priced higher per token than 5.2 due to its capabilities, its sheer token efficiency often results in lower overall task costs.
Codex: Rolling out today with experimental 1M context support.
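The pricing trade-off mentioned in the API entry above comes down to simple arithmetic: a higher per-token price still yields a cheaper task if token usage falls far enough. The sketch below works through that under stated assumptions. The prices are invented for illustration, and the 47% figure is borrowed from the tool search benchmark purely as an example input, not as a claim about any particular workload.

```python
def task_cost(tokens: int, price_per_million: float) -> float:
    """Cost of one task given its token count and a per-million-token price."""
    return tokens / 1_000_000 * price_per_million


def break_even_price_ratio(token_reduction: float) -> float:
    """How many times higher the per-token price can be before a task costs more.

    token_reduction: fraction of tokens saved, e.g. 0.47 for 47% fewer tokens.
    """
    if not 0 <= token_reduction < 1:
        raise ValueError("token_reduction must be in [0, 1)")
    return 1 / (1 - token_reduction)


# Hypothetical prices in dollars per million tokens; NOT actual list prices.
old_price, new_price = 2.00, 3.00
old_tokens = 1_000_000
new_tokens = int(old_tokens * (1 - 0.47))  # 47% fewer tokens, as an example

print(f"old task cost: ${task_cost(old_tokens, old_price):.2f}")
print(f"new task cost: ${task_cost(new_tokens, new_price):.2f}")
print(f"break-even price ratio: {break_even_price_ratio(0.47):.2f}x")
```

In this made-up scenario the price is 1.5 times higher, which sits below the break-even ratio, so the task comes out cheaper overall.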

GPT-5.4 represents a paradigm shift. It is no longer just a tool you query; it is an agent you deploy.
