
AMD’s ROCm is Chipping Away at Nvidia’s Deepest Moat

With a unified software stack and a relentless focus on open-source community building, AMD’s AI software leadership is charting a course to dethrone CUDA—one step at a time.

  • From Fragments to a Unified Stack: Over the last two and a half years, AMD’s ROCm has evolved from a disjointed collection of firmware pieces into “OneROCm”—a cohesive, invisible platform with a rapid six-week release cadence designed to rival the seamlessness of Google Chrome.
  • The Open-Source Equalizer: The rise of open-source frameworks like OpenAI’s Triton has largely eliminated the once-daunting need to manually translate Nvidia CUDA code, allowing developers to write once and seamlessly run massive language models across AMD hardware.
  • A Grassroots Developer Revolution: Led by AI software VP Anush Elangovan, AMD is winning back developers through radical transparency, tackling thousands of GitHub complaints and addressing social media grievances one-on-one to pave the way for the highly anticipated MI450 GPUs.

Taking market share from Nvidia in the data center GPU space has long been viewed as the tech world’s ultimate uphill battle. It isn’t just about silicon; it’s about software. For years, Nvidia’s proprietary CUDA stack and its massive, entrenched installed base have acted as the widest, deepest moat protecting the most valuable company in the world. But for AMD, taking on that gargantuan task is no longer about finding a magic bullet—it’s about the climb.

“It’s like climbing a mountain—one step in front of another,” Anush Elangovan, AMD’s VP of AI software, told me in a recent exclusive interview. “Get your direction, lock in, and the rest will follow.”

Elangovan’s pragmatic optimism is hard-earned. He joined AMD two and a half years ago when the tech giant acquired his startup, Nod.ai. Before the acquisition, Nod.ai’s 30-person team had spent over half a decade building AI compilers and contributing heavily to vital AI repositories like SHARK, Torch-MLIR, and IREE. They were the unsung heroes working under the hood for hyperscalers and enterprises. Today, that startup DNA is the driving force behind AMD’s most critical weapon: ROCm.

From a “Collection of Parts” to Google Chrome

The last time I sat down with AMD to discuss ROCm was just prior to the Nod.ai acquisition. Back then, Vamsi Boppana, AMD’s Senior VP of AI, made it clear that ROCm was the company’s absolute number one priority. The ambition was massive: unify the AI stack across all of AMD’s hardware, from CPUs and GPUs to FPGAs.

Reflecting on those days, Elangovan is refreshingly candid about where they started. “ROCm at that time was a collection of parts,” he admitted. “It grew up providing [firmware] to ASICs—like, here’s a firmware piece, here’s a firmware piece, let’s tie them together.”

Fast forward two and a half years, and that consistent investment has fundamentally transformed the platform. Internally dubbed “OneROCm,” the unification Boppana promised has materialized. While some elements remain hardware-specific, all acceleration now flows through a single ROCm stack, unlocking vital portability across AMD’s entire hardware ecosystem.

But Elangovan’s vision goes beyond mere functionality; he wants ROCm to disappear into the background. Drawing on his past experience working on the Google Chrome team, he noted, “If you’re a Chrome user, you probably don’t know what version you’re using—you don’t care because it just works. We’re already there with ROCm. In the next few releases, we’ll get to a six-week release cadence. We’ll get to a point where it just works, and it becomes invisible.”

Having successfully closed the initial deficit, AMD is now operating with the agility of a dedicated software firm. “We’re shipping software like a software company now,” Elangovan said, emphasizing that his team is already preparing for the next major industry shift: AI-assisted engineering.

Triton: The Great GPU Equalizer

A couple of years ago, the narrative surrounding AMD’s software was dominated by the headache of portability. Developers were constantly forced to convert existing Nvidia CUDA kernels to AMD’s HIP kernels. Today, however, the landscape has dramatically shifted as developers work much higher up the software stack.

Elangovan points to OpenAI’s open-source AI framework, Triton, as a massive catalyst for this change. “Back in the day, it was about converting CUDA kernels to HIP kernels,” he explained. “But increasingly, people went to Triton, which became the great equalizer of GPU programming. This great equalizer allowed you to write a Triton kernel and run it on AMD or Nvidia. And we invested heavily.”
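The “write once, run on AMD or Nvidia” claim is easiest to see in a trivial kernel. The sketch below is a standard Triton-style vector add (not code from the article), assuming the `triton` and `torch` packages and a supported GPU; the same source compiles to either vendor’s backend.

```python
# Minimal Triton vector-add sketch. The identical kernel source targets
# AMD (ROCm) or Nvidia (CUDA) GPUs via Triton's backend compilers.
# Illustrative example -- requires the `triton` package and a supported GPU.
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n, BLOCK: tl.constexpr):
    pid = tl.program_id(axis=0)              # which block of the grid we are
    offs = pid * BLOCK + tl.arange(0, BLOCK) # element indices for this block
    mask = offs < n                          # guard the tail of the array
    x = tl.load(x_ptr + offs, mask=mask)
    y = tl.load(y_ptr + offs, mask=mask)
    tl.store(out_ptr + offs, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)           # one program instance per block
    add_kernel[grid](x, y, out, n, BLOCK=1024)
    return out
```

Nothing in the kernel names a vendor; retargeting is a property of the compiler, not the code, which is exactly the “great equalizer” dynamic Elangovan describes.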

That investment is tangible. One of Nod.ai’s key engineers now leads the Triton effort at AMD, working hand-in-hand with OpenAI. Furthermore, AMD’s continued maintenance of Torch-MLIR allows code to be easily retargeted across different hardware types.

The days of agonizing over CUDA conversions are fading. For modern inference customers—who are predominantly running massive Large Language Models (LLMs) via tools like vLLM or SGLang—the sole focus is achieving the highest number of tokens per second. If a brand-new attention algorithm drops, AMD’s Triton kernels act as an immediate catch-all, and the team will have a hyper-optimized version built within a day or two. As long as deployability remains identical, developers can simply pip install vLLM and get to work.
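In practice, the deployment story described above amounts to a couple of shell commands. This is a generic sketch, not AMD’s documented procedure: the model name and flags are illustrative, and on ROCm systems vLLM’s platform-specific install instructions may apply instead of a plain pip install.

```shell
# Hypothetical quick-start: install vLLM and serve a model behind an
# OpenAI-compatible HTTP API. Model name and flags are illustrative;
# ROCm builds may require vLLM's platform-specific install steps.
pip install vllm
vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8000
```

The point of the example is what is absent: no CUDA-to-HIP conversion step appears anywhere between installation and serving tokens.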

While AMD’s HIPify tool remains available for traditional High-Performance Computing (HPC) clients, Elangovan playfully noted that the modern workflow has evolved. To write and validate new AMD kernels, he now often relies on AI assistants. “Claude is better than HIPify because it has web search built in,” he quipped.

Open Source and the Front Lines of Social Media

Unlike its proprietary rival, ROCm is 100% open source, right up to the firmware level. While this exposes the software to intense scrutiny, it also allows ROCm to iterate at the blistering speed of the developer community rather than the corporate pace of AMD. “Everyone can tap in at whatever spot they want,” Elangovan noted.

To actively lure that community, AMD is pulling out all the stops. Crucially, ROCm now runs out-of-the-box on consumer laptops equipped with AMD Strix Halo processors. Whenever AMD pushes an update for its Instinct data center hardware, the Windows laptop version is released the exact same day, drastically lowering the barrier to entry for everyday developers.

But the real secret weapon in AMD’s software resurgence might just be Elangovan’s X (formerly Twitter) account. A self-described reluctant user, he joined the platform simply to provide developers with a ground-level view of AMD’s progress. “People started to follow, and that became one of my side jobs,” he laughed.

Elangovan actively monitors keywords like “ROCm sucks” or “AMD software not working,” and makes a point to respond to every single one, often providing anonymous developers with direct, personal support. This grassroots approach is yielding massive dividends. Last year, a GitHub poll soliciting ROCm complaints generated over 1,000 responses, heavily focused on a lack of support for older hardware. Today, a year later, Elangovan proudly states that every single one of those 1,000 complaints has been addressed.

“That has really changed the mood, from AMD developers being so annoyed that whatever driver wasn’t supported, to believing their efforts are appreciated,” he observed. It creates a multiplicative effect where developers finally feel heard, leading them to trust the ecosystem.

Building for the Next Decade

As we look toward the second half of 2026, Elangovan is “super-excited” for the impending launch of AMD’s MI450 hardware. But the software team’s ambitions are already extending far beyond simply achieving parity with Nvidia. They are actively exploring new, differentiated features unique to ROCm.

“We want ROCm to be a platform you can build on for the next 10 years,” he told me. “You shouldn’t have to worry about what happens when new hardware comes.”

It is a monumental task, but Elangovan is leveraging the resilience forged during his startup years—navigating the volatile ups and downs that eventually led to his compiler technologies powering almost every accelerator in the industry. For AMD, the roadmap to AI software supremacy is clear, pragmatic, and refreshingly devoid of hubris.

“We need to have conviction on our path,” Elangovan concluded. “And then it’s one step in front of the other.”

Helen
Lead editor at Neuronad covering AI, machine learning, and emerging tech.