Nvidia Unveils Fugatto: AI Model That Redefines Audio Creativity

November 26, 2024

From barking trumpets to multilingual voiceovers, Fugatto redefines what’s possible in music, gaming, and sound design.

Nvidia’s Fugatto can generate and transform audio, from creating novel sounds to modifying voices and moods.
The model enables unprecedented creativity, supporting music production, voice editing, and dynamic game soundtracks.
While Fugatto offers transformative potential, Nvidia is cautious about public release due to ethical considerations.

Nvidia has unveiled Fugatto, a generative AI model for creating and transforming audio. Short for Foundational Generative Audio Transformer Opus 1, Fugatto lets creators generate entirely new sounds, modify voices, and blend audio elements, all from a simple text prompt.

From producing music snippets to transforming piano melodies into vocal lines, Fugatto stands out as a versatile tool for industries like music, gaming, and advertising. Unlike traditional AI models limited to predefined tasks, Fugatto excels at combining creative freedom with technical precision, offering possibilities as varied as generating multilingual voiceovers or crafting entirely new soundscapes.

What Makes Fugatto Special?

Fugatto’s ability to transform and combine audio inputs sets it apart. Music producers can prototype songs with different styles and instruments, while game developers can create dynamic soundtracks that evolve with gameplay. Fugatto can also make a trumpet bark like a dog or convert a spoken phrase into a song, complete with emotional inflections.

The model also introduces ComposableART, which allows users to blend multiple instructions during audio creation. For example, a user can request a voiceover with a French accent delivered in a melancholic tone, with fine-grained control over each attribute. This level of customization enables artists and technologists to experiment in ways previously unimaginable.
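One way to picture this kind of per-attribute control is as a weight attached to each instruction. The sketch below is purely illustrative and is not Fugatto's API or Nvidia's published method: the function `blend_instructions`, the two-dimensional toy vectors, and the weights are all hypothetical stand-ins for whatever internal representation the model actually uses.

```python
from typing import Dict, List


def blend_instructions(
    vectors: Dict[str, List[float]],
    weights: Dict[str, float],
) -> List[float]:
    """Return the weighted average of several instruction vectors.

    Hypothetical illustration only: real systems work on learned
    representations, not hand-written two-element lists.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    dim = len(next(iter(vectors.values())))
    blended = [0.0] * dim
    for name, vec in vectors.items():
        # Normalize so the weights express relative emphasis.
        w = weights.get(name, 0.0) / total
        for i, x in enumerate(vec):
            blended[i] += w * x
    return blended


# Toy vectors standing in for two instructions (assumed values).
toy_vectors = {
    "french_accent": [1.0, 0.0],
    "melancholic_tone": [0.0, 1.0],
}

# Emphasize the accent three times as much as the mood.
result = blend_instructions(
    toy_vectors, {"french_accent": 3.0, "melancholic_tone": 1.0}
)
print(result)
```

Raising or lowering one weight shifts the blend toward or away from that attribute without touching the others, which is the intuition behind adjusting each attribute independently.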

A Powerful Tool with Creative Controls

Fugatto doesn’t just replicate existing audio—it generates original soundscapes. Temporal interpolation allows users to create audio that changes over time, such as a thunderstorm fading into birdsong at dawn. This dynamic feature adds depth to music production and immersive storytelling.
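The thunderstorm-to-birdsong example can be thought of as a schedule of weights that shifts from one prompt to the other over time. The following is a minimal sketch under that assumption; the function `crossfade_weights` and the linear ramp are hypothetical choices for illustration, and the article does not describe how Fugatto implements temporal interpolation internally.

```python
from typing import List, Tuple


def crossfade_weights(num_steps: int) -> List[Tuple[float, float]]:
    """Return (start_weight, end_weight) pairs for a linear crossfade.

    At the first step all weight is on the starting prompt; at the last
    step all weight is on the ending prompt.
    """
    if num_steps < 2:
        raise ValueError("need at least two steps to interpolate")

    schedule = []
    for step in range(num_steps):
        t = step / (num_steps - 1)  # progress from 0.0 to 1.0
        schedule.append((1.0 - t, t))
    return schedule


# Five time segments moving from "thunderstorm" to "birdsong at dawn".
schedule = crossfade_weights(5)
for storm_weight, birdsong_weight in schedule:
    print(f"thunderstorm={storm_weight:.2f}  birdsong={birdsong_weight:.2f}")
```

A non-linear ramp (for example, one that holds the storm longer before fading) would only change how `t` is computed; the idea of a time-varying pair of weights stays the same.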

What’s more, the model’s ability to interpolate between prompts lets users fine-tune their creations, from emphasizing an accent to altering the mood of a piece. This functionality bridges the gap between technical expertise and artistic intuition, making Fugatto an accessible tool for both professionals and hobbyists.

Behind the Scenes of Fugatto’s Development

Building Fugatto was no small feat. The model, which has 2.5 billion parameters, was trained on Nvidia’s DGX systems using 32 H100 Tensor Core GPUs. A global team contributed to its development, which strengthened its handling of diverse languages and accents.

Creating the dataset was one of the project’s biggest challenges. The team synthesized millions of audio samples to train the model, significantly expanding its task range. Early demos, such as Fugatto generating electronic music with barking dogs in time to the beat, left the team—and their audience—in awe.

A New Era of Sound Design

Fugatto represents a leap forward in AI-driven audio technology, offering endless possibilities for creators. From enhancing music production to revolutionizing gaming soundtracks, it stands as a testament to the transformative potential of generative AI.

While Nvidia remains cautious about releasing Fugatto publicly, the model hints at a future where AI becomes an indispensable collaborator in creative industries. As Nvidia’s Bryan Catanzaro aptly put it, “Generative AI is going to bring new capabilities to music, video games, and to ordinary folks that want to create things.”
