A breakthrough in digital workflow assistants, bridging human-like perception and action for seamless GUI navigation.
Enhanced Human-Like Interaction: ShowUI introduces a novel vision-language-action model, enabling more...
Streamlined, scalable, and precise: OminiControl reshapes how we generate and control images using AI.
OminiControl introduces an efficient framework for image-conditioned control in diffusion models, requiring...
Adobe introduces DiffusionGS, a breakthrough in fast and scalable image-to-3D creation.
Adobe unveils DiffusionGS, a cutting-edge 3D diffusion model, generating consistent 3D outputs from single 2D...
Redefining AI with scalable pre-training for image and text integration.
Apple introduces AIMV2, a family of large-scale vision encoders excelling in multimodal tasks.
AIMV2 leverages autoregressive...
With advanced techniques like Chain-of-Thought and Monte Carlo Tree Search, Marco-o1 sets a new standard for tackling complex, ambiguous challenges.
Beyond the Metrics: Marco-o1 addresses the...
Sketch animation with AI-powered simplicity and creativity.
Effortless Animation: FlipSketch transforms static sketches into smooth animations with just a drawing and a text description.
AI Innovation: Combines text-to-video...
How RedPajama datasets are redefining AI development with transparency, scalability, and versatility.
Transparency in AI Training: RedPajama introduces an unprecedented level of openness in dataset composition,...
A groundbreaking framework for consistent, customizable video generation opens new doors for filmmakers and VR designers.
Versatile Control: AnimateAnything enables precise video manipulation through camera trajectories,...
Achieving scalable and flexible part-level segmentation without text prompts, SAMPart3D enables advanced 3D editing and model customization.
Text-Free, Scalable Segmentation: SAMPart3D removes the need for...
OMNI-EDIT leverages specialist guidance to tackle seven unique editing tasks, achieving unprecedented accuracy and quality in real-world image editing.
Multi-Task Capability: OMNI-EDIT is designed to...
StdGEN offers an advanced pipeline for high-quality, semantically decomposed 3D characters ready for gaming, VR, and film production.
Fast, High-Quality 3D Generation: StdGEN creates intricately...
A Million-Scale Dataset Brings New Potential to Image-to-Video Generation Models
Unprecedented Scale and Scope: TIP-I2V introduces over 1.7 million unique text and image prompts for...