AI Papers

ShowUI from Microsoft: GUI Interaction with Vision-Language-Action AI

A breakthrough in digital workflow assistants, bridging human-like perception and action for seamless GUI navigation. Enhanced Human-Like Interaction: ShowUI introduces a novel vision-language-action model, enabling more...

OmniControl: A Leap in Image-Conditioned Diffusion Transformers

Streamlined, scalable, and precise—OmniControl reshapes how we generate and control images using AI. OmniControl introduces an efficient framework for image-conditioned control in diffusion models, requiring...

From Image to 3D in Seconds: Adobe’s DiffusionGS Model

Adobe introduces DiffusionGS, a breakthrough in fast and scalable image-to-3D creation. Adobe unveils DiffusionGS, a cutting-edge 3D diffusion model, generating consistent 3D outputs from single 2D...

AIMV2: Apple’s Multimodal Revolution in Vision Encoding

Redefining AI with scalable pre-training for images and text integration. Apple introduces AIMV2, a family of large-scale vision encoders excelling in multimodal tasks. AIMV2 leverages autoregressive...

Alibaba’s Marco-o1: Pioneering Open-Ended Reasoning in AI

With advanced techniques like Chain-of-Thought and Monte Carlo Tree Search, Marco-o1 sets a new standard for tackling complex, ambiguous challenges. Beyond the Metrics: Marco-o1 addresses the...

FlipSketch: Breathing Life Into Your Doodles

Sketch animation with AI-powered simplicity and creativity. Effortless Animation: FlipSketch transforms static sketches into smooth animations with just a drawing and a text description. AI Innovation: Combines text-to-video...

RedPajama: The Future of Transparent and Open-Source Language Model Training

How RedPajama datasets are redefining AI development with transparency, scalability, and versatility. Transparency in AI Training: RedPajama introduces an unprecedented level of openness in dataset composition,...

AnimateAnything: Transforming Video Creation with Seamless Control and Precision

The groundbreaking framework for consistent, customizable video generation opens new doors for filmmakers and VR designers. Versatile Control: AnimateAnything enables precise video manipulation through camera trajectories,...

SAMPart3D: A Breakthrough in Zero-Shot 3D Object Segmentation for Complex Models

Achieving scalable and flexible part-level segmentation without text prompts, SAMPart3D enables advanced 3D editing and model customization. Text-Free, Scalable Segmentation: SAMPart3D removes the need for...

OMNI-EDIT: The Ultimate Image Editor with Multi-Task Capabilities for Any Aspect Ratio

OMNI-EDIT leverages specialist guidance to tackle seven unique editing tasks, achieving unprecedented accuracy and quality in real-world image editing. Multi-Task Capability: OMNI-EDIT is designed to...

Introducing StdGEN: Game-Changing 3D Character Generation from Single Images with Full Semantic Control

StdGEN offers an advanced pipeline for high-quality, semantically decomposed 3D characters ready for gaming, VR, and film production. Fast, High-Quality 3D Generation: StdGEN creates intricately...

TIP-I2V: The World’s Largest Dataset for Image-to-Video AI Research

A Million-Scale Dataset Brings New Potential to Image-to-Video Generation Models. Unprecedented Scale and Scope: TIP-I2V introduces over 1.7 million unique text and image prompts for...