    AI Papers

    InternLM-XComposer-2.5 Expands the Boundaries of Vision-Language Models

    Unveiling New Possibilities in Text-Image Comprehension and Composition. Enhanced Vision-Language Comprehension: IXC-2.5 supports ultra-high-resolution and fine-grained video understanding, along with multi-turn, multi-image dialogue. Extended Contextual...

    WE-MATH: Evaluating Human-like Mathematical Reasoning in Large Multimodal Models

    A Benchmark for Analyzing the Foundations of Visual Mathematical Reasoning. Benchmark Introduction: WE-MATH is the first benchmark focused on the problem-solving principles behind LMMs' performance,...

    AI Unveils Evolutionary Patterns Predicted by Darwin and Wallace

    Machine Learning Sheds Light on Butterfly Evolutionary Diversity. AI analyzed over 16,000 birdwing butterfly specimens, revealing evolutionary patterns in both sexes. Male butterflies showed more distinct...

    Mary Meeker Advocates for AI-Higher Education Partnership

    Collaboration between tech and universities is essential for the U.S. to maintain AI leadership. Mary Meeker emphasizes the importance of a partnership between AI and U.S....

    HuatuoGPT-Vision Injects Medical Visual Knowledge into Multimodal Models

    New dataset boosts medical capabilities of large language models. The PubMedVision dataset refines medical image-text pairs to enhance multimodal large language models (MLLMs). HuatuoGPT-Vision, trained on PubMedVision,...

    Bridging the Gap in AI: OMG-LLaVA’s Comprehensive Image and Text Reasoning Capabilities

    Integrating pixel-level understanding with powerful reasoning for advanced multimodal interactions. Unified Model Architecture: OMG-LLaVA combines image-level, object-level, and pixel-level reasoning within a single framework, enhancing...

    YOUDREAM: Text-to-3D Animal Generation

    A breakthrough in 3D generation with text-to-image diffusion models. YOUDREAM generates high-quality, anatomically controllable 3D animals using a text-to-image diffusion model guided by 2D views...

    Florence-2: Vision Tasks with Unified Representation

    Florence-2 integrates diverse vision and vision-language tasks through a novel, unified prompt-based model. The model...

    Blood Test Could Predict Parkinson’s Seven Years Before Symptoms

    Early detection aims to revolutionize treatment and prevention. Advanced Detection: A new blood test developed by researchers can predict Parkinson’s disease up to seven years before...

    Glyph-ByT5-v2 Sets New Standard for Multilingual Text Rendering

    The breakthrough in visual text rendering supports 10 languages with improved aesthetic quality. Multilingual Capability: Glyph-ByT5-v2 and Glyph-SDXL-v2 accurately render text in 10 languages. Enhanced Aesthetics: The models...

    People Struggle to Differentiate Between Humans and AI in Five-Minute Chats

    The Surprising Accuracy of GPT-4 in Mimicking Human Conversation. Confounding Conversations: In Turing test experiments, participants mistook GPT-4 for a human 54% of the time. Rising Concerns: The...

    ChatGPT’s Impact on the Freelance Market: A Looming Challenge

    How AI Is Reshaping the Demand for Digital Freelancers. Significant Decline: Demand for digital freelancers in writing and coding has dropped 21% since ChatGPT's launch. Automation-Prone...