Ming UniVision: AI Images Transformed

By vramkickedin | October 11, 2025 at 3:35 pm | 2 min read

# Ming UniVision: Unified Image AI with Continuous Visual Tokenization

inclusionAI has unveiled Ming UniVision, a multimodal large language model (MLLM) that unifies image understanding, generation, and editing within a single continuous autoregressive framework.

Project Foundation

On October 7, 2025, inclusionAI released Ming UniVision together with MingTok, which the team describes as the first continuous visual tokenizer designed to support both image understanding and generation. Discrete visual tokenizers quantize image features by snapping them to codebook entries. The project instead represents images in a single, continuous latent space shared by both tasks, which removes that quantization step.
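To make the discrete-versus-continuous distinction concrete, here is a toy sketch of what a quantizing tokenizer loses and a continuous one keeps. The latent values and the four-entry codebook are made-up numbers for illustration. Real tokenizers learn both, and nothing here reflects MingTok's actual code.

```python
# Toy comparison: discrete (vector-quantized) tokens vs. continuous tokens.
# The latents and codebook are hypothetical values chosen for illustration only.

def nearest_code(vec, codebook):
    """Return the index of the codebook entry closest to vec (squared L2 distance)."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(vec, codebook[i])))

def quantize(latents, codebook):
    """Discrete tokenization: replace each latent with its nearest codebook vector."""
    return [codebook[nearest_code(v, codebook)] for v in latents]

def quantization_error(latents, tokens):
    """Mean squared error between the original latents and the token vectors."""
    total = sum(sum((a - b) ** 2 for a, b in zip(v, t)) for v, t in zip(latents, tokens))
    count = sum(len(v) for v in latents)
    return total / count

latents = [[0.12, -0.40], [0.95, 0.33], [-0.70, 0.05]]        # hypothetical encoder outputs
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # hypothetical 4-entry codebook

discrete_tokens = quantize(latents, codebook)
continuous_tokens = latents  # toy continuous tokenizer: latents pass through unquantized

print(quantization_error(latents, discrete_tokens))    # positive: rounding to codes loses detail
print(quantization_error(latents, continuous_tokens))  # 0.0
```

The discrete path snaps every latent to its nearest codebook entry, so some error is unavoidable. The continuous path has no rounding step.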

Core Technical Innovation

Ming UniVision introduces three key technical contributions:

First continuous unified tokenizer for vision tasks
Next-token prediction framework across understanding and generation
3.5× faster training convergence compared to existing models
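The next-token prediction idea can be sketched as a plain autoregressive loop in which each step emits a continuous vector instead of a codebook index. In the sketch, `predict_next` is a hypothetical stand-in for the language model and its generation head, and the prompt values are invented. It covers only the generation direction and makes no claim about how Ming UniVision implements its predictor.

```python
# Minimal sketch of autoregressive next-token prediction over continuous tokens.
# `predict_next` is a hypothetical placeholder for the model: it outputs the next
# latent vector directly rather than picking an index from a discrete vocabulary.
from typing import Callable, List

Vector = List[float]

def predict_next(context: List[Vector]) -> Vector:
    """Hypothetical predictor: average of the last two tokens plus a small drift."""
    last = context[-2:]
    dim = len(last[0])
    return [sum(tok[d] for tok in last) / len(last) + 0.1 for d in range(dim)]

def generate(prompt: List[Vector], n_new: int,
             step: Callable[[List[Vector]], Vector] = predict_next) -> List[Vector]:
    """Append each predicted continuous token to the context and feed it back in."""
    sequence = list(prompt)
    for _ in range(n_new):
        sequence.append(step(sequence))
    return sequence[len(prompt):]

prompt = [[0.0, 1.0], [0.2, 0.8]]  # hypothetical embedded prompt tokens
image_tokens = generate(prompt, n_new=3)
print(len(image_tokens))  # 3
```

Because the context and the outputs live in the same continuous space, a single loop of this shape can, in principle, interleave understanding and generation. The real model's predictor is a large transformer, which this sketch does not attempt to model.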

Technical Architecture

The MingTok tokenizer follows a three-stage design:

Low-level Encoder: Converts images into compact, continuous latent codes
Semantic Decoder: Transforms latent codes into high-dimensional semantic features
Pixel Decoder: Ensures high-fidelity image reconstruction
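A toy sketch of how data might flow through the three stages follows, assuming they are chained in the order listed. The patch size, feature width, and every function body are hypothetical placeholders chosen only to make the shapes visible. In MingTok each stage is a learned network.

```python
# Toy data-flow sketch: low-level encoder -> semantic decoder -> pixel decoder.
# All sizes and function bodies are hypothetical placeholders, not MingTok's networks.
from typing import List

PATCH = 4          # hypothetical: number of pixels summarized by one latent
SEMANTIC_DIM = 8   # hypothetical: width of the semantic feature for each latent

def low_level_encoder(pixels: List[float]) -> List[float]:
    """Compress each patch into one compact continuous latent (here: the patch mean).
    Assumes len(pixels) is a multiple of PATCH."""
    return [sum(pixels[i:i + PATCH]) / PATCH for i in range(0, len(pixels), PATCH)]

def semantic_decoder(latents: List[float]) -> List[List[float]]:
    """Expand each compact latent into a higher-dimensional feature vector."""
    return [[z * (k + 1) for k in range(SEMANTIC_DIM)] for z in latents]

def pixel_decoder(features: List[List[float]]) -> List[float]:
    """Map features back to pixels (here: recover the latent and repeat it over its patch)."""
    pixels: List[float] = []
    for feat in features:
        pixels.extend([feat[0]] * PATCH)
    return pixels

image = [0.1, 0.2, 0.3, 0.4, 0.9, 0.9, 0.8, 0.8]  # hypothetical 8-pixel "image"
latents = low_level_encoder(image)
features = semantic_decoder(latents)
recon = pixel_decoder(features)
print(len(image), len(latents), len(features), len(features[0]), len(recon))  # 8 2 2 8 8
```

The shape pattern is the point of the sketch: compact latents in the middle, wider semantic features after the second stage, and full-resolution pixels at the end.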

Performance Benchmarks

In the reported evaluations, Ming UniVision achieves competitive results:

Multimodal Understanding: 78.5 on MMBench
Visual Generation: 0.85 overall on the GenEval benchmark
Image Reconstruction: 31.09 dB PSNR and an rFID of 0.38 (lower is better)
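PSNR measures pixel-level reconstruction fidelity in decibels, and higher is better. The formula is 10 * log10(MAX^2 / MSE), where MSE is the mean squared error between original and reconstructed pixels and MAX is the largest possible pixel value. The minimal implementation below assumes 8-bit pixels (MAX = 255) given as flat sequences. The exact protocol behind the 31.09 dB figure (resolution, color space, dataset) is not stated here. rFID is a different measure: it compares feature statistics of original and reconstructed image sets using a pretrained Inception network, and it is not reproduced here.

```python
# Peak signal-to-noise ratio (PSNR) between two images given as flat pixel sequences.
import math
from typing import Sequence

def psnr(reference: Sequence[float], reconstruction: Sequence[float],
         max_value: float = 255.0) -> float:
    """Return PSNR in dB: 10 * log10(max_value**2 / MSE). Assumes 8-bit range by default."""
    if not reference or len(reference) != len(reconstruction):
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: no noise at all
    return 10.0 * math.log10(max_value ** 2 / mse)

ref = [52, 55, 61, 59]  # hypothetical original pixels
rec = [50, 56, 60, 62]  # hypothetical reconstruction (MSE = 3.75)
print(round(psnr(ref, rec), 2))  # 42.39
```

For images normalized to [0, 1], pass `max_value=1.0` so the same formula applies.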

Key Capabilities

The model supports multimodal interactions such as:

Iterative image enhancement
Seamless understanding and generation
Multi-round in-context editing
Direct feature-to-feature transformations

Learn More about Ming UniVision

Explore the project details on their Project Page.
The full technical paper is available here.
Source code can be found on their GitHub Repository.

More Image Related News

Boogu Unveils Boogu-Image-0.1-Edit To Easily Transform Your Photos

Krea Ignites Krea 2 Turbo To Turn Text Into Pictures Fast

Krea Introduces Krea 2 Raw For Developer Image Training

Danrisi Upgrades UltraReal_FineTune_Anima_base1_v3 For Real AI Photos