Alibaba-PAI Launches Z Image Turbo Fun ControlNet Union 2.1

    
By vramkickedin | January 26, 2026 at 5:02 pm | 2 min read

# Z Image Turbo Fun ControlNet Union 2.1: Advanced Image Generation Update

Alibaba-PAI has released version 2.1 of its Z Image Turbo Fun ControlNet Union model. The release addresses limitations of the earlier control models and improves image generation across several control conditions.

Key Model Enhancements

The updated model introduces several critical improvements:

- Adds a new lite model with Control Latents applied on 5 layers (only 1.9GB)
- Resolves mask randomness and overfitting issues in previous control models
- Restructures the dataset with multi-resolution control images (512 to 1536)
- Improves training schedules for more consistent image generation
- Supports multiple control conditions, including Canny, HED, Depth, Pose, and MLSD
- Fully supports inpainting mode
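As a rough illustration of the multi-resolution point, the sketch below computes a target size for a control image so that its longer side falls inside the 512 to 1536 range mentioned above. The function name `fit_control_size`, the choice to clamp the longer side, and the snapping to a multiple of 16 are assumptions made for this example. They are not documented requirements of the model.

```python
def fit_control_size(width, height, lo=512, hi=1536, multiple=16):
    """Return a (width, height) whose longer side lies in [lo, hi].

    Hypothetical helper: the range comes from the article, while the
    clamping rule and the multiple-of-16 snapping are assumptions.
    """
    longer = max(width, height)
    # Clamp the longer side into the range, then scale both sides equally.
    target = min(max(longer, lo), hi)
    scale = target / longer

    def snap(value):
        # Round to the nearest multiple, never below one multiple.
        return max(multiple, int(round(value * scale / multiple)) * multiple)

    return snap(width), snap(height)


if __name__ == "__main__":
    print(fit_control_size(3072, 2048))
    print(fit_control_size(256, 256))
```

Images already inside the range are left at their original size apart from the snapping step.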

Technical Performance Optimization

While testing, the team ran into a significant performance problem. 'During testing, we found that applying ControlNet to Z-Image-Turbo caused the model to lose its acceleration capability and become blurry,' the developers noted. To counter this, they performed 8-step distillation on the version 2.1 model, and they report that the distilled model performs substantially better.

Model Configuration Details

The new version includes multiple model variants:

- Z-Image-Turbo-Fun-ControlNet-Union-2.1-2601-8steps: Enhanced mask diversity and training schedule
- Z-Image-Turbo-Fun-ControlNet-Tile-2.1-2601-8steps: Improved resolution and training approach
- Z-Image-Turbo-Fun-ControlNet-Union-2.1-lite-2601-8steps: Lighter model suitable for lower-spec machines
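One way to keep the three variants straight in a script is a small lookup table. The sketch below is only an illustration: the variant names are taken from the list above, while the keys, the `pick_variant` helper, and the `low_spec` flag are hypothetical and not part of any official tooling.

```python
# Variant names as listed in the article; the short keys are made up here.
VARIANTS = {
    "union": "Z-Image-Turbo-Fun-ControlNet-Union-2.1-2601-8steps",
    "tile": "Z-Image-Turbo-Fun-ControlNet-Tile-2.1-2601-8steps",
    "union-lite": "Z-Image-Turbo-Fun-ControlNet-Union-2.1-lite-2601-8steps",
}


def pick_variant(task="union", low_spec=False):
    """Return a variant name for a task ("union" or "tile").

    Hypothetical helper: `low_spec` selects the lite Union model, which the
    article describes as suitable for lower-spec machines.
    """
    if task == "tile":
        return VARIANTS["tile"]
    if task != "union":
        raise ValueError(f"unknown task: {task!r}")
    return VARIANTS["union-lite" if low_spec else "union"]


if __name__ == "__main__":
    print(pick_variant("union", low_spec=True))
```

No lite Tile variant is listed, so `low_spec` has no effect when the task is "tile".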

Training and Implementation Insights

The 2.0 model was trained on a dataset of 1 million high-quality images covering both general and human-centric content. Training ran for 70,000 steps at a resolution of 1328 in BFloat16 precision. Version 2.1 was trained for an additional 11,000 steps.

Recommended Usage

Developers recommend using a control_context_scale between 0.65 and 0.90 for optimal results. 'For better stability, we highly recommend using a detailed prompt,' the documentation advises. The model supports 8-step inference and offers improved generation quality compared to previous versions.
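The recommended settings can be collected in a small helper, shown here as a minimal sketch. Only `control_context_scale`, the 0.65 to 0.90 range, and the 8-step count come from the text above. The `build_settings` function and the `prompt` and `num_inference_steps` key names are assumptions for illustration and may not match the arguments of the actual inference script.

```python
# Recommended range for control_context_scale, as quoted in the article.
RECOMMENDED_SCALE = (0.65, 0.90)


def build_settings(prompt, control_context_scale=0.75, steps=8):
    """Return a settings dict with the scale clamped to the recommended range.

    Hypothetical helper: key names other than control_context_scale are
    assumptions, not a documented API.
    """
    lo, hi = RECOMMENDED_SCALE
    clamped = min(max(control_context_scale, lo), hi)
    return {
        "prompt": prompt,
        "control_context_scale": clamped,
        "num_inference_steps": steps,  # the distilled models target 8 steps
    }


if __name__ == "__main__":
    settings = build_settings(
        "a detailed portrait of an elderly fisherman mending a net at dawn",
        control_context_scale=1.2,
    )
    print(settings["control_context_scale"])
```

Clamping silently is a design choice for this sketch. A stricter script could raise an error instead when the value falls outside the range.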