Trim 31GB AI Models To 13GB With FP16-FP8-to-NVFP4

    
        By vramkickedin    
     | 
    
            May 15, 2026 at 9:04 pm        
    
     | 
    
        2 min read

The FP16-FP8-to-NVFP4 tool by developer Thenotrealuser is a Windows-based converter that turns FP16 or BF16 diffusion model files into NVFP4 format for Blackwell GPUs. It targets popular image generation models like Z-Image Turbo and HiDream-I1, dramatically reducing their file size while keeping them fully usable inside ComfyUI. Built to solve the lack of straightforward, consumer-friendly NVFP4 tools, the project focuses on real safetensors workflows instead of datacenter pipelines.

Thenotrealuser focused on creating a GUI and CLI tool that runs on Windows, automatically detects model architectures, and applies mixed-precision conversion to avoid breaking sensitive layers. The converter profiles each model type, safely handling HiDream’s Mixture-of-Experts gates that must stay in higher precision. It gives users a practical utility for shrinking models on RTX 50 cards without wrestling with TensorRT or Linux.

Desktop-friendly quantization for RTX 50 owners

Key Features

GUI interface with real-time progress logging.
CLI support for scripting and batch use.
Dry scan detects FP8 and unsafe tensors.
Automatic architecture detection and profiling.
Safe MoE handling prevents HiDream crashes.
Per-model conversion profiles for best results.
Mixed precision output preserves model stability.
ComfyUI-ready safetensors with quantization metadata.

This converter is meant for ComfyUI users with an RTX 50 series GPU who struggle to fit large models in VRAM. It converts a 31.8 GB HiDream-I1 model to about 13 GB, making it usable on a 16 GB RTX 5060 Ti. Small agencies and privacy-conscious professionals benefit from running cutting-edge diffusion models locally without depending on cloud services.

Developer notes and current limitations

thenotrealuser advises that full NVFP4 conversion of all layers is possible but not recommended because some tensors are too sensitive and may cause crashes. Using FP8 source models is experimental and can yield unstable results due to leftover quantization metadata from previous conversions. Because NVFP4 support is still maturing in PyTorch and ComfyUI, the tool is considered experimental, and users should always keep backups of original models.