ChiefNako Releases ComfyUI Blackwell Docker


ChiefNako has released ComfyUI Blackwell Docker, a production-ready setup designed to leverage the NVIDIA Blackwell architecture (RTX 50 series) through NVFP4 4-bit quantization. According to the developer, the new Docker configuration generates images roughly 3x faster than standard 16-bit models and uses about 3.5x less VRAM.

Real-world testing on an RTX 5090 shows FLUX.1-dev generating images in approximately 12 seconds, a sharp decrease from the 40+ seconds typically required in BF16. The setup achieves this by reducing model size significantly; for instance, FLUX.1-dev shrinks to 6.77GB from 24GB while maintaining quality that is virtually identical to full precision. By running ComfyUI in a container, the solution also keeps the host system clean and supports persistent data storage for models and custom nodes.
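As a quick sanity check, the headline multipliers can be recomputed from the figures quoted above. This is only a sketch of the arithmetic: the 40-second value is the lower bound of the article's "40+ seconds", and the size ratio is for the model file, which is used here as a stand-in for the VRAM saving.

```python
# Figures quoted in the article for FLUX.1-dev on an RTX 5090.
BF16_SECONDS = 40.0    # "40+ seconds", so treat this as a lower bound
NVFP4_SECONDS = 12.0   # "approximately 12 seconds"
BF16_SIZE_GB = 24.0
NVFP4_SIZE_GB = 6.77


def ratio(before: float, after: float) -> float:
    """Return how many times smaller `after` is than `before`."""
    return before / after


speedup = ratio(BF16_SECONDS, NVFP4_SECONDS)
size_reduction = ratio(BF16_SIZE_GB, NVFP4_SIZE_GB)

print(f"speedup: at least {speedup:.1f}x")            # at least 3.3x
print(f"model size reduction: {size_reduction:.1f}x")  # 3.5x
```

Both results line up with the rounded "3x faster" and "3.5x less VRAM" claims.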

Core features of the Blackwell Docker

  • 3x faster image generation speed.
  • 3.5x reduction in VRAM usage.
  • Native NVFP4 support.
  • Sandboxed Docker environment.
  • Persistent storage for models, outputs, custom nodes, and workflows.
  • Compatibility with RTX 30xx and RTX 40xx GPUs, though without the NVFP4 speed boost.
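To picture how the sandboxing and persistent storage in the list above fit together, here is a minimal sketch that assembles a `docker run` command with one bind mount per persistent folder. The image tag, the container install path, and the folder names are all hypothetical placeholders, since the article does not give them; the project's own instructions are the authority. Port 8188 is ComfyUI's usual default, and `--gpus all` relies on the NVIDIA Container Toolkit.

```python
import shlex
from pathlib import PurePosixPath

# Hypothetical values: the article does not name the image or its paths.
IMAGE = "comfyui-blackwell:latest"   # assumed image tag
CONTAINER_ROOT = "/app/ComfyUI"      # assumed install path inside the container
# Assumed folder names for models, outputs, custom nodes and workflows.
PERSISTENT_DIRS = ["models", "output", "custom_nodes", "user"]


def build_run_command(host_root: PurePosixPath, port: int = 8188) -> list[str]:
    """Build a `docker run` argument list with one bind mount per folder."""
    cmd = ["docker", "run", "--rm", "--gpus", "all", "-p", f"{port}:{port}"]
    for name in PERSISTENT_DIRS:
        # Data written under the container path lands on the host and
        # survives container removal.
        cmd += ["-v", f"{host_root / name}:{CONTAINER_ROOT}/{name}"]
    cmd.append(IMAGE)
    return cmd


# Print the command rather than running it, so it can be reviewed first.
print(shlex.join(build_run_command(PurePosixPath("/srv/comfyui"))))
```

Because each folder is a bind mount, deleting or rebuilding the container leaves downloaded models and installed custom nodes on the host.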

Developer notes on quantization

The optimization strategy hinges on the specific benefits of the Blackwell architecture's floating-point format. ChiefNako explains the significance of this technology, stating:

'This isn't your typical "lossy compression" - it's a hardware-accelerated precision format designed specifically for AI workloads.'

In practice, users can expect a substantial reduction in memory use without the quality loss usually associated with compression. The setup also integrates SageAttention for further acceleration, though the developer notes a specific limitation regarding text models:

'For the life of me I couldn't get nvfp4 text models working with CLIP loader- the way nvfp4 stuffs up the shape of the model hasn't been resolved yet in comfyui.'

Consequently, users may need to rely on fp8 or fp16 models for text encoding until this shape issue is resolved.

Implementation requirements & support

Deploying this release requires specific hardware and software. Full acceleration needs an NVIDIA Blackwell GPU (RTX 5090, 5080, or 5070); the setup also runs on older Ampere and Ada cards, but without the speed boost. The minimum specification is 16GB of VRAM (24GB is recommended) and 100GB of storage space. Software dependencies are Docker version 20.10 or newer and the NVIDIA Container Toolkit.
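The Docker and VRAM thresholds above can be checked with a short script. The sketch below is not part of the release: the helper names are made up for illustration, and it only parses the text printed by `docker --version` and by `nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits`. It does not verify the GPU architecture, free disk space, or the NVIDIA Container Toolkit.

```python
import re
import subprocess
from typing import Optional

# Thresholds taken from the article.
MIN_DOCKER = (20, 10)
MIN_VRAM_GB = 16
RECOMMENDED_VRAM_GB = 24


def run(args: list[str]) -> Optional[str]:
    """Return a command's stdout, or None if it is missing or fails."""
    try:
        result = subprocess.run(args, capture_output=True, text=True, check=True)
        return result.stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None


def docker_ok(version_output: str) -> bool:
    """Check `docker --version` output against the 20.10 minimum."""
    match = re.search(r"(\d+)\.(\d+)", version_output)
    if match is None:
        return False
    return (int(match.group(1)), int(match.group(2))) >= MIN_DOCKER


def vram_status(smi_line: str) -> str:
    """Classify one 'name, MiB' line of nvidia-smi output."""
    name, mib = (part.strip() for part in smi_line.rsplit(",", 1))
    # nvidia-smi reports slightly under the advertised size, so round.
    gb = round(int(mib) / 1024)
    if gb < MIN_VRAM_GB:
        return f"{name}: {gb}GB, below the minimum"
    if gb < RECOMMENDED_VRAM_GB:
        return f"{name}: {gb}GB, meets the minimum"
    return f"{name}: {gb}GB, meets the recommendation"


# Sample strings stand in for real command output here.
print(docker_ok("Docker version 24.0.7, build afdd53b"))
print(vram_status("NVIDIA GeForce RTX 5090, 32607"))
```

On a real machine, the sample strings would be replaced with `run(["docker", "--version"])` and the lines returned by the `nvidia-smi` query, after checking that neither call returned `None`.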

Learn more about ComfyUI Blackwell Docker

To get started, visit ChiefNako's GitHub page.