ComfyUI-DramaBox Injects Expressive AI Speech Directly Into Visual Workflows

By vramkickedin | May 25, 2026 at 2:24 pm | 2 min read

ComfyUI-DramaBox is a new custom node pack that brings ResembleAI’s expressive text-to-speech system directly into ComfyUI workflows. It turns text prompts into spoken audio using the LTX-2.3 audio diffusion model, with support for voice reference clips and advanced generation settings. The add-on downloads all required model weights automatically on first use, so you can start creating speech without manual setup.

FranckyB, who also created ComfyUI-Prompt-Manager and Voice Clone Studio App, developed this add-on as an open-source port of the DramaBox pipeline for the ComfyUI community. He released a first draft and, based on early feedback, quickly followed up with improvements such as CPU offloading and a dedicated unload node. The project aims to give local AI users a reliable way to generate expressive, customizable voice content without leaving the visual tools they already use.

Dual generation modes and memory control

Key Features

Generate speech from text with optional voice clips.
Switch between native wrapper and ComfyUI-managed modes.
Load custom Gemma text encoders in safetensors or GGUF.
Adjust steps, CFG scale, duration, and memory policy.
Apply trained voice LoRAs for consistent vocal styles.
Offload models to CPU after each generation.
Release cached models on demand with unload node.
Automatic model download and old-file cleanup.
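The CFG scale setting listed above controls classifier-free guidance. Assuming DramaBox uses the standard formulation (the article does not spell this out), each sampling step blends a prompt-conditioned prediction with an unconditioned one and pushes the result toward the prompt. A minimal sketch on plain Python lists, using a hypothetical `apply_cfg` helper; a real pipeline would operate on tensors:

```python
from typing import List


def apply_cfg(cond: List[float], uncond: List[float], cfg_scale: float) -> List[float]:
    """Standard classifier-free guidance (hypothetical helper, not DramaBox code).

    cond:   model prediction with the text prompt
    uncond: model prediction without the prompt
    Result: uncond + cfg_scale * (cond - uncond), element by element.
    """
    if len(cond) != len(uncond):
        raise ValueError("predictions must have the same length")
    return [u + cfg_scale * (c - u) for c, u in zip(cond, uncond)]


# A scale of 1.0 returns the conditioned prediction unchanged;
# larger values move the output further in the prompt's direction.
print(apply_cfg([0.5, 1.0], [0.0, 0.0], 3.0))  # [1.5, 3.0]
```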

This tool suits creators who already build image and video pipelines in ComfyUI and want to add voiceovers or dialogue locally. It’s especially useful for privacy-conscious professionals because all processing stays on the user’s own machine. While a GPU with 16 GB of VRAM is recommended for smooth performance, the add-on’s CPU offloading options also let systems with less VRAM handle generation.
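The memory options amount to a simple policy: keep model weights on the GPU between runs for speed, or move them to system RAM after each generation so VRAM is free, with an unload step that drops them entirely. The sketch below models that policy in plain Python. The `ModelCache` class, its methods, and the model names are hypothetical; a real node would move PyTorch modules between devices rather than update a dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ModelCache:
    """Hypothetical memory-policy sketch; not the add-on's actual code.

    `devices` maps each cached model name to where its weights live.
    """
    offload_after_generation: bool = True
    devices: Dict[str, str] = field(default_factory=dict)

    def load(self, name: str) -> None:
        # Bring the model onto the GPU for the next generation.
        self.devices[name] = "cuda"

    def generate(self, names: List[str]) -> None:
        for name in names:
            self.load(name)
        # ... text encoding and audio diffusion would run here ...
        if self.offload_after_generation:
            # Park weights in system RAM so VRAM is free between runs;
            # the next run pays the cost of moving them back.
            for name in names:
                self.devices[name] = "cpu"

    def unload(self) -> None:
        # Analogue of an unload node: forget every cached model.
        self.devices.clear()

    def on_gpu(self) -> List[str]:
        return sorted(n for n, d in self.devices.items() if d == "cuda")


cache = ModelCache(offload_after_generation=True)
cache.generate(["text_encoder", "audio_model"])
print(cache.on_gpu())     # []
print(cache.devices)      # {'text_encoder': 'cpu', 'audio_model': 'cpu'}
cache.unload()
print(cache.devices)      # {}

resident = ModelCache(offload_after_generation=False)
resident.generate(["text_encoder", "audio_model"])
print(resident.on_gpu())  # ['audio_model', 'text_encoder']
```

Offloading trades a slower start on the next run for lower peak VRAM use, which is why it helps on smaller GPUs.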

What the developer says

FranckyB notes that the clip_loader mode gives better ComfyUI-native memory control but may produce slightly different tone or pacing than the original DramaBox pipeline. He plans to integrate Audio Prompt Presets into his Prompt Manager add-on, making it easier to build repeatable speech workflows. The project automatically removes old snapshot files and downloads the default 8 GB Gemma encoder if it is missing, so setup needs no manual steps.
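The first-run behavior described here follows a common pattern: delete known leftovers, then download the weights only if they are not already on disk. A minimal sketch, assuming a caller-supplied `download` function; `ensure_model`, the file names, and the stale-file list are all hypothetical and do not reflect the add-on's actual code or file layout.

```python
import tempfile
from pathlib import Path
from typing import Callable, Iterable


def ensure_model(models_dir: str, filename: str,
                 download: Callable[[Path], None],
                 stale_names: Iterable[str] = ()) -> Path:
    """Hypothetical helper: clean up old files, then fetch weights if missing."""
    root = Path(models_dir)
    root.mkdir(parents=True, exist_ok=True)

    # Delete only files explicitly known to belong to superseded releases,
    # so unrelated user files in the folder are never touched.
    for name in stale_names:
        old = root / name
        if old.is_file():
            old.unlink()

    # Download only when the file is absent; later runs skip this step.
    target = root / filename
    if not target.exists():
        download(target)
    return target


downloads = []


def fake_download(path: Path) -> None:
    # Stand-in for a real network download of the model weights.
    downloads.append(path.name)
    path.write_bytes(b"weights")


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "old_snapshot.bin").write_bytes(b"stale")
    ensure_model(tmp, "encoder.safetensors", fake_download, ["old_snapshot.bin"])
    ensure_model(tmp, "encoder.safetensors", fake_download)  # already present
    remaining = sorted(f.name for f in Path(tmp).iterdir())

print(downloads)  # ['encoder.safetensors']
print(remaining)  # ['encoder.safetensors']
```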