ComfyUI-DramaBox Injects Expressive AI Speech Directly Into Visual Workflows

A featureless white human bust with a delicate, translucent sound wave emerging from its slightly parted lips.

ComfyUI-DramaBox is a new custom node pack that brings ResembleAI’s expressive text-to-speech system directly into ComfyUI workflows. It turns text prompts into spoken audio using the LTX-2.3 audio diffusion model, with support for voice reference clips and advanced generation settings. The add-on downloads all required model weights automatically on first use, so you can start creating speech without manual setup.

FranckyB, who also created ComfyUI-Prompt-Manager and the Voice Clone Studio app, developed this add-on as an open-source port of the DramaBox pipeline for the ComfyUI community. He released a first draft and, based on early feedback, quickly followed up with improvements such as CPU offloading and a dedicated unload node. The project aims to give local AI users a reliable way to generate expressive, customizable voice content without leaving the visual tools they already use.

Dual generation modes and memory control

Key Features
  • Generate speech from text with optional voice clips.
  • Switch between native wrapper and ComfyUI-managed modes.
  • Load custom Gemma text encoders in safetensors or GGUF.
  • Adjust steps, CFG scale, duration, and memory policy.
  • Apply trained voice LoRAs for consistent vocal styles.
  • Offload models to CPU after each generation.
  • Release cached models on demand with the unload node.
  • Download models automatically and clean up old files.

This tool suits creators who already build image and video pipelines in ComfyUI and want to add voiceovers or dialogue locally. It’s especially useful for privacy-conscious professionals because all processing stays on the user’s own machine. While a 16 GB GPU is recommended for smooth performance, the add-on’s memory offloading options let lower-VRAM systems handle generation too.

What the developer says

FranckyB notes that the clip_loader mode gives better ComfyUI-native memory control but may produce slightly different tone or pacing compared to the original DramaBox pipeline. He plans to integrate Audio Prompt Presets into his Prompt Manager add-on, making it easier to build repeatable speech workflows. The project automatically removes old snapshot files and downloads the default 8 GB Gemma encoder if missing, keeping setup lean.
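
The download-if-missing and snapshot cleanup behavior described above can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the add-on's implementation: the function names, the directory layout, and the download callable are all hypothetical.

```python
from pathlib import Path
import shutil


def ensure_weights(model_dir, filename, download):
    """Return the path to `filename`, calling `download(target)` only if it is missing."""
    target = Path(model_dir) / filename
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        download(target)  # hypothetical callable that writes the file to `target`
    return target


def remove_old_snapshots(snapshot_root, keep):
    """Delete every subdirectory of `snapshot_root` except the one named `keep`."""
    removed = []
    root = Path(snapshot_root)
    if not root.is_dir():
        return removed
    for child in sorted(root.iterdir()):
        if child.is_dir() and child.name != keep:
            shutil.rmtree(child)
            removed.append(child.name)
    return removed
```

Passing the downloader in as a callable keeps the sketch independent of any particular hosting service and makes the skip-if-present logic easy to check.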

“It's very new, so if you encounter any bugs just let me know on GitHub.” — Source: Reddit