Diff-forge Carves Flawless Training Datasets From Your Video Footage

Diff-forge is a new open-source tool that automates the tedious work of preparing video datasets for diffusion model fine-tuning. It runs entirely on your own machine, providing a visual browser-based editor to ingest, validate, transform, caption, and export training clips. The tool ensures every file meets the strict frame-count and resolution rules that models like LTX Video and WAN require.
The project was built by Oqura-ai over a weekend after the team grew frustrated with repetitive preprocessing for their own animation training experiments. They open-sourced it under the MIT license so that anyone can benefit from a fully local pipeline that never shares data externally. Now creators can simply drop a folder of raw clips into diff-forge and get back a clean, captioned ZIP file ready for training scripts.
Model-aware validation and transforms
- Smart ingest scans folders and pairs caption files.
- Real-time validation against LTX or WAN frame rules.
- Bulk resolution and frame normalization with preview.
- Five resize modes for precise aspect-ratio control.
- AI captioning via OpenAI, Gemini, or Azure.
- Per-item editor with frame slicer and grid view.
- Export training-ready ZIP with optional trigger word.
- Undo/redo actions with up to 50-step history.
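To make the validation bullet concrete, here is a minimal sketch of what a model-aware frame and resolution check could look like. It is not diff-forge's own code: the `FrameRule` type, the function names, and the rule values are all assumptions. The values (frame counts of the form 8n+1 with dimensions divisible by 32 for LTX Video, and 4n+1 with dimensions divisible by 16 for WAN) follow commonly cited constraints and should be checked against each model's documentation.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FrameRule:
    frame_step: int     # valid frame counts are frame_step * n + 1
    size_multiple: int  # width and height must be divisible by this


# Assumed rule values, not taken from diff-forge; verify before relying on them.
RULES: Dict[str, FrameRule] = {
    "ltx": FrameRule(frame_step=8, size_multiple=32),
    "wan": FrameRule(frame_step=4, size_multiple=16),
}


def validate_clip(model: str, frames: int, width: int, height: int) -> List[str]:
    """Return a list of human-readable problems; an empty list means the clip passes."""
    rule = RULES[model]
    problems = []
    if frames < 1 or (frames - 1) % rule.frame_step != 0:
        problems.append(
            f"frame count {frames} is not of the form {rule.frame_step}n+1"
        )
    if width % rule.size_multiple or height % rule.size_multiple:
        problems.append(
            f"resolution {width}x{height} is not divisible by {rule.size_multiple}"
        )
    return problems


def nearest_valid_frames(model: str, frames: int) -> int:
    """Largest valid frame count that does not exceed `frames` (trim, never pad)."""
    if frames < 1:
        raise ValueError("a clip needs at least one frame")
    step = RULES[model].frame_step
    return ((frames - 1) // step) * step + 1


if __name__ == "__main__":
    print(validate_clip("ltx", 100, 770, 512))
    print(nearest_valid_frames("ltx", 100))
```

Under these assumed rules, a 100-frame clip at 770x512 fails both LTX checks, and trimming it to the nearest valid length would leave 97 frames.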
This tool is ideal for serious hobbyists and small studios who train their own diffusion models on local GPUs. Privacy-focused professionals will appreciate that processing stays on their own hardware, though the optional AI captioning necessarily sends data to whichever provider (OpenAI, Gemini, or Azure) is selected. It also suits content creators who need a repeatable, efficient way to prepare consistent video datasets without wrestling with ffmpeg commands.
Developer notes and future plans
For now, diff-forge fully supports LTX Video transforms and export, while WAN support is limited to validation and configuration, with a processor planned. The developer emphasized that the processor system is pluggable and that adding a new model takes roughly 50 lines of code, which suggests support for more models could follow in future updates.
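As an illustration of why a pluggable design keeps new models cheap to add, the sketch below shows one common way to build such a system: a registry that maps a model name to a small processor class. Every name here (`Processor`, `register`, `get_processor`, `LtxProcessor`) is hypothetical, and diff-forge's actual interface may look quite different. The LTX numbers are the same assumed 8n+1 and divisible-by-32 rules, not values taken from the project.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Tuple, Type


class Processor(ABC):
    """Hypothetical per-model plugin: decides the target clip shape."""

    @abstractmethod
    def target_frames(self, frames: int) -> int:
        ...

    @abstractmethod
    def target_size(self, width: int, height: int) -> Tuple[int, int]:
        ...


# Registry mapping a model name to its processor class.
PROCESSORS: Dict[str, Type[Processor]] = {}


def register(name: str) -> Callable[[Type[Processor]], Type[Processor]]:
    """Class decorator that adds a processor to the registry under `name`."""
    def decorator(cls: Type[Processor]) -> Type[Processor]:
        PROCESSORS[name] = cls
        return cls
    return decorator


@register("ltx")
class LtxProcessor(Processor):
    # Assumed LTX Video rules: 8n+1 frames, sides divisible by 32.
    def target_frames(self, frames: int) -> int:
        return max(1, ((frames - 1) // 8) * 8 + 1)

    def target_size(self, width: int, height: int) -> Tuple[int, int]:
        return (width // 32 * 32, height // 32 * 32)


def get_processor(name: str) -> Processor:
    """Instantiate the processor for a model, or fail clearly if none is registered."""
    try:
        return PROCESSORS[name]()
    except KeyError:
        raise ValueError(f"no processor registered for model '{name}'") from None


if __name__ == "__main__":
    ltx = get_processor("ltx")
    print(ltx.target_frames(100), ltx.target_size(770, 520))
```

In a layout like this, supporting another model means writing one more decorated class, and the rest of the pipeline looks processors up by name. In this sketch, asking for `"wan"` raises a `ValueError` because no WAN processor has been registered.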
"Realized the worst part is not training, it’s the dataset prep." — Source: Reddit