Diff-forge Carves Flawless Training Datasets From Your Video Footage


Diff-forge is a new open-source tool that automates the tedious work of preparing video datasets for diffusion model fine-tuning. It runs entirely on your own machine, providing a visual browser-based editor to ingest, validate, transform, caption, and export training clips. The tool ensures every file meets the strict frame-count and resolution rules that models like LTX Video and WAN require.
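To make the idea of frame-count and resolution rules concrete, here is a minimal Python sketch of what such a check could look like. It is not taken from diff-forge's source. The numeric constraints are assumptions based on commonly cited guidance: frame counts of the form 8n+1 with dimensions divisible by 32 for LTX Video, and 4n+1 with dimensions divisible by 16 for WAN. The names `ModelRule`, `RULES`, `validate_clip`, and `nearest_valid_frames` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ModelRule:
    """Frame and resolution constraints for one target model (assumed values)."""
    name: str
    frame_step: int     # valid frame counts have the form frame_step * n + 1
    size_multiple: int  # width and height must be divisible by this value


# Assumed constraints for illustration; not read from diff-forge itself.
RULES: Dict[str, ModelRule] = {
    "ltx": ModelRule("LTX Video", frame_step=8, size_multiple=32),
    "wan": ModelRule("WAN", frame_step=4, size_multiple=16),
}


def validate_clip(model: str, frames: int, width: int, height: int) -> List[str]:
    """Return a list of human-readable problems; an empty list means the clip passes."""
    rule = RULES[model]
    problems: List[str] = []
    if frames < 1 or (frames - 1) % rule.frame_step != 0:
        problems.append(
            f"{frames} frames is not of the form {rule.frame_step}n+1 for {rule.name}"
        )
    if width % rule.size_multiple != 0:
        problems.append(f"width {width} is not divisible by {rule.size_multiple}")
    if height % rule.size_multiple != 0:
        problems.append(f"height {height} is not divisible by {rule.size_multiple}")
    return problems


def nearest_valid_frames(model: str, frames: int) -> int:
    """Largest valid frame count that does not exceed `frames` (never below 1)."""
    rule = RULES[model]
    if frames < 1:
        return 1
    return ((frames - 1) // rule.frame_step) * rule.frame_step + 1
```

Under these assumed rules, a 100-frame clip headed for LTX Video would be flagged and could be trimmed to `nearest_valid_frames("ltx", 100)` frames, dropping the trailing frames rather than padding.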

The project was built by Oqura-ai over a weekend after the team grew frustrated with repetitive preprocessing for their own animation training experiments. They open-sourced it under the MIT license so that anyone can benefit from a fully local pipeline that never shares data externally. Now creators can simply drop a folder of raw clips into diff-forge and get back a clean, captioned ZIP file ready for training scripts.

Model-aware validation and transforms

Key Features
  • Smart ingest scans folders and pairs caption files.
  • Real-time validation against LTX or WAN frame rules.
  • Bulk resolution and frame normalization with preview.
  • Five resize modes for precise aspect-ratio control.
  • AI captioning via OpenAI, Gemini, or Azure.
  • Per-item editor with frame slicer and grid view.
  • Export training-ready ZIP with optional trigger word.
  • Undo/redo actions with up to 50-step history.
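
The ingest and export steps in the list above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not diff-forge's implementation: captions are assumed to be .txt files that share a stem with their video, the trigger word is assumed to be prepended to each caption, and the helper names (`pair_captions`, `with_trigger`, `export_zip`) and the `VIDEO_EXTS` set are hypothetical.

```python
import io
import zipfile
from pathlib import PurePath
from typing import Dict, List, Optional, Tuple

# Assumed set of video extensions, for illustration only.
VIDEO_EXTS = {".mp4", ".mov", ".webm"}


def pair_captions(filenames: List[str]) -> Dict[str, Optional[str]]:
    """Map each video filename to the .txt file sharing its stem, or None."""
    captions = {
        PurePath(f).stem: f
        for f in filenames
        if PurePath(f).suffix.lower() == ".txt"
    }
    return {
        f: captions.get(PurePath(f).stem)
        for f in filenames
        if PurePath(f).suffix.lower() in VIDEO_EXTS
    }


def with_trigger(caption: str, trigger: Optional[str]) -> str:
    """Prepend the trigger word unless the caption already starts with it."""
    caption = caption.strip()
    if not trigger or caption.lower().startswith(trigger.lower()):
        return caption
    return f"{trigger}, {caption}" if caption else trigger


def export_zip(items: Dict[str, Tuple[bytes, str]], trigger: Optional[str] = None) -> bytes:
    """Build an in-memory ZIP holding each video next to its caption file."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, (video, caption) in sorted(items.items()):
            zf.writestr(name, video)
            caption_name = PurePath(name).with_suffix(".txt").name
            zf.writestr(caption_name, with_trigger(caption, trigger))
    return buf.getvalue()
```

Videos that come back from `pair_captions` with `None` are the ones that would still need a caption, whether written by hand or generated by one of the captioning providers.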

This tool is aimed at serious hobbyists and small studios who train their own diffusion models on local GPUs. Privacy-focused professionals will appreciate that processing stays on their own hardware, though the optional AI captioning relies on hosted services (OpenAI, Gemini, or Azure) and so presumably sends material to those providers when enabled. It also suits content creators who need a repeatable, efficient way to prepare consistent video datasets without wrestling with ffmpeg commands.

Developer notes and future plans

Diff-forge currently supports LTX Video end to end, including transforms and export. WAN support is limited to validation and configuration for now, with a WAN processor planned. According to the developers, the processor system is pluggable and adding a new model takes roughly 50 lines of code, which should make it practical to extend compatibility to other models in future updates.
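
One common way to build a pluggable processor system is a small registry keyed by model name. The Python sketch below shows that general pattern only; it makes no claim about how diff-forge structures its processors. Every name here (`Processor`, `REGISTRY`, `register`, `LTXProcessor`, `get_processor`) is hypothetical, and the LTX constraints (8n+1 frames, dimensions divisible by 32) are assumed for illustration.

```python
from abc import ABC, abstractmethod
from typing import Dict, Tuple, Type


class Processor(ABC):
    """Hypothetical interface that each model-specific processor implements."""
    model_id: str

    @abstractmethod
    def target_frames(self, frames: int) -> int:
        """Return the frame count a clip should be trimmed to."""

    @abstractmethod
    def target_size(self, width: int, height: int) -> Tuple[int, int]:
        """Return the resolution a clip should be resized to."""


REGISTRY: Dict[str, Type[Processor]] = {}


def register(cls: Type[Processor]) -> Type[Processor]:
    """Class decorator that makes a processor discoverable by its model_id."""
    REGISTRY[cls.model_id] = cls
    return cls


@register
class LTXProcessor(Processor):
    model_id = "ltx"

    def target_frames(self, frames: int) -> int:
        # Assumed rule: round down to the nearest 8n+1, never below 1.
        return max(1, ((frames - 1) // 8) * 8 + 1)

    def target_size(self, width: int, height: int) -> Tuple[int, int]:
        # Assumed rule: round each dimension down to a multiple of 32.
        return (width // 32 * 32, height // 32 * 32)


def get_processor(model_id: str) -> Processor:
    """Instantiate the registered processor, or fail with a clear message."""
    try:
        return REGISTRY[model_id]()
    except KeyError:
        raise ValueError(f"no processor registered for {model_id!r}") from None
```

With this shape, supporting another model means writing one more decorated subclass, which is consistent in spirit with the "roughly 50 lines" estimate. A model that has validation rules but no registered processor, as described for WAN, would surface as a `ValueError` from `get_processor`.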

"Realized the worst part is not training, it’s the dataset prep." — Source: Reddit