Diff-forge Carves Flawless Training Datasets From Your Video Footage

    
By vramkickedin | May 11, 2026 at 8:03 pm

Diff-forge is a new open-source tool that automates the tedious work of preparing video datasets for diffusion model fine-tuning. It runs on your own machine and provides a visual, browser-based editor to ingest, validate, transform, caption, and export training clips. The tool checks every clip against the strict frame-count and resolution rules that models such as LTX Video and WAN require.
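To make the idea of frame-count and resolution rules concrete, here is a minimal Python sketch of what such a check could look like. It is not diff-forge's code. The rule values are assumptions based on commonly cited constraints (frame counts of the form 8n+1 with dimensions divisible by 32 for LTX Video, and 4n+1 with dimensions divisible by 16 for WAN), and the names `ModelRules`, `validate_clip`, and `nearest_valid_frames` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRules:
    """Hypothetical rule set; the values below are assumptions, not read from diff-forge."""
    frame_step: int     # valid frame counts have the form frame_step * n + 1
    size_multiple: int  # width and height must both be divisible by this


# Assumed constraints; check each model's own documentation before relying on them.
RULES = {
    "ltx": ModelRules(frame_step=8, size_multiple=32),
    "wan": ModelRules(frame_step=4, size_multiple=16),
}


def validate_clip(model: str, frames: int, width: int, height: int) -> list:
    """Return a list of human-readable problems; an empty list means the clip passes."""
    rules = RULES[model]
    problems = []
    if frames < 1 or (frames - 1) % rules.frame_step != 0:
        problems.append(f"frame count {frames} is not of the form {rules.frame_step}n+1")
    if width % rules.size_multiple != 0 or height % rules.size_multiple != 0:
        problems.append(f"{width}x{height} is not divisible by {rules.size_multiple}")
    return problems


def nearest_valid_frames(model: str, frames: int) -> int:
    """Largest valid frame count not exceeding `frames` (trims rather than pads)."""
    step = RULES[model].frame_step
    return max(1, ((frames - 1) // step) * step + 1)
```

Under these assumed rules, a 100-frame clip at 770x512 would fail both LTX checks, and trimming it to `nearest_valid_frames("ltx", 100)` frames would fix the frame-count problem.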

The project was built by Oqura-ai over a weekend after the team grew frustrated with repetitive preprocessing for their own animation training experiments. They open-sourced it under the MIT license so that anyone can benefit from a fully local pipeline that never shares data externally. Now creators can simply drop a folder of raw clips into diff-forge and get back a clean, captioned ZIP file ready for training scripts.

Model-aware validation and transforms

Key Features

Smart ingest scans folders and pairs caption files.
Real-time validation against LTX or WAN frame rules.
Bulk resolution and frame normalization with preview.
Five resize modes for precise aspect-ratio control.
AI captioning via OpenAI, Gemini, or Azure.
Per-item editor with frame slicer and grid view.
Export training-ready ZIP with optional trigger word.
Undo/redo actions with up to 50-step history.
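Two of the features above, caption pairing on ingest and ZIP export with a trigger word, are easy to picture in code. The following Python sketch uses only the standard library and rests on stated assumptions: captions live in a `.txt` file with the same stem as the video, the trigger word is prepended to each caption, and the function names `pair_captions` and `export_zip` are hypothetical. The article does not describe how diff-forge actually lays out its files.

```python
import zipfile
from pathlib import Path

# Assumed set of video extensions; the real tool may accept others.
VIDEO_EXTS = {".mp4", ".mov", ".webm", ".mkv"}


def pair_captions(folder: Path) -> dict:
    """Map each video file to the text of a same-stem .txt caption ('' if missing)."""
    pairs = {}
    for video in sorted(folder.iterdir()):
        if not video.is_file() or video.suffix.lower() not in VIDEO_EXTS:
            continue
        caption_file = video.with_suffix(".txt")
        if caption_file.exists():
            pairs[video] = caption_file.read_text(encoding="utf-8").strip()
        else:
            pairs[video] = ""
    return pairs


def export_zip(pairs: dict, out_path: Path, trigger_word: str = "") -> None:
    """Write each video plus a caption file into one ZIP, prepending the trigger word."""
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for video, caption in pairs.items():
            text = f"{trigger_word} {caption}".strip() if trigger_word else caption
            zf.write(video, arcname=video.name)
            zf.writestr(video.with_suffix(".txt").name, text)
```

Calling `export_zip(pair_captions(Path("clips")), Path("dataset.zip"), trigger_word="mystyle")` would produce a flat archive in which every clip sits next to its caption file.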

This tool is ideal for serious hobbyists and small studios who train their own diffusion models on local GPUs. Privacy-focused professionals will appreciate that ingest, validation, transforms, and export all run on their own hardware; the optional AI captioning step is the exception, since it relies on hosted services from OpenAI, Gemini, or Azure. It also suits content creators who need a repeatable, efficient way to prepare consistent video datasets without wrestling with ffmpeg commands.

Developer notes and future plans

Oqura-ai built the tool over a weekend after encountering the same tedious dataset hurdles while training animation models. Currently, diff-forge fully supports LTX Video transforms and export, while WAN support is limited to validation and configuration; a WAN processor is planned. The developer says the processor system is pluggable and that adding a new model takes roughly 50 lines of code, which suggests broader model compatibility in future updates.
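A pluggable processor system of the kind described here is often built as a small registry, where each model contributes one function. The Python sketch below illustrates that pattern only; it is not diff-forge's architecture. The names `PROCESSORS`, `register`, and `snap` are hypothetical, the "wan" entry is illustrative because the article says no WAN processor exists yet, and the step and multiple values are assumptions.

```python
from typing import Callable, Dict, Tuple

# A processor maps (frames, width, height) to a conformed (frames, width, height).
Processor = Callable[[int, int, int], Tuple[int, int, int]]

# Hypothetical registry: model name -> processor function.
PROCESSORS: Dict[str, Processor] = {}


def register(name: str) -> Callable[[Processor], Processor]:
    """Decorator that adds a processor to the registry under `name`."""
    def decorator(fn: Processor) -> Processor:
        PROCESSORS[name] = fn
        return fn
    return decorator


def snap(frames: int, width: int, height: int, step: int, multiple: int) -> Tuple[int, int, int]:
    """Round frames down to step*n+1 and both dimensions down to a multiple."""
    frames = max(1, ((frames - 1) // step) * step + 1)
    width = max(multiple, (width // multiple) * multiple)
    height = max(multiple, (height // multiple) * multiple)
    return frames, width, height


@register("ltx")
def ltx_processor(frames: int, width: int, height: int) -> Tuple[int, int, int]:
    # Assumed LTX Video constraints: 8n+1 frames, dimensions divisible by 32.
    return snap(frames, width, height, step=8, multiple=32)


@register("wan")
def wan_processor(frames: int, width: int, height: int) -> Tuple[int, int, int]:
    # Illustrative only; assumed WAN constraints: 4n+1 frames, dimensions divisible by 16.
    return snap(frames, width, height, step=4, multiple=16)
```

With this layout, supporting another model means writing one decorated function, which is consistent with the claim that a new processor fits in a few dozen lines.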