IMG-Dataset-Refiner Scrubs Image Folders Into Clean AI Training Data

IMG-Dataset-Refiner version 4.3 is a free local application that turns messy image folders into clean, training-ready datasets. It provides a visual workspace for editing captions, removing duplicates, and balancing image sets used in LoRA and diffusion model training. The tool runs entirely on your own computer and can connect to local AI servers for automated caption cleanup.
Developer NyxAwroo released this update to give creators full control over dataset preparation without cloud dependencies. Many trainers struggle with disorganized folders and inconsistent captions, which degrade model quality. This release addresses those problems by combining manual editing, batch actions, and AI assistance in a single Gradio interface.
Visual dataset management and AI cleaning
- Load image folders and .txt captions quickly.
- Edit captions with live word and token counts.
- Batch-clean comma spacing and duplicate tags.
- Detect near-duplicate images with perceptual hashing.
- Use local AI models to generate or fix captions.
- Analyze tag frequency, co-occurrence and bias.
- Export balanced subsets with greedy algorithms.
- Save settings persistently for repeatable workflows.
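To illustrate the batch caption cleanup in the list above, here is a minimal Python sketch. It is not the tool's own code. It assumes captions are comma-separated tag strings, as in typical LoRA .txt caption files, and it treats tags that differ only in letter case as duplicates. The function name clean_caption is hypothetical.

```python
import re


def clean_caption(caption: str) -> str:
    """Normalize comma spacing and drop repeated tags (hypothetical helper).

    Assumes a comma-separated tag caption; duplicates are matched
    case-insensitively and the first occurrence (and its casing) is kept.
    """
    # Split on commas, collapse internal whitespace runs, trim each tag.
    tags = [re.sub(r"\s+", " ", t).strip() for t in caption.split(",")]
    seen = set()
    result = []
    for tag in tags:
        key = tag.lower()
        if not tag or key in seen:
            continue  # skip empty fragments (e.g. from ",,") and duplicates
        seen.add(key)
        result.append(tag)
    # Re-join with a single ", " separator for consistent spacing.
    return ", ".join(result)
```

For example, a messy caption such as "1girl ,solo,,  red hair, solo , Red Hair" comes back as "1girl, solo, red hair".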
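The article does not say which perceptual hash IMG-Dataset-Refiner uses for near-duplicate detection. The sketch below substitutes difference hashing (dHash), one common choice. It shrinks the image to a 9×8 grayscale grid, records whether each pixel is brighter than its right-hand neighbor, and compares the resulting 64-bit hashes by Hamming distance. To avoid dependencies, it takes an already-decoded grayscale image given as a list of pixel rows. The helper names and the default threshold of 10 differing bits are assumptions.

```python
from itertools import combinations


def resize_gray(pixels, width, height):
    """Box-average a 2D grayscale grid (list of rows) down to width x height."""
    src_h, src_w = len(pixels), len(pixels[0])
    out = []
    for y in range(height):
        y0 = y * src_h // height
        y1 = max((y + 1) * src_h // height, y0 + 1)  # at least one source row
        row = []
        for x in range(width):
            x0 = x * src_w // width
            x1 = max((x + 1) * src_w // width, x0 + 1)  # at least one source column
            block = [pixels[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def dhash(pixels, hash_size=8):
    """Difference hash: one bit per horizontal neighbor comparison (64 bits by default)."""
    small = resize_gray(pixels, hash_size + 1, hash_size)
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def near_duplicates(images, threshold=10):
    """Return name pairs whose hashes differ by at most `threshold` bits (hypothetical cutoff)."""
    hashes = {name: dhash(px) for name, px in images.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(hashes), 2)
        if hamming(hashes[a], hashes[b]) <= threshold
    ]
```

Because dHash only looks at relative brightness between neighbors, a left-to-right gradient and a uniformly brightened copy hash identically and are flagged as a pair. A mirrored copy of the same gradient lands the full 64 bits away. Pairwise comparison is quadratic in the number of images, which is acceptable for a typical LoRA folder of a few hundred files.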
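Tag frequency analysis, co-occurrence counts, and greedy subset balancing can be sketched together. This is a hypothetical illustration, not the tool's algorithm. It assumes each image's caption has already been split into a list of tags, and it counts unordered tag pairs for co-occurrence. For balancing, it uses one simple greedy rule: repeatedly add the image whose tags are least represented in the subset so far.

```python
from collections import Counter
from itertools import combinations


def tag_stats(captions):
    """Count per-tag frequency and unordered tag-pair co-occurrence.

    `captions` maps image name -> list of tags (assumed pre-split).
    """
    freq = Counter()
    pairs = Counter()
    for tags in captions.values():
        unique = sorted(set(tags))  # sort so each pair has one canonical order
        freq.update(unique)
        pairs.update(combinations(unique, 2))
    return freq, pairs


def greedy_balanced_subset(captions, size):
    """Greedily pick up to `size` images, preferring under-represented tags.

    Hypothetical heuristic: score = sum of how often each of the image's
    tags already appears in the chosen subset; lowest score wins, and
    ties break on image name so the result is deterministic.
    """
    chosen = []
    counts = Counter()
    remaining = dict(captions)
    while remaining and len(chosen) < size:
        best = min(
            remaining,
            key=lambda name: (sum(counts[t] for t in set(remaining[name])), name),
        )
        chosen.append(best)
        counts.update(set(remaining.pop(best)))
    return chosen
```

Suppose three images share the caption "red hair, solo" and two others do not. A two-image subset then takes one of the three look-alikes plus the image that shares none of its tags, instead of two copies of the over-represented tags. High pair counts from tag_stats, such as a tag that almost always appears with another, are the kind of bias the list above refers to.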
Creators training character or style LoRAs on personal GPUs will find this tool especially helpful. Small teams can standardize caption quality across multiple projects without manual spreadsheet work. Privacy-conscious users benefit because all processing stays local, and no images are uploaded to outside services.
Developer notes and stability guidance
The app depends on a delicate bridge between Gradio and custom JavaScript, so the developer includes strict coding rules to avoid crashes. For instance, contributors must not pass custom_js through launch() and should avoid updating the Gallery component from app.load to prevent browser freezes. NyxAwroo also provides a dedicated prompt file, Prompt_system.md, to help future developers follow these stability constraints.
“This project relies on a sensitive Gradio + JavaScript bridge.” — Source: GitHub