SyntheticGen Crafts Balanced Data for Smarter Satellite AI


SyntheticGen is a new open-source tool that creates synthetic training data for remote sensing segmentation tasks. It allows users to generate images with explicit control over class distributions, addressing the common problem of imbalanced datasets.

Developed by buddhi19, this framework tackles the challenge of training AI models on rare or underrepresented land cover classes. The tool uses a two-stage pipeline to create realistic aerial imagery while maintaining precise control over what appears in each generated sample.
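The toy sketch below, in Python with NumPy, shows how layout creation and rendering can be kept separate. The names `sample_layout` and `render` are hypothetical, and both stages are deliberately trivial stand-ins. The real tool uses a ratio-conditioned discrete diffusion model for layouts and ControlNet-guided Stable Diffusion for imagery, and neither is reproduced here. What the sketch does show is the interface: stage one turns target class ratios into a label map, and stage two only ever sees that label map.

```python
import numpy as np


def sample_layout(ratios, size, rng):
    """Stage 1 stand-in: a label map whose class pixel counts match `ratios`.

    Uses the largest-remainder method so the counts sum exactly to H*W.
    A real layout model produces spatially coherent regions; this one is
    shuffled noise and only illustrates the ratio constraint.
    """
    ratios = np.asarray(ratios, dtype=float)
    ratios = ratios / ratios.sum()
    h, w = size
    n = h * w
    raw = ratios * n
    counts = np.floor(raw).astype(int)
    remainder = n - counts.sum()
    # Hand leftover pixels to the classes with the largest fractional parts
    order = np.argsort(raw - counts)[::-1]
    counts[order[:remainder]] += 1
    labels = np.repeat(np.arange(len(ratios)), counts)
    rng.shuffle(labels)
    return labels.reshape(h, w)


def render(layout, palette):
    """Stage 2 stand-in: map each class label to an RGB colour."""
    return np.asarray(palette, dtype=np.uint8)[layout]


rng = np.random.default_rng(0)
layout = sample_layout([0.5, 0.25, 0.25], (4, 4), rng)
image = render(layout, [[0, 0, 0], [255, 0, 0], [0, 0, 255]])
```

Because the ratio constraint is satisfied entirely in stage one, the toy renderer could be swapped for a stronger one without changing the class proportions of the output.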

Controlled data generation pipeline

  • Two-stage generation process separating layout creation from image synthesis.
  • Ratio-conditioned discrete diffusion model for semantic layout generation.
  • ControlNet-guided Stable Diffusion for realistic remote-sensing imagery.
  • Full or sparse ratio control, letting users set target proportions for every class or only for selected classes.
  • Config-first workflow for reproducible experiments.
  • Pre-generated synthetic datasets available for immediate use.
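The following Python sketch shows one plausible way a sparse ratio specification could be expanded into a full class distribution. The `complete_ratios` helper and the class names are hypothetical. The rule used here is an assumption, not SyntheticGen's documented behaviour: unspecified classes share the leftover mass in proportion to an optional prior, or uniformly when there is none.

```python
from typing import Dict, List, Optional


def complete_ratios(
    classes: List[str],
    sparse: Dict[str, float],
    prior: Optional[Dict[str, float]] = None,
) -> Dict[str, float]:
    """Expand a (possibly sparse) ratio spec into a full distribution.

    Hypothetical helper: classes missing from `sparse` share the leftover
    probability mass in proportion to `prior`, or uniformly if no prior
    is given (or the prior gives them zero total weight).
    """
    unknown = [c for c in sparse if c not in classes]
    if unknown:
        raise ValueError(f"unknown classes: {unknown}")
    if any(v < 0 for v in sparse.values()):
        raise ValueError("ratios must be non-negative")
    specified = sum(sparse.values())
    if specified > 1.0 + 1e-9:
        raise ValueError("specified ratios sum to more than 1")

    full = dict(sparse)
    free = [c for c in classes if c not in sparse]
    if not free:
        # Full control: every class was given, so the spec must already sum to 1
        if abs(specified - 1.0) > 1e-6:
            raise ValueError("a full ratio spec must sum to 1")
        return {c: full[c] for c in classes}

    # Sparse control: split the remaining mass over the unspecified classes
    leftover = max(0.0, 1.0 - specified)
    weights = {c: 1.0 if prior is None else prior.get(c, 0.0) for c in free}
    total = sum(weights.values())
    if total <= 0:
        weights = {c: 1.0 for c in free}
        total = float(len(free))
    for c in free:
        full[c] = leftover * weights[c] / total
    return {c: full[c] for c in classes}


classes = ["background", "building", "road", "water", "forest"]
print(complete_ratios(classes, {"water": 0.3, "building": 0.2}))
```

In this example, water and building keep their requested 0.3 and 0.2. The three unspecified classes each receive one sixth of the total.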

Researchers and data scientists working with satellite or aerial imagery can use this tool to strengthen minority classes in their training data. Geographic information system professionals dealing with domain shifts between urban and rural areas may also find value in generating targeted samples that fill gaps in existing datasets.
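Before generating targeted samples, it helps to measure how imbalanced the existing labels actually are. The Python/NumPy sketch below counts pixels per class across a set of segmentation masks and flags classes whose share falls below a threshold. Those classes are the natural candidates for synthetic augmentation. The function names, the 255 "ignore" label and the threshold value are illustrative assumptions and are not part of SyntheticGen.

```python
from typing import Iterable, List

import numpy as np


def class_pixel_ratios(
    masks: Iterable[np.ndarray], num_classes: int, ignore_index: int = 255
) -> np.ndarray:
    """Fraction of labelled pixels belonging to each class across all masks."""
    counts = np.zeros(num_classes, dtype=np.int64)
    for mask in masks:
        labels = np.asarray(mask).ravel()
        labels = labels[labels != ignore_index]  # drop unlabelled pixels
        if labels.size and (labels.min() < 0 or labels.max() >= num_classes):
            raise ValueError("label outside [0, num_classes)")
        counts += np.bincount(labels, minlength=num_classes)
    total = counts.sum()
    if total == 0:
        raise ValueError("no labelled pixels")
    return counts / total


def minority_classes(ratios: np.ndarray, threshold: float) -> List[int]:
    """Indices of classes whose pixel share is below `threshold`."""
    return np.flatnonzero(ratios < threshold).tolist()


masks = [np.array([[0, 0, 1], [0, 0, 2]]), np.array([[0, 1, 255], [0, 0, 0]])]
ratios = class_pixel_ratios(masks, num_classes=4)
print(minority_classes(ratios, threshold=0.1))  # [2, 3]
```

Class 3 never appears and class 2 covers only 1 of 11 labelled pixels, so both fall below the 10% threshold.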

SyntheticGen's practical implementation details

Installation runs through GitHub: clone the repository, create a Conda environment, and install the PyTorch dependencies. Users can then generate a first synthetic image from one of the provided configuration files and override individual parameters on the command line for quick experimentation.
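A config-first workflow with command-line overrides can be pictured with the small Python sketch below. It is not SyntheticGen's actual config loader. The dotted `key=value` syntax, the `apply_overrides` and `parse_value` helpers, and every key in `BASE_CONFIG` are hypothetical and were chosen only to illustrate the pattern.

```python
import ast
import copy
from typing import Any, Dict, List


def parse_value(text: str) -> Any:
    """Read '50', '7.5', 'True', '[1, 2]' as Python literals; else keep the string."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError, TypeError):
        return text


def apply_overrides(config: Dict[str, Any], overrides: List[str]) -> Dict[str, Any]:
    """Return a copy of `config` with dotted `key=value` overrides applied."""
    merged = copy.deepcopy(config)  # never mutate the base config
    for item in overrides:
        key, sep, raw = item.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {item!r}")
        *parents, leaf = key.split(".")
        node = merged
        for part in parents:
            node = node.setdefault(part, {})  # create missing sections
            if not isinstance(node, dict):
                raise ValueError(f"{part!r} in {key!r} is not a section")
        node[leaf] = parse_value(raw)
    return merged


# Hypothetical base config; real keys depend on the project's config files
BASE_CONFIG = {
    "seed": 0,
    "layout": {"steps": 100, "ratios": {"water": 0.3}},
    "image": {"guidance_scale": 7.5},
}

cfg = apply_overrides(BASE_CONFIG, ["seed=42", "layout.ratios.building=0.2"])
```

In a real script the override list would typically come from `sys.argv[1:]`. Keeping the base file untouched and recording the overrides alongside each run is what makes such experiments reproducible.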

The development team emphasizes the separation between semantic control and visual realism as a key design principle.

'By separating semantic control from visual rendering, the framework achieves something highly valuable: it is both principled and practical,' the developers note in their documentation.

Get SyntheticGen on GitHub. You can also access the Hugging Face page or read the full research paper.