Walkyrie-1.3B-v1.0 Spins Video Smarts Into Speedy Local Image Creation

Walkyrie-1.3B-v1.0 is a new text-to-image model that turns written prompts into 1024×1024-pixel images. It was rebuilt from an existing video-generation model: the text encoder was pruned so it runs faster on less powerful hardware, and the developer retrained the entire pipeline for still-image output instead of video clips.
The project comes from independent creator kpsss34, who adapted the Wan2.1-T2V-1.3B architecture. By pruning the UMT5 text encoder to around 1 billion parameters and fine-tuning for image generation, they created a tool that prioritizes quick, local use. The model is shared as an early preview to gather feedback and build community interest.
Early preview for community testing
- Text-to-image output at 1024×1024 resolution.
- Pruned text encoder for faster processing.
- Runs with as little as 6–8 GB VRAM.
- CPU offload support for memory efficiency.
- Built from Wan2.1-T2V-1.3B video architecture.
- Free for both research and commercial use.
- An early preview, trained to about 20% of the planned training budget.
People with consumer-grade GPUs, small studios that rely on local AI tools, and anyone who wants to keep image generation private can benefit from this model. Because it can fit into mid-range hardware using CPU offload, it lowers the barrier for running high-quality diffusion locally. The current release is tuned toward an anime aesthetic, with a turbo variant and a larger 13B version planned for the future.
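To make the hardware claim concrete, here is a rough back-of-the-envelope sketch of weight memory. It rests on several assumptions that are not stated in the release: that the "1.3B" in the model name is the parameter count of the diffusion backbone, that weights are stored in 16-bit precision (2 bytes per parameter), and that model-level CPU offload keeps roughly one component on the GPU at a time. It leaves out the VAE, activations, and framework overhead, so real usage will be higher than these figures.

```python
# Rough weight-memory estimate. All figures are illustrative assumptions,
# not measurements published by the developer.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2}


def weight_gib(params: float, dtype: str = "bf16") -> float:
    """Return the size of `params` weights in GiB for the given dtype."""
    return params * BYTES_PER_PARAM[dtype] / 1024**3


# Assumed parameter counts: 1.3B from the model name, ~1B for the pruned
# UMT5 text encoder as described in the article.
components = {
    "diffusion_backbone": 1.3e9,
    "text_encoder": 1.0e9,
}

# Everything resident on the GPU at once.
all_resident = sum(weight_gib(p) for p in components.values())

# With model-level CPU offload, assume only the largest component
# occupies the GPU at any moment.
largest_only = max(weight_gib(p) for p in components.values())

print(f"All weights on GPU:   {all_resident:.2f} GiB")
print(f"Largest component:    {largest_only:.2f} GiB")
```

Under these assumptions the weights alone stay well below the quoted 6–8 GB, which leaves room for activations at 1024×1024 and explains why offloading the idle component to system RAM can help on smaller cards.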
Developer notes and known limits
The model is very much a work in progress: it has been trained to only about 20% of the planned training budget, so quality and stability are expected to improve. The developer points out that anatomy problems remain a challenge, a limitation that often shows up in smaller models. Future releases, including a turbo edition and a bigger 13-billion-parameter version, depend on additional training resources and community support.