One Still Becomes a Minute-Long 3D Video with SANA-WM Bidirectional

A new open-source model called SANA-WM Bidirectional can generate minute-long, 720p videos from a single starting image and a text prompt. It uses a 2.6B-parameter diffusion transformer to synthesize smooth footage while letting you control the camera’s position and angle in 3D space for the full length of the clip. The release also includes an LTX-2 refiner, a second-stage model that polishes the output for higher fidelity.
The Efficient-Large-Model team built this bidirectional checkpoint specifically for long-form world modeling that runs on consumer-class hardware. By combining an image-to-video transformer with a dedicated camera control branch, they made it practical to produce extended, high-resolution fly-through scenes without a cloud service. The goal is consistent, minute-scale video from a single frame, with the model itself kept open-source.
Camera-aware video generation for local machines
- One-minute 720p video generation per run.
- Per-frame control over the 3D camera trajectory.
- Hybrid attention for memory-efficient long contexts.
- Two-stage pipeline with an LTX-2 quality refiner.
- Automatic metric-scale camera poses from public videos.
- Runs offline once model files are downloaded.
- Apache 2.0 license for broad use.
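To make the per-frame camera control in the list above more concrete, here is a minimal sketch of what a camera trajectory can look like as data: one 4x4 camera-to-world pose per frame, tracing a slow orbit around the scene. This is a generic illustration, not SANA-WM's input format. The function names, the 16 fps frame rate, and the axis convention (camera looks down its -Z axis, +Y is up) are all assumptions made for the example.

```python
import math


def look_at_pose(eye, target, up=(0.0, 1.0, 0.0)):
    """Build a 4x4 camera-to-world matrix as nested lists (row-major).

    Assumed convention: the camera looks down its local -Z axis, +Y is up.
    """
    def sub(a, b):
        return [a[i] - b[i] for i in range(3)]

    def normalize(v):
        n = math.sqrt(sum(c * c for c in v))
        return [c / n for c in v]

    def cross(a, b):
        return [
            a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0],
        ]

    forward = normalize(sub(target, eye))
    right = normalize(cross(forward, up))
    true_up = cross(right, forward)
    back = [-c for c in forward]

    # Columns are the camera's right, up, and back axes, then its position.
    return [
        [right[0], true_up[0], back[0], eye[0]],
        [right[1], true_up[1], back[1], eye[1]],
        [right[2], true_up[2], back[2], eye[2]],
        [0.0, 0.0, 0.0, 1.0],
    ]


def orbit_trajectory(num_frames, radius=2.0, height=0.5, sweep_degrees=90.0):
    """Return one pose per frame for an arc around the origin."""
    poses = []
    for i in range(num_frames):
        t = i / (num_frames - 1) if num_frames > 1 else 0.0
        angle = math.radians(sweep_degrees) * t
        eye = (radius * math.sin(angle), height, radius * math.cos(angle))
        poses.append(look_at_pose(eye, (0.0, 0.0, 0.0)))
    return poses


FPS = 16          # assumed frame rate; the article does not state one
DURATION_S = 60   # one-minute clip, as described in the article
poses = orbit_trajectory(FPS * DURATION_S)
print(len(poses), "poses; first camera position:", [row[3] for row in poses[0][:3]])
```

Under these assumptions a one-minute clip needs 960 poses, which is why a smooth parametric path like an orbit or a dolly is easier to author than hand-placed keyframes.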
This tool fits prosumer GPU owners, small creative agencies, and privacy-focused professionals who need extended video generation without relying on paid APIs. You can turn a single photo into a lengthy, camera-controlled animation for previz, concept work, or client presentations. Because everything happens on your own machine, sensitive imagery never leaves your control, and there are no per-generation fees.
What to expect under the hood
The full pipeline includes several sizable model files—the refiner alone is 41 GB—so a GPU with generous VRAM is recommended for smooth generation. The text encoder downloads automatically on first run, but you can point to local copies for fully offline use. The project is released under the Apache 2.0 license, making it suitable for both research and commercial projects.
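For a rough sense of the memory involved, the back-of-the-envelope calculation below estimates how much space the weights of the 2.6B-parameter transformer occupy at a few common precisions. It is only a sketch: the precision the checkpoint actually ships in is not stated here, and the figures exclude activations, the text encoder, and the 41 GB refiner.

```python
PARAMS = 2.6e9  # parameter count of the diffusion transformer, from the article

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1}


def weight_footprint_gb(num_params, dtype):
    """Memory for the weights alone, in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_footprint_gb(PARAMS, dtype):.1f} GB")
```

At 16-bit precision the transformer's weights come to about 5.2 GB, so most of the memory pressure in a minute-long 720p run is likely to come from everything else in the pipeline rather than from the core model's weights.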
“SANA-WM is an efficient open-source world model trained natively for one-minute generation.” — Source: Hugging Face