Studiomi300 Spins One Prompt Into a 30s Cinematic Reel

The Studiomi300 pipeline turns a single text prompt into a 30-second cinematic reel with consistent characters, music, and voice-over. It chains several large AI models (a director, an image generator, an animator, a music generator, and text-to-speech), all running on one GPU. The result is a ready-to-share video file that needs no manual editing.
Developer Bladedevoff built Studiomi300 solo for the AMD Developer Hackathon in May 2026. The project shows how the massive 192 GB of memory on an AMD Instinct MI300X accelerator can run four very different AI architectures one after another without separate machines. For anyone without this level of hardware, the same workflow would normally need several consumer GPUs networked together.
How the reel comes together
- One prompt, one finished 30-second video.
- Director agent plans six shots automatically.
- Character portraits stay consistent across shots.
- Animation runs at native Wan2.2 resolution.
- Vision critic retries poor clips up to three times.
- Music generated from a brief, no samples.
- Voice-over supports nine spoken languages.
- All model weights are Apache 2.0 or MIT.
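The critic-and-retry step in the list above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `generate_with_retries`, the `animate` and `critique` callables, the 0.5 score threshold, and the reading of "up to three times" as three retries after the first attempt are all assumptions made for the example.

```python
from typing import Callable, Tuple


def generate_with_retries(
    shot: str,
    animate: Callable[[str, int], str],   # hypothetical: renders a clip for a shot and attempt index
    critique: Callable[[str], float],     # hypothetical: vision critic returning a quality score
    max_retries: int = 3,                 # assumption: three retries after the first attempt
    min_score: float = 0.5,               # assumption: the real acceptance rule is not documented here
) -> Tuple[str, float, int]:
    """Return (best_clip, best_score, attempts_used)."""
    best_clip, best_score = "", float("-inf")
    attempts = 0
    for attempt in range(1 + max_retries):
        attempts += 1
        clip = animate(shot, attempt)
        score = critique(clip)
        # Keep the best clip seen so far in case no attempt passes.
        if score > best_score:
            best_clip, best_score = clip, score
        # Stop as soon as the critic accepts a clip.
        if score >= min_score:
            break
    return best_clip, best_score, attempts
```

Keeping the best-scoring clip means the reel still gets a usable shot when every attempt falls below the threshold, which is one plausible way to guarantee a finished video without manual intervention.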
This tool is meant for video creators, indie filmmakers, and privacy-focused professionals who want to prototype short narrative clips locally. Because everything runs on a single machine, no data leaves your control. Small studios can use it to quickly generate storyboard-like reels with consistent characters without relying on cloud rendering farms.
What the developer wants you to know
The vision critic and director share one Qwen3.5-35B model checkpoint, which is reloaded between phases to save memory. FP8 acceleration for the video stage is not yet stable on this GPU and is disabled by default, so all video generation runs in BF16 precision. The pipeline’s “incidents.md” file openly documents failures such as headless characters and kernel crashes.
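The load-then-release pattern described here can be sketched as below. It is a toy model under stated assumptions: the `loaded` context manager, the `PHASE_CHECKPOINT` mapping, and the phase order are hypothetical, and a plain dictionary stands in for real model weights. Only the shared Qwen3.5-35B checkpoint and the use of Wan2.2 for animation come from the article.

```python
from contextlib import contextmanager
from typing import Dict, Iterator, List

# Assumed mapping of phase to checkpoint. The director and critic sharing
# Qwen3.5-35B is from the article; the mapping structure itself is illustrative.
PHASE_CHECKPOINT: Dict[str, str] = {
    "director": "Qwen3.5-35B",
    "animator": "Wan2.2",
    "critic": "Qwen3.5-35B",
}


@contextmanager
def loaded(checkpoint: str, log: List[str]) -> Iterator[dict]:
    """Hold one model in memory for the duration of a phase, then release it."""
    log.append(f"load {checkpoint}")
    model = {"checkpoint": checkpoint}  # stand-in for real weights
    try:
        yield model
    finally:
        # A real pipeline would free GPU memory here before the next phase.
        log.append(f"free {checkpoint}")


def run_phases(order: List[str]) -> List[str]:
    """Run phases one after another so only one model is resident at a time."""
    log: List[str] = []
    for phase in order:
        with loaded(PHASE_CHECKPOINT[phase], log) as model:
            _ = model  # phase-specific work would go here
    return log


if __name__ == "__main__":
    for event in run_phases(["director", "animator", "critic"]):
        print(event)
```

In this sketch the Qwen3.5-35B checkpoint is loaded twice, once for planning and once for critique, trading reload time for a smaller peak memory footprint.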
"Same stack on a 24 GB consumer GPU would need 4-5 boxes wired together." — Source: Reddit