Skywork Unlocks Real-Time Worlds with Matrix-Game-3.0

Matrix-Game-3.0 is an open-source interactive world model that generates real-time video at 720p resolution and 40 frames per second. It uses a memory-augmented architecture to maintain consistency over long video sequences lasting up to a minute.
Developed by the Skywork AI Matrix-Game Team, this tool addresses the challenge of creating coherent, long-form interactive video content. The framework combines synthetic data from Unreal Engine, AAA game footage, and real-world videos to train its generation capabilities.
Model size: from 12.9 GB. GPU VRAM requirements vary.
Matrix-Game-3.0 key capabilities and features
- Generates 720p video at 40 FPS in real time using a 5B-parameter model.
- Maintains memory consistency over minute-long video sequences.
- Combines Unreal Engine synthetic data, AAA game data, and real-world video for training.
- Supports INT8 quantization for more efficient inference.
- Scales up to a 28B Mixture-of-Experts model for improved quality.
- Uses camera-aware memory for long-term spatiotemporal consistency.
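To put the headline figures in perspective, the short calculation below derives what 720p at 40 FPS over a one-minute sequence implies. It is only arithmetic on the numbers stated above, and it assumes that "720p" means 1280x720 pixels.

```python
# Back-of-the-envelope numbers derived from the stated 720p / 40 FPS / one-minute figures.
FPS = 40
WIDTH, HEIGHT = 1280, 720   # assumption: "720p" means 1280x720
SEQUENCE_SECONDS = 60       # "up to a minute"

frame_budget_ms = 1000 / FPS                 # time available to produce each frame
frames_per_minute = FPS * SEQUENCE_SECONDS   # frames that must stay consistent
pixels_per_second = WIDTH * HEIGHT * FPS     # raw output pixel rate

print(f"Per-frame budget: {frame_budget_ms:.1f} ms")
print(f"Frames in a one-minute sequence: {frames_per_minute}")
print(f"Output pixels per second: {pixels_per_second:,}")
```

In other words, real-time generation at this rate leaves about 25 ms per frame on average, and the memory mechanism has to keep roughly 2,400 frames consistent with one another.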
Game developers and researchers working on interactive video generation may find this model useful for prototyping virtual environments or testing action-conditioned video systems. The ability to run real-time generation on a 5B model makes it practical for users without access to massive computing clusters, while the 28B option offers higher quality for those with more hardware resources.
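As a rough guide to the hardware trade-off, the sketch below estimates how much memory the weights alone would occupy at different precisions. The function name and the precision table are illustrative, not part of the release, and the estimate ignores activations, the VAE, any text encoder, and the memory cache, so actual VRAM use will be higher.

```python
# Weights-only memory estimate; real VRAM use is higher (activations, VAE, caches).
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1}

def weight_gb(num_params: float, dtype: str) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

for name, params in [("5B", 5e9), ("28B MoE", 28e9)]:
    for dtype in ("bf16", "int8"):
        print(f"{name} @ {dtype}: ~{weight_gb(params, dtype):.0f} GB")
```

Note that a Mixture-of-Experts model typically activates only some experts per token, but all expert weights usually still need to be held in memory, so the 28B figure is the relevant one for capacity planning.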
Technical details and required setup
The framework unifies three stages into an end-to-end pipeline: a data engine for training data, model training using a Diffusion Transformer architecture, and optimized inference deployment. The team uses Distribution Matching Distillation combined with model quantization and VAE decoder distillation to achieve its real-time performance. Users will need FlashAttention installed and can run the model with multiple GPUs for faster inference.
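For readers unfamiliar with quantization, here is a minimal sketch of the general idea behind INT8: store values as 8-bit integers plus a scale factor. It shows plain symmetric per-tensor quantization in pure Python. This is a generic illustration, not the scheme the Matrix-Game team uses, which the announcement does not specify.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the INT8 range.
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float values from the integers and the scale."""
    return [x * scale for x in q]

weights = [0.82, -1.27, 0.05, 0.0, 0.6]   # made-up example values
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print("quantized:", q)
print("scale:", scale)
print("max round-trip error:", max_error)
```

Each value now takes one byte instead of two or four, at the cost of a rounding error of at most half a quantization step.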
The model is released under an MIT license, making it freely available for both research and commercial applications. Users can generate videos from an input image and text prompt, with options for random actions or custom interactive input.
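The difference between random and custom action input can be sketched as two ways of building a per-step action sequence. Everything here is hypothetical: the action vocabulary, the `yaw` field, and both function names are placeholders for illustration, and the model's actual action space and input format may differ.

```python
import random

# Hypothetical action vocabulary; the real model's action space may differ.
MOVES = ["forward", "back", "left", "right", "idle"]

def random_actions(num_steps, seed=0):
    """Sample one random action per step (seeded so runs are repeatable)."""
    rng = random.Random(seed)
    return [
        {"move": rng.choice(MOVES), "yaw": rng.uniform(-1.0, 1.0)}
        for _ in range(num_steps)
    ]

def custom_actions(script):
    """Expand (move, yaw, repeat) tuples into one action per step."""
    steps = []
    for move, yaw, repeat in script:
        if move not in MOVES:
            raise ValueError(f"unknown move: {move}")
        steps.extend({"move": move, "yaw": yaw} for _ in range(repeat))
    return steps

# One second of random input at 40 steps per second, then a short scripted path.
print(random_actions(40)[:3])
print(custom_actions([("forward", 0.0, 3), ("left", 0.5, 2)]))
```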
Download Matrix-Game-3.0 models on Hugging Face.