Netflix Void-model Reconstructs Reality When Erasing Subjects

Netflix recently released Void-model, an open framework that removes subjects from video while reconstructing the physical interactions they caused. Standard editors only fill in the background pixels behind a deleted subject, but this tool also predicts how nearby objects should behave once the subject is gone.
The system pairs visual recognition with generative AI to handle complex editing tasks. Users with enough GPU memory can process sensitive media locally instead of relying on remote servers.
Model size: 22GB. GPU VRAM required: 40GB.
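For readers who want to check a machine against the 40GB figure before downloading the weights, here is a minimal sketch. It is not part of Void-model. It assumes an NVIDIA GPU with the `nvidia-smi` tool installed, and it treats "40GB" as 40 x 10^9 bytes, which is an assumption about how the requirement is counted. The function names are hypothetical.

```python
import subprocess

REQUIRED_GB = 40  # VRAM figure quoted in the article


def parse_total_mib(smi_output):
    """Parse one memory total (in MiB) per non-empty line of nvidia-smi output."""
    return [int(line.strip()) for line in smi_output.splitlines() if line.strip()]


def has_enough_vram(total_mib, required_gb=REQUIRED_GB):
    """Compare a MiB total against a requirement given in decimal gigabytes (assumption)."""
    return total_mib * 1024 * 1024 >= required_gb * 10**9


def query_gpus():
    """Ask nvidia-smi for the total memory of each GPU, in MiB."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_total_mib(result.stdout)
```

Calling `[has_enough_vram(m) for m in query_gpus()]` gives one pass/fail flag per installed GPU. Actual memory use may differ with clip length and resolution, so treat the check as a rough first filter.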
Physical interaction repair
- Applies a four-channel mask to isolate targets and shifting background zones.
- Synthesizes realistic scenes by tracking lighting shifts and collision paths.
- Offers a secondary processing stage that stabilizes motion across longer clips.
- Includes manual adjustment tools for correcting mask boundaries before generation.
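To make the idea of a four-channel mask concrete, the sketch below combines two binary masks (the subject to remove and the area its removal would disturb) into four mutually exclusive regions, then splits them into four binary channels. The region names, their order, and the helper functions are hypothetical illustrations. Void-model's actual mask encoding may differ.

```python
# Hypothetical region labels; not taken from the Void-model source.
REMOVE, OVERLAP, AFFECTED, KEEP = 0, 1, 2, 3


def build_region_mask(subject, affected):
    """Combine two binary masks (lists of rows of 0/1) into one grid of region labels."""
    if len(subject) != len(affected):
        raise ValueError("masks must have the same number of rows")
    labels = []
    for s_row, a_row in zip(subject, affected):
        if len(s_row) != len(a_row):
            raise ValueError("mask rows must have the same length")
        row = []
        for s, a in zip(s_row, a_row):
            if s and a:
                row.append(OVERLAP)   # subject pixel that also lies in the disturbed area
            elif s:
                row.append(REMOVE)    # subject pixel only
            elif a:
                row.append(AFFECTED)  # background expected to change after removal
            else:
                row.append(KEEP)      # background left untouched
        labels.append(row)
    return labels


def to_channels(labels, num_regions=4):
    """One-hot encode the label grid into num_regions binary channels."""
    return [
        [[1 if value == region else 0 for value in row] for row in labels]
        for region in range(num_regions)
    ]


subject = [[1, 1, 0, 0],
           [1, 1, 0, 0]]
affected = [[0, 1, 1, 0],
            [0, 0, 1, 0]]
labels = build_region_mask(subject, affected)
channels = to_channels(labels)
```

Each pixel belongs to exactly one of the four channels, so a manual boundary correction amounts to moving pixels from one region to another before generation.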
Production teams cutting out props can avoid frame-by-frame manual fixes by letting the system work out how the remaining objects should fall or settle. Studio artists editing dense shots can save time because the physical consequences of a removal are generated automatically.
Development goals and limitations
Engineers trained the architecture on simulated physics data so it could learn how objects react to impacts. The initial pass works for short footage, while the refinement stage stabilizes longer sequences. Local deployment requires substantial graphics memory.
"We hope this framework sheds light on how to make video editing models better simulators of the world through high-level causal reasoning," the authors said in the paper.
Download the complete source on GitHub, grab the weights from Hugging Face, and review the methodology in the arXiv publication.