Fudan-FUXI Unveils Omni-Video 2 AI Tool

    
By vramkickedin | March 20, 2026 at 5:37 pm | 2 min read

Omni-Video 2 is a unified video editing and generation framework that combines a text-to-video diffusion model with vision-language understanding. The system can generate videos from text descriptions and edit existing footage with precise control over changes. It supports text-to-video creation, video-to-video editing, and mixed-condition generation all within a single pipeline.

Fudan-FUXI developed this tool to address a common problem in video AI: turning short, simple prompts into detailed, accurate edits. A vision-language component reads the source video and the editing instruction, then predicts what the edited result should look like. This converts vague requests into specific instructions about content, attribute, and motion changes.
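To make the idea of turning a vague request into specific content, attribute, and motion instructions more concrete, here is a minimal Python sketch of what such a structured edit target might look like. The `EditSpec` class and `expand_instruction` function are hypothetical illustrations, not part of Omni-Video 2's code or API. In the actual system a vision-language model produces the detailed target, whereas here the fields are filled in by hand.

```python
from dataclasses import dataclass, field


@dataclass
class EditSpec:
    """Hypothetical structured edit target; field names are illustrative only."""
    content: list[str] = field(default_factory=list)     # objects to add, remove, or replace
    attributes: list[str] = field(default_factory=list)  # appearance, lighting, color changes
    motion: list[str] = field(default_factory=list)      # how movement should change


def expand_instruction(spec: EditSpec, source_caption: str) -> str:
    """Merge a caption of the source clip with structured edits into one detailed prompt.

    In Omni-Video 2 a vision-language model performs this expansion; here the
    spec is written by hand just to show the shape of the result.
    """
    parts = [f"Source: {source_caption}."]
    if spec.content:
        parts.append("Content changes: " + "; ".join(spec.content) + ".")
    if spec.attributes:
        parts.append("Attribute changes: " + "; ".join(spec.attributes) + ".")
    if spec.motion:
        parts.append("Motion changes: " + "; ".join(spec.motion) + ".")
    return " ".join(parts)


# A compositional request: remove an object and change the lighting in one pass.
spec = EditSpec(
    content=["remove the parked bicycle"],
    attributes=["warm sunset lighting"],
)
prompt = expand_instruction(spec, "a quiet street with a parked bicycle")
print(prompt)
```

Empty categories are simply skipped, so a single spec can carry one change or several at once.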

Model size: 69.2GB | GPU VRAM: requirements vary

What Omni-Video 2 can do

Generate videos from text descriptions with high-quality output
Edit existing videos with precise control over specific elements
Remove or add objects while preserving background details
Change backgrounds and environments smoothly
Handle complex motion editing across multiple frames
Process multi-element transformations including lighting and appearance changes

Video editors and content creators working on complex projects may find this tool useful for making detailed changes without manually editing each frame. The ability to understand and execute compositional instructions means users can request multiple changes at once, such as adjusting lighting while also modifying specific objects, rather than running separate edits sequentially.

Technical design choices

The team built Omni-Video 2 to scale efficiently by connecting pretrained multimodal language models directly to video diffusion models. A lightweight adapter injects conditional tokens into the system, allowing it to reuse existing generative capabilities without requiring a complete rebuild. According to the researchers:

'we scale up Omni-Video 2 to a 14B video diffusion model on meticulously curated training data with quality.'
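The adapter idea can be illustrated with a minimal sketch, assuming the adapter is a single linear projection that maps the language model's hidden states into the diffusion model's conditioning dimension and appends them to the existing condition sequence. The function names, the dimensions, and the concatenation strategy are all assumptions for illustration; the article does not specify how Omni-Video 2's adapter is built.

```python
import random


def make_adapter(in_dim, out_dim, seed=0):
    """Hypothetical lightweight adapter: one linear layer (weight matrix + bias)."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 0.02) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias


def project(tokens, weights, bias):
    """Map each MLLM hidden state (length in_dim) to a condition token (length out_dim)."""
    return [
        [sum(w * x for w, x in zip(row, tok)) + b for row, b in zip(weights, bias)]
        for tok in tokens
    ]


def inject(text_cond, mllm_tokens, weights, bias):
    """Append projected MLLM tokens to the diffusion model's existing condition sequence."""
    return text_cond + project(mllm_tokens, weights, bias)


# Example: 3 existing condition tokens (dim 4) plus 5 MLLM tokens (dim 8).
text_cond = [[0.0] * 4 for _ in range(3)]
mllm_tokens = [[1.0] * 8 for _ in range(5)]
weights, bias = make_adapter(in_dim=8, out_dim=4)
cond = inject(text_cond, mllm_tokens, weights, bias)
print(len(cond), len(cond[0]))  # 8 4
```

In a design like this, only the small projection is new, so the pretrained language model and diffusion model can be reused largely as-is, which matches the kind of reuse the researchers describe.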

The model was tested on the FiVE benchmark for fine-grained editing and VBench for generation tasks, showing strong performance in following complex instructions while maintaining competitive quality for video generation.

The 69.2GB model size means users will need substantial storage and likely a high-end GPU for local inference. Access OmniVideo2-A14B on Hugging Face. Read the full details on the project page or review the paper on arXiv.
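A rough back-of-the-envelope check shows why local inference is demanding. Assuming the "14B" figure from the researchers' quote counts the diffusion model's parameters, the weights alone need the following memory at common precisions. This ignores activations and any other bundled components such as the vision-language model, so actual VRAM use will be higher. The 69.2GB download likely reflects those extra components or higher-precision weights, but the article does not give a breakdown.

```python
def weight_gb(num_params, bytes_per_param):
    """Memory for model weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


params = 14e9  # assumption: "14B" = 14 billion diffusion-model parameters
for name, nbytes in [("fp32", 4), ("bf16/fp16", 2), ("int8", 1)]:
    print(f"{name:>9}: {weight_gb(params, nbytes):.1f} GB")
```

At bf16 that is 28.0 GB for the diffusion weights alone, already beyond most consumer GPUs before any other component is loaded.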
