Zai-org's SCAIL-2 Breathes Motion Into Still Characters Sans Skeleton

    
By vramkickedin | June 24, 2026 at 5:51 pm | 2 min read

SCAIL-2 is a new open-source model that animates still character images directly from a driving video without relying on skeleton maps or inpainting masks. This end-to-end approach avoids the information loss that occurs when motion is converted into intermediate pose representations. The model also handles character replacement tasks and supports multi-character scenarios from a single interface.

Zai-org, the same team behind GLM, developed SCAIL-2 by building a synthetic training pipeline that uses several off-the-shelf models to generate 60,000 motion pairs. The team designed a Unified Motion Transfer Interface with specialized masking channels and a dedicated RoPE design to unify different animation tasks under one training process. Because the model was trained to reverse the driving process, it learned capabilities beyond those of its teacher models.
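
For readers unfamiliar with RoPE (rotary position embedding), here is a minimal sketch of the standard technique in plain Python. It is not SCAIL-2's dedicated RoPE design, which is not described here. The function names `rope` and `dot` and the `base` default of 10000 are assumptions chosen for illustration.

```python
import math

def rope(vec, position, base=10000.0):
    """Apply standard rotary position embedding to one even-length vector.

    Illustrative only: this is generic RoPE, not SCAIL-2's variant.
    """
    if len(vec) % 2 != 0:
        raise ValueError("vector length must be even")
    dim = len(vec)
    out = []
    for i in range(0, dim, 2):
        # Each consecutive pair (x, y) is rotated by an angle that grows with
        # the token position and shrinks for later pairs in the vector.
        theta = position * base ** (-i / dim)
        x, y = vec[i], vec[i + 1]
        out.append(x * math.cos(theta) - y * math.sin(theta))
        out.append(x * math.sin(theta) + y * math.cos(theta))
    return out

def dot(a, b):
    """Plain dot product, used to compare rotated vectors."""
    return sum(p * q for p, q in zip(a, b))
```

The useful property is that the dot product of two rotated vectors depends only on the difference between their positions, so attention scores encode relative rather than absolute position.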

End-to-end animation without intermediates

Key capabilities

End-to-end driving at 512p and 704p resolutions.
Cross-identity character replacement with detailed prompts.
Animal-to-character motion transfer without human skeletons.
Zero-shot support for SAM3D body mesh inputs.
Multi-reference generation using optional extra images.
Bias-Aware DPO LoRA for hand and face detail improvement.
Built-in Wan VAE and T5 in checkpoint.
ComfyUI integration with community workflows available.

Video creators and animators can use SCAIL-2 to transfer complex movements from any video source onto a reference character image. The removal of skeleton-based restrictions means driving sources can include animals or non-human motion that previous tools could not process. Users benefit from a single pipeline that handles both animation and character replacement without switching between different specialized tools.

Training data and model limitations

The project addresses a core weakness found in SCAIL-1, which identified pose representation and injection as key bottlenecks but still depended on intermediate representations. MotionPair-60K, the synthetic dataset created for training, combines data from multiple off-the-shelf models including MoCha and Wan-Animate alongside the team’s own SCAIL-Preview tool.

Although multi-reference inference works in zero-shot mode, the model was not explicitly optimized for it, and video quality may degrade when additional reference images are provided.

"SCAIL-2 is an open-source model for end-to-end controlled character animation. It animates a reference character with a driving video, and also supports character replacement and multi-character scenarios without relying on intermediate pose representations." — Source: Hugging Face
