Alibaba DAMO Academy Debuts LumosX for Consistent Multi-Subject Videos


LumosX is a new framework for generating personalized videos with multiple subjects. Specific people and objects stay consistent across each clip, with the right attributes kept matched to the right identities.

Developed by researchers at Alibaba DAMO Academy, this tool addresses a common problem in AI video generation. Current methods often struggle to keep faces and their associated items properly aligned when multiple characters appear in the same scene.

Model size: varies | GPU VRAM: requirements vary

Core model capabilities

  • Keeps multiple subjects consistent throughout a video.
  • Matches faces to their correct attributes like clothing or accessories.
  • Uses dedicated relational attention mechanisms to track how subjects relate to their attributes.
  • Works with both foreground characters and background elements.
  • Processes text prompts to generate identity-consistent video content.

Video creators and content producers working on personalized media projects may find this useful for generating clips with specific characters. The framework handles the complex task of keeping track of who is wearing what, which helps when creating narrative content or marketing materials with consistent branding across scenes.

How the system works

The system uses what the researchers call Relational Self-Attention and Relational Cross-Attention. Together, these components explicitly encode the relationships between subjects and their attributes. A data pipeline built with multimodal large language models extracts subject-specific dependencies from the input content and assigns them to the right subjects. The paper notes that LumosX achieves

'state-of-the-art performance in fine-grained, identity-consistent, and semantically aligned personalized multi-subject video generation.'
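The article does not say how Relational Self-Attention is implemented. To illustrate the general idea of encoding subject-attribute relations explicitly, here is a minimal sketch using a swapped-in, commonly used technique: a block attention mask. Each token carries a hypothetical subject ID, so an attribute token (say, a shirt) can only attend to tokens of its own subject (its face), while shared tokens such as the text prompt stay visible to everyone. The names `relational_mask` and `masked_attention` and the grouping scheme are assumptions for illustration, not LumosX's code or API.

```python
import math


def relational_mask(groups):
    """Hypothetical block mask: True where token i may attend to token j.

    groups[i] is an assumed subject ID for token i (a face and its clothing
    share one ID); None marks a shared token (e.g. the text prompt) that can
    attend to, and be attended by, every other token.
    """
    n = len(groups)
    return [
        [groups[i] is None or groups[j] is None or groups[i] == groups[j] for j in range(n)]
        for i in range(n)
    ]


def masked_attention(q, k, v, mask):
    """Plain scaled dot-product attention; masked pairs get zero weight."""
    d = len(q[0])
    weights, outputs = [], []
    for i, qi in enumerate(q):
        scores = [
            sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) if mask[i][j] else -math.inf
            for j, kj in enumerate(k)
        ]
        peak = max(scores)  # finite, since every token may attend to itself
        exps = [math.exp(s - peak) for s in scores]  # masked pairs: exp(-inf) == 0.0
        total = sum(exps)
        row = [e / total for e in exps]
        weights.append(row)
        outputs.append([sum(w * vj[c] for w, vj in zip(row, v)) for c in range(len(v[0]))])
    return outputs, weights


if __name__ == "__main__":
    # Hypothetical tokens: face_A, shirt_A, face_B, hat_B, prompt (shared)
    groups = ["A", "A", "B", "B", None]
    q = k = [[1, 0], [0, 1], [1, 1], [1, 0], [0, 0]]
    v = [[1, 0], [0, 1], [1, 1], [2, 0], [0, 2]]
    out, w = masked_attention(q, k, v, relational_mask(groups))
    # shirt_A places exactly 0.0 weight on face_B and hat_B
    print([round(x, 3) for x in w[1]])
```

In this toy setup, subject A's shirt token cannot draw on subject B's tokens at all, which is one simple way to stop an attribute from drifting onto the wrong identity. Whether LumosX uses hard masks, learned biases, or something else entirely is not stated in the article.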

The project has been accepted to ICLR 2026, and the code is organized into self-contained subprojects for easier setup. Users can clone the repository and follow its documentation for installation, checkpoints, and inference.

Get LumosX on GitHub. You can also access the models on Hugging Face or read the full paper on arXiv.