OmniNFT LoRA Adapters Fix Lip-Sync and Audio-Video Alignment for LTX


OmniNFT is a set of LoRA adapters that fine-tune the open-source LTX Video model to produce better-aligned audio and video. It uses reinforcement learning to guide the generation process so that sounds match on-screen actions more accurately. The result is fewer lip-sync errors, such as floating-head artifacts, and higher overall video quality.

Researcher Zghhui and collaborators published the OmniNFT method alongside ready-to-merge weights for LTX-2 and LTX-2.3. They combined several reward models to judge visual quality, audio-text alignment, and cross-modal sync separately, then trained the adapters to optimize all three at once. The code and weights are open but labeled for research use only.

Smarter reward routing improves audio-video sync

Key features
  • Independent rewards for video, audio, and sync.
  • Prevents gradient bleed from video into audio layers.
  • Focuses optimization on sound-emitting image regions.
  • Plug-and-play LoRA weights for LTX-2 and LTX-2.3.
  • One command merges adapter into base checkpoint.
  • Inference runs on a single consumer GPU.
  • Outputs an .mp4 with audio plus a separate .wav file.

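The feature list says optimization focuses on sound-emitting image regions but does not say how. One common way to express that idea is a weighted loss that counts errors inside those regions more heavily. The sketch below assumes a hypothetical per-pixel `sound_mask` in [0, 1]; the function name and `base_weight` parameter are also illustrative and are not OmniNFT's implementation.

```python
"""Sketch of a loss weighted toward sound-emitting image regions.

Assumption: sound_mask is a hypothetical per-pixel map in [0, 1] marking where
sound originates (for example a speaker's mouth). How OmniNFT derives or uses
such a mask is not covered here; this only shows the weighting idea.
"""
from typing import List

Grid = List[List[float]]


def region_weighted_loss(loss_map: Grid, sound_mask: Grid, base_weight: float = 0.1) -> float:
    """Weighted mean of loss_map with per-pixel weight base_weight + mask."""
    num = 0.0
    den = 0.0
    for loss_row, mask_row in zip(loss_map, sound_mask):
        for loss, m in zip(loss_row, mask_row):
            weight = base_weight + m  # sound regions count more than background
            num += weight * loss
            den += weight
    if den == 0.0:
        raise ValueError("empty input or all-zero weights")
    return num / den


if __name__ == "__main__":
    loss = [[1.0, 0.0], [0.0, 0.0]]  # error concentrated at the top-left pixel
    mask = [[1.0, 0.0], [0.0, 0.0]]  # the top-left pixel is the sound source
    print(region_weighted_loss(loss, mask))  # larger than the plain mean of 0.25
```

A small positive `base_weight` keeps non-sound regions in the objective instead of dropping them. With an all-zero mask, every weight equals `base_weight` and the function reduces to a plain mean.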
This project suits hobbyists and small studios who generate short video clips locally and need better lip-sync without cloud APIs. Privacy-minded professionals can keep their work offline while still getting the quality uplift. Anyone already running LTX Video on a prosumer GPU can merge the LoRA and immediately test improved output.
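Merging a LoRA folds its low-rank update into the base weights, so the merged checkpoint runs without any extra adapter computation at inference time. The following is a minimal sketch, assuming the standard LoRA form W' = W + (alpha / r) * B @ A. The shapes, values, and function names are illustrative and are not taken from the OmniNFT merge command.

```python
"""Sketch of folding a LoRA adapter into a base weight matrix.

Assumes the standard LoRA form W' = W + (alpha / r) * B @ A. Shapes, values
and names are illustrative, not taken from the OmniNFT merge script.
"""
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product of a (m, k) and b (k, n)."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def merge_lora(w: Matrix, lora_a: Matrix, lora_b: Matrix, alpha: float) -> Matrix:
    """Return W + (alpha / r) * B @ A.

    w:      (out, in) base weight
    lora_a: (r, in)   down-projection A
    lora_b: (out, r)  up-projection B
    """
    rank = len(lora_a)
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # (out, in) low-rank update
    if len(w) != len(delta) or any(len(wr) != len(dr) for wr, dr in zip(w, delta)):
        raise ValueError("base weight and LoRA update shapes differ")
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


if __name__ == "__main__":
    base = [[1.0, 0.0], [0.0, 1.0]]  # 2x2 base weight
    a = [[1.0, 2.0]]                 # rank-1 down-projection, shape (1, 2)
    b = [[0.5], [1.0]]               # up-projection, shape (2, 1)
    print(merge_lora(base, a, b, alpha=1.0))
```

Storing only A and B is much cheaper than storing a full weight delta. Merging gives up that compactness in exchange for zero adapter overhead at inference.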

Planned ComfyUI support and research-only license

The team says a ComfyUI-compatible format is on the way, though details were incomplete at launch. Because training scores outputs with multiple reward models, the full training pipeline requires several separate server processes. Inference, by contrast, is streamlined. The license restricts the code and weights to research use, and anyone considering commercial use should also check the license terms of the project's submodules.

"Modality-wise Advantage Routing — Instead of collapsing all rewards into a single global advantage, OmniNFT computes independent per-reward advantages for video, audio, and cross-modal synchronization, then routes each to its responsible generation branch." — Source: GitHub
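The quoted description contrasts one global advantage with separate per-reward advantages that are each sent to their own branch. The toy sketch below makes that contrast concrete. It assumes GRPO-style normalization within a group of samples for the same prompt. The reward names and the branch labels "video", "audio", and "cross_modal" are hypothetical, and the code does not reproduce the OmniNFT source.

```python
"""Sketch: per-reward advantages routed to branches vs. one global advantage.

Assumptions (not taken from the OmniNFT code): rewards are normalized within a
group of samples for the same prompt (GRPO-style), and the reward and branch
names below are hypothetical labels.
"""
from statistics import mean, pstdev
from typing import Dict, List

EPS = 1e-6

# Hypothetical mapping from reward name to the branch whose loss it drives.
ROUTING = {"video_quality": "video", "audio_text": "audio", "av_sync": "cross_modal"}


def group_normalize(values: List[float]) -> List[float]:
    """Standardize rewards within one prompt group: (r - mean) / (std + eps)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / (sigma + EPS) for v in values]


def global_advantage(rewards: Dict[str, List[float]]) -> List[float]:
    """Baseline: sum all rewards per sample, then normalize once."""
    totals = [sum(per_sample) for per_sample in zip(*rewards.values())]
    return group_normalize(totals)


def routed_advantages(rewards: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Normalize each reward separately and key the result by its target branch."""
    return {ROUTING[name]: group_normalize(vals) for name, vals in rewards.items()}


if __name__ == "__main__":
    # Four samples; video and sync rewards pull in opposite directions.
    group = {
        "video_quality": [0.9, 0.1, 0.5, 0.5],
        "audio_text": [0.5, 0.5, 0.5, 0.5],
        "av_sync": [0.1, 0.9, 0.5, 0.5],
    }
    print("global:", global_advantage(group))
    print("routed:", routed_advantages(group))
```

In this toy group the video and sync rewards disagree, so every sample has the same summed reward and the global advantage is zero everywhere. The routed version still tells the video branch to favor sample 0 and the cross-modal branch to favor sample 1. This is one way to see how collapsing rewards can erase useful per-modality signal.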