LTX2.3-10Eros Brings Still Images To Life With Layered Precision

LTX2.3-10Eros is a new image-to-video model merge that turns a single still image into a short motion clip. Unlike standard weight blending, it combines layers from different training steps to improve control and prompt response. The result is a more predictable generation that sticks to the directions you give it.
TenStrip developed this experimental merge specifically for image-to-video tasks. They wanted a model that behaves more consistently than adding a LoRA adapter, while still respecting detailed text instructions. The release ships in multiple formats, including a full BF16 checkpoint and an FP8 version optimized by S1LV3RC01N.
Reinforced prompt control and file options
- Layer-scaled merge, not a weight average.
- Intended to follow prompts more consistently than loading a LoRA adapter.
- BF16 and FP8 mixed-precision checkpoints included.
- Separate Kijai split files for FP8 Transformer.
- Needs explicit, enhanced scene descriptions.
- Warning: large distilled LoRAs can damage results.
This release speaks to creators who need precise direction over AI-generated video. You benefit if you already write detailed scene scripts and want the model to follow specific motion, dialogue, or sound cues. Just be prepared to describe everything you want to see, because the model adds nothing you do not explicitly ask for.
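As a rough illustration of what "describe everything" can look like in practice, the sketch below assembles an explicit scene description from separate motion, dialogue, and sound fields. The `Beat` class, the `build_prompt` helper, and the "Beat N / Dialogue / Sound" layout are all hypothetical conventions invented for this example; the release does not prescribe a prompt format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Beat:
    """One moment of the clip (hypothetical structure, not part of the release)."""
    action: str          # what visibly moves or changes
    dialogue: str = ""   # spoken line, if any
    sound: str = ""      # ambient or effect sound, if any


def build_prompt(scene: str, beats: List[Beat]) -> str:
    """Join a scene description and ordered beats into one explicit prompt."""
    lines = [scene.strip()]
    for index, beat in enumerate(beats, start=1):
        parts = [f"Beat {index}: {beat.action.strip()}"]
        if beat.dialogue:
            parts.append(f'Dialogue: "{beat.dialogue.strip()}"')
        if beat.sound:
            parts.append(f"Sound: {beat.sound.strip()}")
        lines.append(" ".join(parts))
    return "\n".join(lines)


prompt = build_prompt(
    "A woman stands on a wooden pier at dusk, facing the water.",
    [
        Beat("She turns slowly toward the camera.",
             dialogue="We should go.",
             sound="waves lapping against the pier"),
        Beat("She walks past the camera and out of frame."),
    ],
)
print(prompt)
```

Keeping each cue in its own field makes it harder to forget one, which matters when the model will not invent missing motion or audio on its own.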
How the model thinks and what’s next
The underlying LTX architecture does very little reasoning of its own when conditioned on a first frame, so strong prompting is mandatory. TenStrip recommends using an external large language model such as Grok to craft a rich, evolving script before feeding it to the video model. They also caution that heavier distilled LoRAs can interfere with the merge, and suggest lighter, “cond_safe” alternatives instead.
In short, the model uses layer-scaled merges of different training steps; it is not a straight weight merge.
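The difference between a straight weight merge and a layer-scaled merge can be sketched in a few lines. This is a generic illustration only: the exact recipe TenStrip used is not described, and the layer names, the per-layer ratios, and the use of plain Python lists in place of real tensors are assumptions made for the example.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]  # layer name -> flattened weights (toy stand-in for tensors)


def uniform_merge(a: Weights, b: Weights, alpha: float = 0.5) -> Weights:
    """Straight weight merge: the same blend ratio is applied to every layer."""
    return {
        name: [(1 - alpha) * x + alpha * y for x, y in zip(a[name], b[name])]
        for name in a
    }


def layer_scaled_merge(a: Weights, b: Weights,
                       layer_alpha: Dict[str, float],
                       default_alpha: float = 0.0) -> Weights:
    """Layer-scaled merge: each layer gets its own blend ratio."""
    merged: Weights = {}
    for name in a:
        alpha = layer_alpha.get(name, default_alpha)  # unlisted layers keep checkpoint a
        merged[name] = [(1 - alpha) * x + alpha * y
                        for x, y in zip(a[name], b[name])]
    return merged


# Two toy checkpoints standing in for different training steps (hypothetical values).
step_a = {"block0.attn": [1.0, 2.0], "block1.attn": [3.0, 4.0]}
step_b = {"block0.attn": [3.0, 4.0], "block1.attn": [7.0, 8.0]}

# Hypothetical ratios: keep block0 from step_a, blend block1 half and half.
ratios = {"block0.attn": 0.0, "block1.attn": 0.5}

uniform = uniform_merge(step_a, step_b)
scaled = layer_scaled_merge(step_a, step_b, ratios)
print(uniform)
print(scaled)
```

With a single ratio, every layer moves the same distance between the two checkpoints. With per-layer ratios, some layers can stay close to one training step while others borrow from another, which is the general idea behind the control the article describes.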