Phosphene Stitches Visuals and Sound Instantly on Macs

Phosphene is a free desktop panel that turns text or images into video clips with synchronized audio, running entirely on Apple Silicon Macs. It wraps the LTX 2.3 model through Apple’s MLX framework, generating video and sound in a single step instead of adding audio later. The version 2.0 release brings faster generation, sharper upscaling, and a revamped interface with one-click installation.
The tool was created by developer Mrbizarro as a polished, no-code front end for the open-source LTX-2-MLX project. It removes the need to build complex node graphs or pay for cloud video generators. Professionals and hobbyists who want private, local video creation with sound get a straightforward panel that handles the heavy lifting behind the scenes.
Synced audio and video in one pass
- Four generation modes: text-to-video, image-to-video, first/last-frame interpolation, and clip extension.
- Quality tiers from a quick 2-minute draft to a high-quality 12-minute render with TeaCache acceleration.
- Optional Sharp upscale via PiperSR for a 2× detail boost on the Apple Neural Engine.
- Hardware-aware tier gating that detects your Mac’s RAM and suggests appropriate settings.
- Built-in prompt rewriting using a local Gemma model to match the video model’s trained format.
- Lossless H.264 output with fast-start, so gallery thumbnails appear instantly.
- One-click install through Pinokio with resumable downloads and no cloud dependency.
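To make the hardware-aware tier gating in the list above concrete, here is a minimal Python sketch of the general idea: read the machine's physical RAM and map it to a suggested quality tier. The tier names, the RAM thresholds, and the function names are all hypothetical and chosen only for illustration; they are not taken from Phosphene's code.

```python
import os

# Hypothetical (minimum RAM in GiB, tier name) pairs, highest first.
# These thresholds are illustrative assumptions, not Phosphene's real values.
TIERS = [
    (64, "high"),
    (32, "standard"),
    (16, "draft"),
]


def total_ram_gib():
    """Return physical RAM in GiB, or None if the platform does not report it."""
    try:
        pages = os.sysconf("SC_PHYS_PAGES")
        page_size = os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        return None
    if pages <= 0 or page_size <= 0:
        return None
    return pages * page_size / 2**30


def suggest_tier(ram_gib):
    """Pick the most demanding tier whose minimum RAM requirement is met."""
    for minimum, name in TIERS:
        if ram_gib >= minimum:
            return name
    return "unsupported"


if __name__ == "__main__":
    ram = total_ram_gib()
    if ram is None:
        print("Could not detect RAM on this platform")
    else:
        print(f"{ram:.0f} GiB detected, suggested tier: {suggest_tier(ram)}")
```

Keeping the thresholds in a single ordered table makes the gating easy to adjust as model memory requirements change.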
This tool suits Mac users who want to produce AI-generated video without uploading assets to the cloud. Small creative teams can iterate on concepts locally, keeping client work private. Hobbyists and developers with Apple Silicon machines get a fast, integrated alternative to complex pipeline setups.
Developer notes and known limits
The panel is exclusive to Apple Silicon because it depends on MLX, Apple’s machine-learning framework built for that hardware; Intel Macs and other platforms are not supported. Memory pressure can crash the helper process on lower-RAM Macs, so closing other applications before rendering is recommended. Future updates aim to add an audio-to-video mode and better RAM advisories inside the panel.
LTX 2.3 generates video and audio in a single pass; the two share the diffusion process, so their timing is tied at the frame level.
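As a rough illustration of what frame-level timing means for playback, the Python sketch below computes which audio samples fall under a given video frame. The frame rate and sample rate are assumed values chosen for the example, not LTX 2.3's actual output settings, and the function name is hypothetical. It does not describe how the model aligns the two streams internally.

```python
def audio_span_for_frame(frame_index, fps, sample_rate):
    """Return the [start, end) audio sample range that plays during one frame.

    Integer arithmetic keeps frame boundaries exact even when
    sample_rate is not a multiple of fps.
    """
    start = frame_index * sample_rate // fps
    end = (frame_index + 1) * sample_rate // fps
    return start, end


if __name__ == "__main__":
    # Assumed values for illustration only: 24 fps video, 48 kHz audio.
    fps, sample_rate = 24, 48_000
    for frame in (0, 1, 24):
        print(frame, audio_span_for_frame(frame, fps, sample_rate))
```

With these assumed values each frame covers 2,000 audio samples, so frame 24 starts exactly one second into the clip.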