Audio

About audio model releases

Explore the latest open‑source audio and speech AI releases for local use. This archive covers new models and tools for voice cloning, text‑to‑speech, transcription, and music generation.
📌 Featured

ACE-Step 1.5 XL turns plain text into full songs in eight quick steps

ACE-Step recently published ACE-Step 1.5 XL, an open audio generation model that produces complete music tracks in just eight steps. This streamlined process significantly reduces rendering wait times while preserving…

Latest audio models

May 31, 2026
MOSS-TTS-v1.5 Lands With Precise Pause Controls And 31-Language Synthesis

MOSS-TTS-v1.5 is an upgraded open-source text-to-speech model from the OpenMOSS team, building on their earlier 1.0 release. It keeps zero-shot voice cloning, long-form generation, and multilingual capabilities while delivering more […]

May 25, 2026
DramaBox Interprets Stage Directions for Expressive AI Voiceovers

DramaBox is a text-to-speech system that turns scene descriptions and dialogue into expressive speech, complete with laughs, sighs, and pauses. It can clone a speaker’s timbre from just a 10-second […]

May 15, 2026
Scenema-Audio Lets You Direct Voices With Emotion And Scene Sounds

Scenema-Audio is a new open-source model that clones voices and generates speech with emotional acting, scene sounds, and zero-shot identity transfer. It doesn’t just read text aloud: it interprets stage directions […]

May 15, 2026
Supertonic-3 Whispers 31 Languages Directly From Your Device

Supertonic-3 is a lightweight text-to-speech system that runs entirely on your device using ONNX Runtime, with no cloud calls needed for synthesis. This open-weight release expands language support from 5 […]

April 30, 2026
Xiaomi Research Orchestrates ControlFoley For Video Soundtracks

ControlFoley generates synchronized soundtracks for video clips by combining visual scenes, written descriptions, and existing audio samples as inputs to a single generation system. This new framework produces matching sound effects and […]

April 28, 2026
Trelis Debuts Chorus-v1-GGML For Local Voice Separation

Trelis recently released a specialized speech transcription model that handles overlapping conversations between two participants. The system processes audio clips locally without relying on external cloud servers. Built as an […]

April 20, 2026
OpenMOSS Team Debuts MOSS-TTS-Nano-100M Offline Audio Engine

MOSS-TTS-Nano-100M is a lightweight, open-source text-to-speech engine that generates natural audio directly on standard computers. The system converts typed prompts into clear speech while staying efficient enough for everyday use. […]

April 20, 2026
k2-fsa OmniVoice Turns Text Into Speech In 600 Languages Offline

OmniVoice is an open-source text-to-speech system that converts written words into spoken audio across more than six hundred languages. The software enables instant voice matching and allows users to […]

April 20, 2026
VoxCPM2 Brings Studio Sound To Local Devices

VoxCPM2 is an open text-to-speech system that generates studio-quality audio from written text. The tool reads words at standard clarity and outputs polished speech at a higher frequency without needing […]