Audio

About audio model releases

Explore the latest open‑source audio and speech AI releases for local use. This archive covers new models and tools for voice cloning, text‑to‑speech, transcription, and music generation.

Latest audio models

April 15, 2026
ACE-Step 1.5 XL turns plain text into full songs in eight quick steps

ACE-Step has published ACE-Step 1.5 XL, an open audio generation model that produces complete music tracks in just eight steps. Needing so few steps significantly reduces rendering wait times while preserving […]

April 7, 2026
Foundation-1 Crafts Structured Loops for Producers

Foundation-1 is a text-to-sample model built for structured music production. It generates tempo-synced, key-aware loops that slot directly into production workflows instead of producing generic audio textures. RoyalCities developed this […]

April 7, 2026
LongCat-AudioDiT Masters Zero-Shot Voice Cloning with Ease

LongCat-AudioDiT is a new text-to-speech model that generates high-fidelity audio from text inputs. It operates directly in the waveform latent space rather than relying on intermediate acoustic representations like […]

March 30, 2026
PrismAudio Transforms Video into Realistic Soundtracks

PrismAudio is a new framework that generates audio from video using reinforcement learning with Chain-of-Thought (CoT) planning. Developed by the FunAudioLLM team, it breaks down the complex task of video-to-audio […]

March 26, 2026
Yuriyvnv Refines Dutch Speech Data With WAVe Update

WAVe-1B-Multimodal-NL is a 1-billion-parameter model that checks the quality of synthetic speech at the word level. It examines how well spoken audio matches its written transcript, catching errors […]

March 22, 2026
OpenMOSS MOSS-TTS Speech Studio for home GPUs

MOSS-TTS Family is an open-source speech and sound generation model family from MOSI.AI and the OpenMOSS team. It is designed for high-fidelity audio generation across complex real-world scenarios, including long-form […]

March 15, 2026
ACE-Step Pumps It Up With ACE-Step 1.5

ACE-Step 1.5 is a new open-source music generation model that brings commercial-grade audio creation to consumer hardware. It generates full songs in under 10 seconds on an RTX 3090 while […]

February 21, 2026
Qwen Launches Qwen3 ASR 1.7B with Top Accuracy

Qwen has released the Qwen3-ASR family, which comprises two automatic speech recognition models, Qwen3-ASR-1.7B and Qwen3-ASR-0.6B, alongside the Qwen3-ForcedAligner-0.6B. These models support language identification and […]

January 30, 2026
Qwen Launches Qwen3 TTS Multilingual Text-to-Speech AI

Qwen has introduced Qwen3 TTS, a text-to-speech series trained on over 5 million hours of speech data across 10 languages. The new models deliver exceptional capabilities in […]
