ACE-Step recently published ACE-Step 1.5 XL, an open audio generation model that produces complete music tracks in just eight steps. This streamlined process significantly reduces rendering wait times while preserving […]
Audio
Latest audio models
Foundation-1 is a text-to-sample model built for structured music production. It generates tempo-synced, key-aware loops that slot directly into production workflows instead of producing generic audio textures. RoyalCities developed this […]
LongCat-AudioDiT is a new text-to-speech model that generates high-fidelity audio from text inputs. It operates directly in the waveform latent space rather than relying on intermediate acoustic representations like […]
PrismAudio is a new framework that generates audio from video using reinforcement learning with Chain-of-Thought (CoT) planning. Developed by the FunAudioLLM team, it breaks down the complex task of video-to-audio […]
WAVe-1B-Multimodal-NL is a 1-billion-parameter model that checks the quality of synthetic speech at the word level. It examines how well spoken audio matches its written transcript, catching errors […]
MOSS-TTS Family is an open-source speech and sound generation model family from MOSI.AI and the OpenMOSS team. It is designed for high-fidelity audio generation across complex real-world scenarios, including long-form […]
ACE-Step 1.5 is a new open-source music generation model that brings commercial-grade audio creation to consumer hardware. It generates full songs in under 10 seconds on an RTX 3090 while […]
Qwen has released the Qwen3-ASR family, which comprises two automatic speech recognition models, Qwen3-ASR-1.7B and Qwen3-ASR-0.6B, alongside the Qwen3-ForcedAligner-0.6B. These models support language identification and […]
Qwen has introduced Qwen3 TTS, a versatile text-to-speech series trained on over 5 million hours of speech data across 10 languages. The series delivers exceptional capabilities in […]