Audio

About audio model releases

Explore the latest open‑source audio and speech AI releases for local use. This archive covers new models and tools for voice cloning, text‑to‑speech, transcription, and music generation.


📌 Featured June 2, 2026

VTS Turns Your Hummed Imitation Into a Real Sound Effect

VTS (Voice To Sound) is a newly released open-source model that turns a short vocal imitation and a text description into a realistic sound effect. Instead of fumbling to describe…

Latest audio models

July 7, 2026

NoizAI Orchestrates AudioX-Turbo For Rapid Sound Creation From Media

By vramkickedin

AudioX-Turbo is a new open-source framework that generates audio and music from text, video, and existing audio signals. This release processes your inputs in just four steps to create […]

July 7, 2026

Dasheng-Audiogen By Xiaomi Research Crafts Mixed Audio From Text

By vramkickedin

The newly released Dasheng-Audiogen is an open-source artificial intelligence model that creates full audio scenes from text descriptions. Instead of producing just one type of sound, it can blend […]

June 30, 2026

XiaomiMiMo Delivers MiMo-Audio-7B-Base For Realistic Voice Generation

By vramkickedin

The new release, MiMo-Audio-7B-Base, is an open-source audio language model designed to learn new tasks from just a few examples. It was pretrained on over one hundred million hours of audio […]

June 30, 2026

Owensong Fashions Inflect-Nano-v1 To Turn Text Into Local Audio

By vramkickedin

Inflect-Nano-v1 is a tiny English text-to-speech model that turns written words into spoken audio. It includes its own audio generator and has fewer than five million parameters. The […]

June 28, 2026

Zyphra Pioneers ZONOS2 For Natural Voice Cloning And Text To Speech

By vramkickedin

ZONOS2 is a new text-to-speech model designed to generate highly expressive, natural-sounding audio. It predicts high-quality audio tokens to create studio-grade sound at a 44.1 kHz sample […]

June 21, 2026

Sculpt Sound Instantly: Magenta-Realtime-2 Arrives for Local Devices

By vramkickedin

Google has released Magenta-Realtime-2, an open music generation model designed to create music on your own device with extremely low delay. This new system lets you steer musical output in […]

June 17, 2026

Rednote-Hilab Drops Dots.tts, A 2B-Parameter Speech Model That Clones Voices Natively

By vramkickedin

Dots.tts is a new 2-billion-parameter text-to-speech model that converts text directly into high-fidelity 48 kHz audio without relying on discrete audio codec tokens. The system operates fully end-to-end, using an […]

June 16, 2026

MisoLabs' MisoTTS Brings Conversational Speech Directly To Your Machine

By vramkickedin

MisoTTS is a new 8-billion-parameter text-to-speech model now available on Hugging Face. It converts written text into natural, conversational speech and keeps the voice consistent with a short reference audio sample. […]

June 16, 2026

Boson AI Drops Higgs-audio-v3-tts-4b For Expressive Multilingual Speech

By vramkickedin

Boson AI has released higgs-audio-v3-tts-4b, a 4-billion-parameter text-to-speech model designed specifically for conversational voice AI. Rather than simply reading text aloud, the model produces expressive speech with emotional tone, natural […]
