Supertonic-3 Whispers 31 Languages Directly From Your Device

Supertonic-3 is a lightweight text-to-speech system that runs entirely on your device using ONNX Runtime, with no cloud calls needed for synthesis. This open-weight release expands support from 5 to 31 languages and reduces the repeat and skip failures that often break the flow of read-aloud text. The model is competitive with much larger TTS systems while remaining small enough for browser, edge, and local CPU-only use.
Supertone Inc. developed and open-sourced Supertonic-3, publishing the model weights on Hugging Face under the OpenRAIL-M license and the sample code under MIT. The design is privacy-friendly: synthesis runs offline, so no audio or text data leaves the machine. The complete ONNX package downloads automatically on first run, which removes the headache of manual dependency setup.
On-device TTS in 31 languages
- Supports 31 languages, up from 5 previously.
- Runs fast on CPU, with no GPU required.
- Has only about 99 million parameters.
- Includes expression tags like <laugh> and <sigh>.
- Simple Python SDK with auto-download of assets.
- Competitive accuracy with much larger TTS models.
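As a minimal sketch of what tagged input text might look like, the Python snippet below scans a string for inline expression tags and produces a tag-free copy. Only <laugh> and <sigh> are named above; the inline placement, the regular expression, and both helper names are assumptions made for illustration and are not part of the Supertonic SDK.

```python
import re
from typing import List

# Assumption: expression tags are lowercase words in angle brackets,
# written inline with the text. Only <laugh> and <sigh> are named in the article.
TAG_PATTERN = re.compile(r"<([a-z]+)>")


def find_expression_tags(text: str) -> List[str]:
    """Return the tag names found in the text, in order of appearance."""
    return TAG_PATTERN.findall(text)


def strip_expression_tags(text: str) -> str:
    """Return the text with tags removed and whitespace collapsed."""
    without_tags = TAG_PATTERN.sub(" ", text)
    return " ".join(without_tags.split())


# Hypothetical input line; the exact tag placement the model expects is not
# described in the article.
sample = "I did not expect that <laugh> but here we are. <sigh>"

if __name__ == "__main__":
    print(find_expression_tags(sample))
    print(strip_expression_tags(sample))
```

A helper like this could be useful for checking scripts before synthesis, for example to list which tags a narration file uses or to produce a plain-text transcript alongside the audio.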
This tool fits privacy-conscious professionals who need text-to-speech without streaming data to a cloud service. Hobbyists with consumer GPUs or even CPU-only machines can run it locally for experiments and personal projects. Small agencies avoid recurring API costs and can generate voiceovers or narration entirely on their own hardware.
Developer notes and performance
The entire model ships as ready-to-run ONNX assets, so inference does not require installing PyTorch, which keeps the environment light. Supertonic 3 has around 99 million parameters, so its startup time and memory use are a fraction of what a 0.7B or 2B-class open TTS system demands. Supertone hasn’t announced a roadmap, but the stability improvements in this release suggest steady, practical refinement for on-device work.
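To put the parameter counts in perspective, the following back-of-the-envelope sketch estimates weight storage as parameters times bytes per parameter. The parameter counts come from the text above; the byte widths (4 for fp32, 2 for fp16) are assumptions, since the precision of the shipped ONNX weights is not stated here, and the estimate ignores activations and runtime overhead.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Parameter counts are taken from the article; byte widths are assumptions.

MODELS = {
    "Supertonic 3 (~99M)": 99_000_000,
    "0.7B-class": 700_000_000,
    "2B-class": 2_000_000_000,
}

# Assumed storage precisions (not confirmed for the released assets).
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def weight_megabytes(params: int, bytes_per_param: int) -> float:
    """Approximate size of the weights alone, in megabytes (10**6 bytes)."""
    return params * bytes_per_param / 1_000_000


def size_ratio(larger: int, smaller: int) -> float:
    """How many times more parameters the larger model has."""
    return larger / smaller


if __name__ == "__main__":
    for name, params in MODELS.items():
        sizes = ", ".join(
            f"{precision}: {weight_megabytes(params, width):,.0f} MB"
            for precision, width in BYTES_PER_PARAM.items()
        )
        print(f"{name:<22} {sizes}")
```

Under these assumptions the 99M-parameter weights come to roughly 396 MB at fp32, while a 0.7B-class model has about 7 times as many parameters and a 2B-class model about 20 times as many. Actual memory use depends on the precision of the exported model and on the runtime.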
"Supertonic 3 is designed for practical on-device inference: compact enough to run locally, while staying competitive with much larger open TTS systems." — Source: Hugging Face