Hy-MT1.5-1.8B-1.25bit Puts 33-Language Translation In Your Pocket

    
By vramkickedin | May 10, 2026 at 6:15 pm | 2 min read

Hy-MT1.5-1.8B-1.25bit is a heavily compressed translation model designed to run entirely on your phone, with no internet connection needed. It shrinks a 1.8-billion-parameter system down to a 440-megabyte file, putting high-quality offline translation into a pocket-sized package. The model covers 33 languages and 1,056 translation directions, and its benchmark results compete with much larger models and cloud-based services.

AngelSlim, the compression toolkit team from Hunyuan AI Infra, applied their Sherry quantization technique to shrink the Tencent Hunyuan Team’s Hy-MT1.5-1.8B translation model. They reduced the original 3.3GB 16-bit version to a tiny 1.25-bit format that runs efficiently on everyday phone processors thanks to a custom STQ kernel. The result is a private, always-available translator that keeps every word of your text on your own device, addressing both privacy worries and dead zones.
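The size figures above can be sanity-checked with a little arithmetic. The sketch below assumes a round 1.8 billion parameters and counts only the weights themselves; the helper name `size_bytes` is made up for illustration.

```python
# Back-of-the-envelope size check. Assumes exactly 1.8e9 parameters,
# which is a rounded figure, so the results are approximate.
PARAMS = 1.8e9


def size_bytes(params: float, bits_per_weight: float) -> float:
    """Storage needed for the weights alone, in bytes."""
    return params * bits_per_weight / 8


fp16 = size_bytes(PARAMS, 16)        # 16-bit original
ternary = size_bytes(PARAMS, 1.25)   # 1.25-bit compressed weights

print(f"16-bit weights:   {fp16 / 2**30:.2f} GiB")
print(f"1.25-bit weights: {ternary / 1e6:.0f} MB")
```

At 16 bits the weights come to roughly the 3.3GB quoted for the original model. At 1.25 bits they come to well under the 440MB file size. The article does not break down the difference; one possible explanation, offered here only as an assumption, is that some tensors are kept at higher precision and the file also carries metadata.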

Extreme compression for on-device translation

Key Features

33 languages and 1,056 translation directions.
Squeezes 3.3GB model down to 440MB.
Custom STQ kernel optimizes phone CPU use.
Works fully offline, data never leaves device.
Android demo with cross-app background translation.
Outperforms models like Tower-Plus-72B on benchmarks.
Built on 1.25-bit ternary quantization research.

Small agencies and privacy-conscious professionals can install this model on ordinary phones and get instant translations inside any app without sending conversations to a cloud service. Frequent travelers and field workers gain a reliable language tool that works in airplane mode, with no subscriptions or data fees. Tinkerers and local AI fans get a rare chance to run a production-grade compressed model on modest hardware and see how extreme quantization performs in real life.

How they compressed it

Sherry uses a fine-grained sparsity trick: in every group of four model weights, it keeps the three most important as simple +1 or -1 values and zeroes out the least important. To run the model, developers need to grab a special llama.cpp branch with the STQ1_0 kernel until the pull request is merged into the main project. The AngelSlim team is actively improving the toolkit and invites feedback and suggestions through GitHub issues.
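To see where the "1.25-bit" figure comes from, here is a minimal sketch of the three-out-of-four idea described above. A group of four weights has 4 possible positions for the dropped weight and 2 × 2 × 2 sign combinations for the rest, so 32 states in total, which fits in 5 bits: 5 bits / 4 weights = 1.25 bits per weight. The bit layout and the function names below are assumptions made for illustration, not the actual STQ1_0 format, and the sketch uses weight magnitude as a stand-in for "importance" and ignores any scale factors a real quantizer would store.

```python
from typing import List


def quantize_group(weights: List[float]) -> List[int]:
    """Turn 4 weights into ternary values: drop the smallest-magnitude
    weight (set it to 0) and keep only the sign of the other three."""
    if len(weights) != 4:
        raise ValueError("expected a group of exactly 4 weights")
    zero_idx = min(range(4), key=lambda i: abs(weights[i]))
    return [0 if i == zero_idx else (1 if weights[i] >= 0 else -1)
            for i in range(4)]


def pack_group(ternary: List[int]) -> int:
    """Pack one group into 5 bits: 2 bits for the position of the zero,
    then 1 sign bit for each of the 3 kept weights (hypothetical layout)."""
    if len(ternary) != 4 or ternary.count(0) != 1:
        raise ValueError("expected 4 values with exactly one zero")
    code = ternary.index(0)
    for value in ternary:
        if value != 0:
            code = (code << 1) | (1 if value > 0 else 0)
    return code  # always in range 0..31


def unpack_group(code: int) -> List[int]:
    """Inverse of pack_group."""
    zero_idx = code >> 3
    sign_bits = iter([(code >> shift) & 1 for shift in (2, 1, 0)])
    return [0 if i == zero_idx else (1 if next(sign_bits) else -1)
            for i in range(4)]


group = [0.9, -0.05, -0.4, 0.3]
ternary = quantize_group(group)   # the -0.05 weight is dropped
code = pack_group(ternary)
print(ternary, format(code, "05b"), unpack_group(code))
```

Every group round-trips through a single 5-bit code, which is the whole point: no bits are wasted on states that cannot occur.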