Gemma-4-Harmonia-31B-uncensored-heretic Slashes Refusals by 91%

Gemma-4-Harmonia-31B-uncensored-heretic is a decensored version of a 31-billion-parameter language model that cuts refusals by 91%, from 97 to 9 out of 100 test prompts. The release uses an ablation technique to strip away content restrictions while keeping the model’s output distribution close to the original's, with a KL divergence of just 0.0047. Benchmark results show only a minor accuracy shift on MMLU, from 85.66% to 84.55% (a drop of 1.11 percentage points), which suggests that performance on most subjects remains intact.
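For readers unfamiliar with the metric, KL divergence measures how far one probability distribution drifts from another, so a value near zero means the modified model assigns nearly the same next-token probabilities as the original. The sketch below illustrates the calculation on a toy example. The logits, the four-token vocabulary, and the direction KL(original || modified) are assumptions for illustration; the article does not say how the 0.0047 figure was computed or averaged.

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(P || Q) in nats for two discrete distributions over the same tokens."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

# Hypothetical next-token logits over a toy 4-token vocabulary (not real model output).
original_logits = [2.0, 1.0, 0.5, -1.0]
modified_logits = [1.9, 1.1, 0.5, -1.0]

p = softmax(original_logits)
q = softmax(modified_logits)
kl = kl_divergence(p, q)
print(f"KL(original || modified) = {kl:.4f} nats")
```

In a real evaluation this quantity would typically be averaged over many prompts and token positions, which is an assumption about common practice rather than a description of this release.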
Independent developer Llmfan46, who also released Gemma-4-Ortenzya-The-Creative-Wordsmith-31B-it-uncensored-heretic, created this model after hitting Hugging Face’s free storage limit for his collection of over 70 free models. He applied his v1.2.0 abliteration method to specific components in layers 14 through 55. The modification did not require rented cloud GPUs, though contributions help cover storage and compute costs for future projects.
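Abliteration is generally described as directional ablation: a "refusal direction" is estimated in the model's hidden space, and selected weight matrices are edited so they can no longer write along it. The following is a minimal sketch of that idea in plain Python, not the developer's actual procedure. The toy 3x2 matrix, the `refusal_direction` vector, and the choice to apply the same full-strength edit to every layer are all assumptions; only the layer range 14 through 55 comes from the article.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def ablate_direction(weight, direction, scale=1.0):
    """Return W - scale * r (r^T W), removing W's output component along r.

    weight: list of rows, shape (d_out, d_in); direction: vector of length d_out.
    """
    r = normalize(direction)
    d_out, d_in = len(weight), len(weight[0])
    # r^T W: how strongly each input column writes along the direction r.
    proj = [sum(r[i] * weight[i][j] for i in range(d_out)) for j in range(d_in)]
    return [[weight[i][j] - scale * r[i] * proj[j] for j in range(d_in)]
            for i in range(d_out)]

def matvec(weight, x):
    """Multiply a (d_out, d_in) matrix by a length-d_in vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]

# Toy stand-ins (hypothetical): a 3x2 "output projection" and a refusal direction.
toy_weight = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
refusal_direction = [1.0, 1.0, 0.0]

# Layers 14 through 55 inclusive, as stated in the article.
target_layers = range(14, 56)
ablated = {layer: ablate_direction(toy_weight, refusal_direction)
           for layer in target_layers}

# After ablation, any output should have no component along the direction.
r = normalize(refusal_direction)
out = matvec(ablated[14], [0.7, -0.3])
residual = sum(ri * oi for ri, oi in zip(r, out))
print(f"component along refusal direction: {residual:.2e}")
```

The printed component should be zero up to floating-point error. Setting `scale` below 1.0 would remove only part of the direction, which is one way such a method could trade refusal reduction against drift from the original model.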
Refusal reduction without quality loss
- Slashes refusals from 97% down to 9%.
- Preserves base intelligence with 0.0047 KL divergence.
- Available in Safetensors and GGUF quantized formats.
- Built via a multi-stage fusion of seven models.
- Targets attention output projection layers for decensoring.
- Developer accepts support through Patreon and Ko-fi.
- Designed for local inference on consumer-grade GPUs.
- Provides full MMLU benchmark subject breakdowns.
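The headline figures are easy to check by hand. The short calculation below, a sketch using only the numbers quoted in this article, shows that going from 97 to 9 refusals out of 100 prompts is a relative reduction of about 91%, and that the MMLU change amounts to 1.11 percentage points. The helper name `relative_reduction` is made up for this example.

```python
def relative_reduction(before, after):
    """Fraction by which a count fell, relative to its starting value."""
    return (before - after) / before

# Refusal counts out of 100 prompts, as quoted in the article.
refusals_original = 97
refusals_uncensored = 9
reduction = relative_reduction(refusals_original, refusals_uncensored)

# MMLU accuracy (percent) before and after, as quoted in the article.
mmlu_original = 85.66
mmlu_uncensored = 84.55
mmlu_drop_points = mmlu_original - mmlu_uncensored

print(f"Relative refusal reduction: {reduction:.1%}")           # 90.7%
print(f"MMLU drop: {mmlu_drop_points:.2f} percentage points")   # 1.11
```

Note that the 91% figure is relative to the original refusal count; in absolute terms the refusal rate falls by 88 percentage points.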
This model suits power users who want direct, unfiltered responses from a capable 31B model running on their own hardware. Developers and hobbyists building storytelling tools or creative applications will benefit from the near-stock performance without triggering preachy refusals. Privacy-focused professionals can run it entirely offline, keeping sensitive queries off cloud services.
Build process and developer notes
The model is the final product of a complex three-phase merge process called Harmonia, which combines seven foundation and specialized models using mathematical projection techniques rather than simple blending. The creator selected weight ranges conservatively, using 0.6 and 0.4 ratios during the CABS gating phase to avoid degrading logical reasoning. This build sits on Google DeepMind’s Gemma 4 foundation family and uses a tokenizer union from multiple source models.
"91% fewer refusals (9/100 Uncensored vs 97/100 Original) while preserving model quality (0.0047 KL divergence)." — Source: Hugging Face