
Gemma-4-Harmonia-31B-uncensored-heretic Slashes Refusals by 91%


Gemma-4-Harmonia-31B-uncensored-heretic is a decensored version of a 31-billion-parameter language model that cuts refusals by 91%, from 97 to 9 out of 100. The release uses an ablation technique to strip away content restrictions while keeping the model’s outputs close to the original, with a KL divergence of just 0.0047. Benchmark results show only a minor accuracy shift, from 85.66% to 84.55% on MMLU (a drop of 1.11 percentage points), indicating that most subject performance remains intact.
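For readers unfamiliar with the KL divergence figure, here is a minimal sketch of how such a score can be computed between the next-token distributions of two models. The logits below are made-up toy values, and the direction of the divergence (original relative to modified) and the absence of any averaging over prompts are assumptions; the release does not spell out its exact measurement procedure.

```python
import math


def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) in nats for two discrete distributions over the same vocabulary."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


# Hypothetical next-token logits over a 4-token vocabulary (illustrative only).
original_logits = [2.0, 1.0, 0.1, -1.0]
modified_logits = [2.1, 0.9, 0.1, -1.0]

p = softmax(original_logits)
q = softmax(modified_logits)

print(f"KL(original || modified) = {kl_divergence(p, q):.4f} nats")
```

A value near zero means the modified model assigns almost the same probabilities as the original; identical distributions give exactly zero.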

Independent developer Llmfan46, who also released Gemma-4-Ortenzya-The-Creative-Wordsmith-31B-it-uncensored-heretic, created this model after hitting Hugging Face’s free storage limit with his collection of more than 70 free models. He applied his v1.2.0 abliteration method to specific components in layers 14 through 55. The modification changes how the model handles requests and does not require rented cloud GPUs, though contributions help cover storage and compute costs for future projects.
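As a rough illustration of what abliteration generally involves, the sketch below removes a single direction from a weight matrix so that the layer can no longer write along it. This is the commonly described directional-ablation idea, not a reproduction of the developer's v1.2.0 method: the matrix, the "refusal direction", and the helper names are all hypothetical, and the layer range is assumed to be inclusive.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def ablate_direction(weight, direction):
    """Return W' = W - r (r^T W), where r is the unit-length direction.

    Rows of `weight` index output dimensions, columns index input dimensions.
    After ablation, no input can produce an output component along r.
    """
    r = normalize(direction)
    n_out, n_in = len(weight), len(weight[0])
    # r^T W: how strongly each input column writes along r.
    proj = [sum(r[i] * weight[i][j] for i in range(n_out)) for j in range(n_in)]
    return [
        [weight[i][j] - r[i] * proj[j] for j in range(n_in)]
        for i in range(n_out)
    ]


def matvec(weight, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


# Assumed inclusive range, mirroring "layer 14 through 55" in the article.
TARGET_LAYERS = range(14, 56)

# Toy 3x2 projection matrix and a made-up direction (illustrative only).
W = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
refusal_dir = [1.0, 1.0, 0.0]

W_ablated = ablate_direction(W, refusal_dir)
output = matvec(W_ablated, [0.7, -0.3])
r = normalize(refusal_dir)
component_along_r = sum(a * b for a, b in zip(output, r))

print(f"layers targeted: {len(TARGET_LAYERS)}")
print(f"output component along ablated direction: {component_along_r:.2e}")
```

In a real model the same projection would be applied to the chosen matrices in each targeted layer, with the direction estimated from model activations rather than picked by hand.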

Refusal reduction without quality loss

Key Features
  • Slashes refusals from 97% down to 9%.
  • Preserves base intelligence with 0.0047 KL divergence.
  • Available in Safetensors and GGUF quantized formats.
  • Built via a multi-stage fusion of seven models.
  • Targets attention output projection layers for decensoring.
  • Supports independent developer through Patreon and Ko-fi.
  • Designed for local inference on consumer-grade GPUs.
  • Provides full MMLU benchmark subject breakdowns.
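The headline percentages follow directly from the reported counts and scores. The short check below uses only figures quoted in this article; the variable names are illustrative.

```python
# Refusal counts out of 100, as reported for the original and uncensored models.
original_refusals = 97
uncensored_refusals = 9

# Relative reduction in refusals.
relative_reduction = (original_refusals - uncensored_refusals) / original_refusals

# MMLU accuracy in percent, before and after decensoring.
mmlu_original = 85.66
mmlu_uncensored = 84.55
mmlu_drop = mmlu_original - mmlu_uncensored

print(f"Relative refusal reduction: {relative_reduction:.1%}")  # 90.7%
print(f"MMLU drop: {mmlu_drop:.2f} percentage points")  # 1.11
```

The 90.7% relative reduction rounds to the 91% quoted in the headline, while the refusal rate itself falls by 88 percentage points.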

This model suits power users who want direct, unfiltered responses from a capable 31B model running on their own hardware. Developers and hobbyists building storytelling tools or creative applications will benefit from the near-stock performance without triggering preachy refusals. Privacy-focused professionals can run it entirely offline, keeping sensitive queries off cloud services.

Build process and developer notes

The model is the final product of a complex three-phase merge process called Harmonia, which combines seven foundation and specialized models using mathematical projection techniques rather than simple blending. The creator selected weight ranges conservatively, using 0.6 and 0.4 ratios during the CABS gating phase to avoid degrading logical reasoning. This build sits on Google DeepMind’s Gemma 4 foundation family and uses a tokenizer union from multiple source models.

"91% fewer refusals (9/100 Uncensored vs 97/100 Original) while preserving model quality (0.0047 KL divergence)." — Source: Hugging Face