Gemma-4-Gembrain-31B-It-Uncensored-Heretic Slashes AI Refusals By 87%

    
By vramkickedin | May 28, 2026 at 3:02 pm

Gemma-4-Gembrain-31B-it-uncensored-heretic is a modified derivative of Google’s Gemma 4 31B instruct model that says “no” far less often. It was built with abliteration, a technique that removes most safety refusals while leaving the model’s knowledge and reasoning nearly intact. The result is a 31-billion-parameter local model that answers most questions directly, without lecturing or deflection.
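For readers unfamiliar with the technique, abliteration is usually described as directional ablation: estimate a "refusal direction" in the model's activations, then remove that direction from the weights that write into the residual stream. The toy NumPy sketch below illustrates the general idea only. It is not the creator's pipeline, and the function names, shapes, and random data are all assumptions made for illustration.

```python
import numpy as np

def refusal_direction(harmful_acts, harmless_acts):
    """Unit vector from the mean harmless activation to the mean harmful one."""
    diff = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)

def ablate_direction(weight, direction):
    """Project `direction` out of everything `weight` writes.

    `weight` has shape (d_model, d_in) and writes into a d_model-sized
    activation space, so the result is W' = (I - r r^T) W.
    """
    r = direction.reshape(-1, 1)
    return weight - r @ (r.T @ weight)

# Toy data (hypothetical): "harmful" activations are shifted along axis 0.
rng = np.random.default_rng(0)
d_model, d_in = 16, 8
harmless = rng.normal(size=(32, d_model))
harmful = rng.normal(size=(32, d_model)) + 2.0 * np.eye(d_model)[0]

r = refusal_direction(harmful, harmless)
W = rng.normal(size=(d_model, d_in))
W_abl = ablate_direction(W, r)

# After ablation, the layer's output has (numerically) no component along r.
x = rng.normal(size=d_in)
print(float(r @ (W_abl @ x)))
```

The printed value should be zero up to floating-point error, which is the point of the method: the modified layer can no longer write along the estimated direction, while everything orthogonal to it is untouched.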

The creator, llmfan46, maintains a free-tier Hugging Face repository with more than 70 free models and used Abliteration v1.2.0 to target specific attention layers in the original Gemma model. Adjusting the weights in those layers cut refusals by 87%, from 99 out of 100 in the original to just 13, while preserving normal helpful behavior. The goal is to give users a tool that doesn’t censor its own output, whether for creative writing, roleplay, or private analysis.
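The 87% figure is a relative reduction, not a difference in percentage points. A quick check using only the two counts quoted above (the helper name is illustrative):

```python
def relative_reduction(before, after):
    """Fraction by which `after` is smaller than `before`."""
    return (before - after) / before

refusals_before, refusals_after = 99, 13  # refusals per 100, as reported
reduction = relative_reduction(refusals_before, refusals_after)
print(f"{reduction:.1%}")  # 86.9%
```

That is 86 fewer refusals out of the original 99, which rounds to the headline 87%.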

87% fewer refusals with barely any quality loss

Key performance highlights

87% fewer refusals (13 vs 99 per 100).
KL divergence of only 0.0186 from the original.
MMLU accuracy of 85.9% (original 86.65%), a drop of 0.75 points.
GGUF quants from 17.8 GB to 32.6 GB.
Supports reasoning with <|think|> prompt tag.
Merged from multiple creative fine‑tunes.
Fits on 24 GB consumer GPUs (Q4_K_M).
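The KL divergence in the list above measures how far the modified model's output distribution drifts from the original's. Lower means closer, and zero means identical. The article does not say how the 0.0186 figure was measured, so the sketch below only shows the general calculation on made-up logits. The array shapes and the size of the perturbation are assumptions.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def kl_divergence(logits_p, logits_q):
    """KL(P || Q) in nats, one value per row, for distributions given as logits."""
    log_p = log_softmax(logits_p)
    log_q = log_softmax(logits_q)
    return (np.exp(log_p) * (log_p - log_q)).sum(axis=-1)

# Hypothetical next-token logits: 4 prompts, 10-token toy vocabulary.
rng = np.random.default_rng(0)
original = rng.normal(size=(4, 10))
modified = original + 0.05 * rng.normal(size=(4, 10))  # small drift

kl = kl_divergence(original, modified)
print(kl.mean())
```

In a real comparison the two logit arrays would come from running the original and modified models on the same prompts, and the per-prompt values would be averaged as in the last line.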

This model is for anyone who wants an LLM that doesn’t refuse topics outright. Creatives can use it for unfiltered storytelling, while power users can run it locally on a single consumer GPU for completely private, uncensored work. Small agencies that need on‑premise AI without corporate safety filters will also find it immediately useful.

Developer notes and what’s next

The creator warns that they have hit Hugging Face’s free storage limit and cannot upload new models without community support; all contributions go directly toward hosting costs. The model was built through a five‑phase merge process that combined several creative fine‑tunes before abliteration was applied. While the uncensoring is aggressive, the developer kept the impact on factual knowledge minimal. Future releases depend on users chipping in for storage and cloud compute.