Qwen3.6-35B-A3B-uncensored-heretic-Native-MTP-Preserved: Fewer Refusals

Llmfan46 has released Qwen3.6-35B-A3B-uncensored-heretic-Native-MTP-Preserved, a modified version of Qwen3.6-35B-A3B that cuts refusals by 88% while keeping all 19 multi-token prediction (MTP) tensors intact. The model uses an abliteration method to reduce refusals, registering only 10 out of 100 test prompts versus 83 for the original. Benchmarks show the uncensored variant stays almost identical in capability, with MMLU accuracy dropping from 83.71% to 83.39%.
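The headline figures follow directly from the reported counts. As a quick check, the short Python snippet below recomputes the relative drop in refusals and the MMLU gap using only the numbers quoted in this article; the variable names are illustrative.

```python
# Figures as reported in the article.
original_refusals = 83   # refusals per 100 prompts, original model
modified_refusals = 10   # refusals per 100 prompts, uncensored model

# Relative reduction: share of the original refusals that no longer occur.
relative_reduction = (original_refusals - modified_refusals) / original_refusals

# MMLU gap in percentage points.
mmlu_gap = 83.71 - 83.39

print(f"Refusal reduction: {relative_reduction:.1%}")   # Refusal reduction: 88.0%
print(f"MMLU gap: {mmlu_gap:.2f} percentage points")    # MMLU gap: 0.32 percentage points
```

Note that the 88% figure is relative to the original 83 refusals, not an absolute drop of 88 prompts out of 100.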
Independent creator Llmfan46 built this release using abliteration v1.3.0 and a specialized decensoring method that carefully targets key attention and MLP projection layers. They also provided a full set of quantized formats—GGUF, GPTQ-Int4, and NVFP4—so users on different hardware can run it. The goal is to give local AI users a censorship-resistant model that retains the original Qwen3.6’s strong agentic coding and reasoning performance.
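Abliteration is commonly described as directional ablation: a "refusal direction" is estimated in the model's residual stream, and the matrices that write into that stream (such as attention output and MLP down projections) are orthogonalized against it. The sketch below shows only that projection step on a toy matrix in plain Python. The release does not spell out its exact procedure, so the function names, the toy values, and the convention that rows index the output dimension are assumptions for illustration, not the author's code.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def ablate_direction(weight, direction):
    """Return W' = W - r (r^T W), where r is the unit-length direction.

    Assumption: rows of `weight` index the output (residual-stream) dimension,
    so after ablation no output of the matrix has a component along r.
    """
    r = normalize(direction)
    n_out, n_in = len(weight), len(weight[0])
    # r^T W: how strongly each input column writes along r.
    coeffs = [sum(r[i] * weight[i][j] for i in range(n_out)) for j in range(n_in)]
    return [[weight[i][j] - r[i] * coeffs[j] for j in range(n_in)]
            for i in range(n_out)]

def matvec(weight, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]

# Toy 3x2 projection matrix and a made-up "refusal direction".
W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
refusal_direction = [1.0, 1.0, 0.0]

W_ablated = ablate_direction(W, refusal_direction)
output = matvec(W_ablated, [0.5, -1.0])

# Component of the output along the ablated direction (zero up to rounding).
residual = sum(o * r for o, r in zip(output, normalize(refusal_direction)))
print(abs(residual) < 1e-9)
```

In a real model the direction would be estimated from activations on contrasting prompt sets, and the ablation strength may be tuned per layer; neither step is shown here.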
MTP fully intact, 88% fewer refusals
- All 19 MTP tensors preserved for fast decoding.
- Refusals drop from 83 to 10 per 100 prompts.
- KL divergence only 0.0015 from the original.
- MMLU score difference of about 0.3 percentage points (83.71% vs. 83.39%).
- GGUF, GPTQ-Int4, and NVFP4 quantized formats.
- Abliteration targets attn.o_proj and mlp.down_proj.
- Agentic coding benchmarks nearly unchanged.
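The KL divergence figure in the list above measures how far the modified model's output distribution drifts from the original's; values near zero mean the two models assign nearly the same token probabilities. The following is a minimal sketch of the calculation for a single next-token distribution. The logits are made up, and the direction KL(original || modified) and the use of natural-log units are assumptions, since the release does not state how its 0.0015 figure was computed.

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(P || Q) in nats for two distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical logits over a 4-token vocabulary for one prompt.
original_logits = [2.0, 1.0, 0.1, -1.0]
modified_logits = [2.05, 0.95, 0.1, -1.0]

p = softmax(original_logits)
q = softmax(modified_logits)
kl = kl_divergence(p, q)

print(f"KL divergence: {kl:.6f} nats")
```

In practice such a value would be averaged over many prompts and computed over the full vocabulary rather than four tokens.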
This model is built for local AI enthusiasts, privacy-conscious developers, and small agencies that need a powerful open-weight model with far fewer refusals. The uncensored behavior lets you discuss sensitive topics, draft creative content, or test agent workflows without constant pushback. Because all of the MTP tensors are still present, it also keeps the speed boost from MTP-based speculative decoding, making it practical for everyday coding and tool-use jobs.
Developer notes and known limits
Llmfan46 noted that they have hit Hugging Face’s free storage limit and cannot upload new models without community support; they host more than 70 free models and receive no pay for the work. The MTP tensor count appears as 19 entries in safetensors but 20 in GGUF because the gate-up projection is stored as a single fused tensor in safetensors and as two separate tensors in GGUF. Every quantized release has been verified to retain the complete MTP capability, so speculative decoding performance is unchanged.
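The 19 versus 20 discrepancy is a bookkeeping effect, which a small sketch can make concrete: splitting one fused gate-up matrix into separate gate and up matrices adds one entry to the tensor count without changing any weights. The tensor names, the toy values, and the assumption that the gate rows come first in the fused matrix are all hypothetical.

```python
def split_gate_up(tensors, fused_key, gate_key, up_key):
    """Return a copy of `tensors` with the fused gate-up matrix split in two.

    Assumption: the fused matrix stacks the gate rows on top of the up rows.
    """
    fused = tensors[fused_key]
    half = len(fused) // 2
    result = {name: value for name, value in tensors.items() if name != fused_key}
    result[gate_key] = fused[:half]
    result[up_key] = fused[half:]
    return result

# Fused layout: 18 placeholder tensors plus one fused gate-up matrix = 19 entries.
fused_layout = {f"mtp.placeholder_{i}": [[0.0]] for i in range(18)}
fused_layout["mtp.gate_up_proj"] = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]

split_layout = split_gate_up(
    fused_layout, "mtp.gate_up_proj", "mtp.gate_proj", "mtp.up_proj"
)

print(len(fused_layout), len(split_layout))  # 19 20
```

Concatenating the two split matrices reproduces the fused one, so both layouts carry the same parameters.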
"This is the full model with the 19 MTPs all intacts. 88% fewer refusals (10/100 Uncensored vs 83/100 Original) while preserving model quality (0.0015 KL divergence)." — Source: Hugging Face