Qwen3.5-27B-uncensored-heretic-v2-Native-MTP-Preserved Removes 89% Of AI Refusals

The newly released Qwen3.5-27B-uncensored-heretic-v2-Native-MTP-Preserved is a modified version of Alibaba’s Qwen3.5-27B model that removes most content restrictions while keeping benchmark performance close to the original. This release keeps all 15 Multi-Token Prediction (MTP) layers intact, so users don’t lose the speed benefits of speculative decoding. It provides a way to run a powerful general-purpose AI locally without the usual refusal responses on sensitive topics.
Independent developer llmfan46, who also released Qwen3.6-35B-A3B-uncensored-heretic-Native-MTP-Preserved, created this model using a technique called abliteration, which removes refusal behavior by editing specific parts of the neural network. The process targeted three component types, including attention output projections and MLP down-projections, to cut refusals by 89% while keeping the KL divergence from the original at just 0.0308. The developer also included detailed benchmark data showing the modified model scores 86.18% on the MMLU test, only 0.35 percentage points below the original’s 86.53%.
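Abliteration is usually described as directional ablation: a "refusal direction" is estimated in the model's hidden space and then projected out of the matrices that write into it, such as attention output projections and MLP down-projections. The sketch below shows only that projection step on a toy matrix in plain Python. The function names, the toy weights, the direction vector, and the `strength` parameter are illustrative assumptions, not llmfan46's actual pipeline or settings.

```python
import math


def normalize(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def ablate_direction(weight, direction, strength=1.0):
    """Remove the component along `direction` from everything `weight` outputs.

    `weight` is a list of rows with shape (d_model, d_in). The update is
    W' = W - strength * r (r^T W), where r is the unit-length direction.
    Both the shape convention and `strength` are assumptions for this sketch.
    """
    r = normalize(direction)
    d_model, d_in = len(weight), len(weight[0])
    # r^T W: how strongly each input column writes along the direction.
    proj = [sum(r[i] * weight[i][j] for i in range(d_model)) for j in range(d_in)]
    return [
        [weight[i][j] - strength * r[i] * proj[j] for j in range(d_in)]
        for i in range(d_model)
    ]


def matvec(weight, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


# Toy example: a 3x2 "down-projection" and a made-up refusal direction.
toy_weight = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
toy_direction = [1.0, 1.0, 0.0]

ablated = ablate_direction(toy_weight, toy_direction)
output = matvec(ablated, [0.5, -2.0])
unit = normalize(toy_direction)
leak = sum(a * b for a, b in zip(unit, output))
print(f"output component along the ablated direction: {leak:.2e}")
```

With `strength=1.0` the ablated matrix can no longer write anything along the chosen direction, whatever its input. Real tools apply this to full-size weight tensors and typically tune how strongly each layer is edited.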
Performance and usability highlights
- 15 native MTP layers fully preserved.
- 89% fewer refusals than the original Qwen3.5-27B.
- Minimal accuracy loss of 0.35 percentage points on MMLU.
- 27B parameters fitting consumer GPU setups.
- Available in GGUF, NVFP4, and GPTQ formats.
- 262,144 token native context length.
- Vision-language capabilities remain operational.
- Works with vLLM, SGLang, and KTransformers.
This model suits home lab enthusiasts running multi-GPU rigs or Mac Studios who want a censorship-free assistant for research, creative writing, or exploring controversial topics. Privacy-focused professionals can benefit since everything runs locally with no data leaving their machine. The quantized GGUF versions make it practical for users with limited VRAM who still want strong performance without content filters.
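To see why quantization matters for limited VRAM, a rough estimate of the memory needed for the weights alone can help. The sketch below simply multiplies the parameter count by the bits per weight. The helper name and the chosen bit widths are assumptions for illustration; real quantized files carry extra metadata, and the KV cache and activations need memory on top of this.

```python
def weight_memory_gib(num_params, bits_per_weight):
    """Approximate memory for the weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / 2**30


PARAMS = 27e9  # parameter count taken from the model name (27B)

# Nominal bit widths only; actual formats use slightly more bits per weight.
for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gib(PARAMS, bits):.1f} GiB")
```

By this estimate the 16-bit weights need roughly 50 GiB, while a nominal 4-bit quantization brings that down to about a quarter, which is what puts the model within reach of a single high-memory consumer GPU or a pair of smaller ones.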
Developer notes and build details
The creator emphasizes that Qwen3.5 and Qwen3.6 behave very differently under abliteration despite sharing the same architecture. Qwen3.5 models tolerate much higher KL divergence values without meaningful accuracy loss, while Qwen3.6 models show catastrophic degradation even at extremely low divergence levels. llmfan46 also noted that the project is at risk of pausing: as an unpaid independent contributor hosting more than 70 models, the developer has reached Hugging Face’s free storage limit.
"Qwen3.6 models are mainly meant for agentic and coding AI assistance and Qwen3.5 models are mainly meant for general purpose AI assistance." — Source: Reddit
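The KL divergence figures discussed above measure how far the modified model's next-token probabilities drift from the original's. The sketch below computes that quantity for a single toy prediction in plain Python. The logits are made up, and the direction KL(original || modified) is an assumption, since the source does not say how the 0.0308 figure was computed or averaged.

```python
import math


def softmax(logits):
    """Turn raw logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) in nats for two distributions over the same tokens."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical logits for one next-token prediction over a 4-token vocabulary.
original_logits = [2.0, 1.0, 0.1, -1.0]
modified_logits = [1.9, 1.1, 0.0, -0.8]

p = softmax(original_logits)
q = softmax(modified_logits)
print(f"KL(original || modified) = {kl_divergence(p, q):.4f} nats")
```

A value of zero means the two models predict identically for that token. In practice the per-token values would be averaged over many prompts, so a low score indicates that the edit changed little outside the targeted behavior.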