Jiunsong Unleashes Supergemma4-26b-uncensored-gguf-v2 For Open Chat

Supergemma4-26b-uncensored-gguf-v2 is a compressed language model that delivers open conversation without restrictive safety filters. The package wraps a 26-billion-parameter network into a GGUF container for straightforward local execution.
Jiunsong developed the file to fix routing errors that push casual prompts toward programming outputs. Private users and small teams benefit from the stable text generation and consistent offline operation.
Model Size: ~15.5GB & VRAM GPU: requirements vary
Core capabilities and deployment setup
- Embedded neutral chat template that prevents unwanted shifts into technical coding modes.
- Q4_K_M format that maintains response quality while keeping memory usage low.
- Higher scoring results in reasoning, web tasks, and multilingual processing compared to older versions.
- Full llama.cpp support with confirmed operation on recent consumer chips.
Professionals managing internal knowledge bases or drafting sensitive reports can rely on this configuration for predictable offline results. The corrected routing keeps answers aligned with the original prompt, eliminating the need for constant instruction adjustments.
Creator observations and speed metrics
The developer patched an export tool to properly handle mixture-of-experts weights, which resolved earlier stability issues. Internal testing measured prompt processing at 222 tokens per second and continuous generation at roughly 89 tokens per second.
"This release is for people who want three things together: a model that feels less censored than stock chat releases, a model that is more capable than the raw base on practical text workloads, and a compact local GGUF that still serves quickly on Apple Silicon,"
Jiunsong explained on their project page.
The complete Supergemma4-26b-uncensored-gguf-v2 model is available through the official Hugging Face page.