Private Coding Assistant: Gemma-4-12B-Coder-Fable5-Composer2.5-V1-GGUF

Gemma-4-12B-coder-fable5-composer2.5-v1-GGUF is a quantized coding model that fits on low-resource hardware. It runs entirely offline, with no cloud API, and needs as little as about 4.5 GB of VRAM or unified memory. The model reasons through a problem before writing a Python solution, a behavior distilled from two chain-of-thought datasets.
Developer Yuxinlu1 built it as a personal fine-tune of Google's Gemma 4 12B. Training used verifiable coding tasks and kept only the solutions that passed their tests, so the reasoning the model learned from ends in working code.
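The "keep only passing solutions" idea can be sketched in a few lines of Python. This is a minimal illustration, not the author's actual pipeline: the candidate format (source text that defines a function named `solve`), the test-case shape, and the helper names `passes_tests` and `keep_passing` are all assumptions made for the example.

```python
from typing import Iterable

# Hypothetical format: each candidate is Python source that defines `solve`;
# each test case is (args, expected_result). Not the author's actual pipeline.
TestCase = tuple[tuple, object]


def passes_tests(source: str, tests: Iterable[TestCase]) -> bool:
    """Return True only if `source` defines solve() and every test passes."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # run untrusted code only inside a sandbox
        solve = namespace["solve"]
        return all(solve(*args) == expected for args, expected in tests)
    except Exception:
        # Syntax errors, a missing solve(), and runtime errors all count as failures.
        return False


def keep_passing(candidates: list[str], tests: list[TestCase]) -> list[str]:
    """Keep only the candidate solutions that pass all tests."""
    return [src for src in candidates if passes_tests(src, tests)]


candidates = [
    "def solve(a, b):\n    return a + b\n",  # correct
    "def solve(a, b):\n    return a - b\n",  # wrong answer
    "def solve(a, b)\n    return a + b\n",   # syntax error
]
tests = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]
kept = keep_passing(candidates, tests)
print(len(kept))
```

Only the first candidate survives the filter. A real setup would also need isolation and time limits around the `exec` call, which this sketch leaves out.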
Private coding assistant for low-resource setups
- Runs with as little as about 4.5 GB of VRAM or unified memory.
- Works offline, no internet required.
- Displays chain-of-thought reasoning before code.
- Trained only on passing Python solutions.
- Available in Q2_K through Q8_0 quants.
- Supports a 256K-token context window.
The model suits developers with limited GPU memory who want a private coding assistant. It is most useful for generating Python solutions to algorithm problems where the reasoning behind the answer matters. Because it is tuned narrowly for coding tasks, verify its answers to general-knowledge questions.
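For a rough sense of how the memory figure relates to the quant level, the weight size can be estimated as parameters times bits per weight, divided by 8. The sketch below assumes a nominal 12 billion parameters (read off the "12B" in the model name) and ignores the KV cache, runtime buffers, and any tensors stored at higher precision, so it is a lower bound. The function names are illustrative, and the source does not say which quant the 4.5 GB figure refers to.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def implied_bits_per_weight(n_params: float, memory_gb: float) -> float:
    """Average bits per weight that would fit the weights in `memory_gb`."""
    return memory_gb * 1e9 * 8 / n_params


N_PARAMS = 12e9  # assumption: nominal count taken from "12B" in the model name

# Q8_0 stores 32 int8 weights plus one fp16 scale per block: 8.5 bits per weight.
Q8_0_BITS = 8.5

print(implied_bits_per_weight(N_PARAMS, 4.5))  # bits per weight at 4.5 GB
print(weight_memory_gb(N_PARAMS, Q8_0_BITS))   # weights-only estimate for Q8_0
```

Under these assumptions, 4.5 GB works out to roughly 3 bits per weight, which sits at the low end of the Q2_K through Q8_0 range, while Q8_0 weights alone would need about 12.75 GB.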
What to know before you download
A metadata bug originally limited the context to 131K, but a community patch now restores the full 256K. A newer v2 already adds agentic tool use, raising the pass rate on a telecom benchmark from about 15% to 55%. The model issues fewer safety refusals, so add your own guardrails before production use.
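One possible guardrail for a coding assistant is a static check on generated code before anything runs it. The sketch below uses Python's standard `ast` module to flag imports and builtin calls from a denylist. The denylist contents and the function name `find_violations` are assumptions for illustration only, and a check like this is easy to evade, so treat it as a first layer alongside sandboxing and review, not as a complete defense.

```python
import ast

# Hypothetical denylist for illustration; a real policy depends on the deployment.
BLOCKED_MODULES = {"os", "subprocess", "socket", "shutil"}
BLOCKED_CALLS = {"eval", "exec", "__import__"}


def find_violations(source: str) -> list[str]:
    """Return reasons to reject generated Python; an empty list means none found."""
    try:
        tree = ast.parse(source)
    except SyntaxError as err:
        return [f"does not parse: {err.msg}"]

    violations = []
    for node in ast.walk(tree):
        # Collect module names from both `import x` and `from x import y`.
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            names = []
        for name in names:
            root = name.split(".")[0]
            if root in BLOCKED_MODULES:
                violations.append(f"imports blocked module: {root}")
        # Flag direct calls such as eval(...) or exec(...).
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in BLOCKED_CALLS
        ):
            violations.append(f"calls blocked builtin: {node.func.id}")
    return violations


safe = "def add(a, b):\n    return a + b\n"
risky = "import subprocess\nsubprocess.run(['ls'])\n"
print(find_violations(safe))
print(find_violations(risky))
```

The first call reports no violations and the second flags the `subprocess` import. Code that fails to parse is rejected outright, since it cannot be inspected.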
"If you've got ~4.5 GB of VRAM or unified memory free, you can run your own private, offline coding assistant right now." — Source: Hugging Face