Havenoammo’s new Qwen3.6-27B-MTP-UD-GGUF package combines Unsloth Dynamic 2.0 XL quantization with grafted Multi-Token Prediction (MTP) layers for the Qwen3.6 27B model. This format enables speculative decoding, where the model predicts […]
Multimodal
Latest multimodal models
MiniCPM-V-4.6 is a new open-source multimodal model that brings image and video understanding directly to smartphones and small computers. It answers questions about photos and video clips without a cloud […]
Qwen3.5-9B-DeepSeek-V4-Flash-GGUF is a compressed language model that packs DeepSeek-V4’s advanced reasoning into a 9-billion-parameter package for local use. It converts the full model into the GGUF format, so it […]
The Qwen3.6-27B-Heretic-Uncensored-FINETUNE-NEO-CODE-Di-IMatrix-MAX-GGUF package delivers an uncensored, performance-enhanced version of Qwen’s latest 27B model in highly accurate compressed formats. This release strips away the original model’s refusal behavior, cutting the refusal […]
Google just dropped a new tool that makes its open-source AI models run much faster. The Gemma-4-26B-A4B-It-Assistant is a lightweight draft model that predicts tokens ahead of the main AI, […]
The Gemma-4-31B-It-Assistant is a lightweight draft model built to speed up text generation when paired with Google’s full Gemma 4 31B instruction-tuned model. It uses a technique called speculative decoding […]
NVIDIA has released Nemotron-3-Nano-Omni-30B-A3B-Reasoning-NVFP4, an open multimodal AI model that simultaneously processes video, audio, images, and text. The 31-billion-parameter system uses a hybrid Mamba2-Transformer design that activates only about 3 […]
Mistral-Medium-3.5-128B is a dense flagship model designed to handle complex reasoning, coding, and instruction-following tasks. It serves as a unified replacement for several previous models released by the company. The […]
NVIDIA recently released Nemotron-3-Nano-Omni-30B-A3B-Reasoning-BF16, an open multimodal AI system that processes video, audio, images, and text in a single workflow. Users can run it locally to summarize lengthy meetings, transcribe […]