Multimodal

About multimodal releases

Discover new open‑source multimodal models. This archive covers models that can handle multiple modalities, such as text, images, audio, and more.
📌 Featured

Kimi K2.6 Launches To Automate Extended Programming Tasks

Kimi K2.6 launches as an open-source multimodal model built for extended autonomous tasks and complex programming workflows. It processes lengthy instructions and coordinates multiple sub-tasks to deliver complete outputs from…


Latest multimodal models

May 31, 2026
StepFun Delivers Step-3.7-Flash MoE Vision Model for Local AI Agents

Step-3.7-Flash is a 198-billion-parameter vision‑language model that uses a sparse mixture‑of‑experts design to activate only about 11 billion parameters per token. It handles images and text natively through a 1.8‑billion‑parameter […]

May 31, 2026
NVIDIA's LocateAnything-3B Delivers One-Step Visual Grounding

LocateAnything-3B is a new vision‑language model from NVIDIA that finds and marks objects, text, or interface elements in images based on simple text prompts. Instead of predicting coordinates word‑by‑word like […]

May 31, 2026
Qwen3.6-35B-A3B-Uncensored-Genesis-V2-APEX-MTP-GGUF Removes All Refusals

Qwen3.6-35B-A3B-Uncensored-Genesis-V2-APEX-MTP-GGUF is a fully quantized, refusal-free language model that packages the original Qwen3.6‑35B‑A3B MoE architecture into ready‑to‑run GGUF files. This release combines APEX and MTP‑APEX quantization formats with a numerical […]

May 31, 2026
Qwen3.5-27B-uncensored-heretic-v2-Native-MTP-Preserved Removes 89% Of AI Refusals

The newly released Qwen3.5-27B-uncensored-heretic-v2-Native-MTP-Preserved is a modified version of Alibaba’s Qwen3.5-27B model that removes most content restrictions while keeping its performance nearly identical. This release preserves all 15 Multi-Token Prediction […]

May 31, 2026
Keye-VL-2.0-30B-A3B Brings Native Agent Tools To Long Video AI

Keye-VL-2.0-30B-A3B is a new open-source multimodal model designed to understand long videos and perform agent tasks like code execution and web search. It uses a sparse attention mechanism called DSA […]

May 29, 2026
Qwopus3.6-27B-v2-MTP-GGUF Puts Faster Stepwise AI on Your GPU

Jackrong has released Qwopus3.6-27B-v2-MTP-GGUF, a quantized version of the new Qwopus reasoning model that uses multi-token prediction to speed up text generation. The original Qwopus3.6-27B-v2-MTP model was fine-tuned from Qwen3.6-27B […]

May 29, 2026
NuExtract3 Turns Sensitive Docs Into Markdown Without The Cloud

NuExtract3 is a new 4-billion-parameter vision-language model that extracts structured data from documents and converts them into Markdown. It handles text, images, or both at once, making it suitable for […]

May 28, 2026
New Gemma-4-Ortenzya-The-Creative-Wordsmith-31B-it-uncensored-heretic

The Gemma-4-Ortenzya-The-Creative-Wordsmith-31B-it-uncensored-heretic is a fine-tuned version of Google’s Gemma 4 31B instruct model that cuts content refusals dramatically while sharpening its writing style. It starts from an already decensored base, […]

May 27, 2026
Zero Refusals Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced Drops

Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced is a version of Google’s Gemma 4-26B model with all refusal mechanisms removed while keeping the original capabilities fully intact. This release candidate scored zero refusals across 465 standard […]
