Multimodal

About multimodal releases

Discover new open‑source multimodal models. This archive covers models that handle multiple modalities, such as text, images, and audio.

Latest multimodal models

May 27, 2026
Intern-S2-Preview Packs Trillion-Scale Science Smarts Into A 35B Model

Intern-S2-Preview is a 35-billion-parameter scientific multimodal model that analyzes text, images, and time-series data while calling external tools. It continues pretraining from Qwen3.5 and undergoes a full training chain from […]

May 26, 2026
Fara-7B: A Tiny AI Agent That Runs Your Web Chores Privately

Fara-7B is a new open-weight computer use agent that understands screenshots and text to complete multi-step web tasks. It takes a high-level goal like “book a restaurant” and plans and […]

May 26, 2026
Marlin-2B Pins Down Every Second Of Your Video

Marlin-2B is a new open-source video language model that extracts structured descriptions and second‑precise timestamps from video footage. It answers the two questions developers most often ask about a video: […]

May 26, 2026
Qwopus3.5-9B-Coder-GGUF Puts A Private Coding Agent On Your Laptop

Qwopus3.5-9B-Coder-GGUF is a compressed, ready-to-run model file that brings an experimental 9‑billion‑parameter coding agent to local machines. It specializes in writing, debugging, and refactoring code, and can call tools like […]

May 26, 2026
Lance Unifies Image And Video Generation And Editing In One Lightweight Model

Lance is a new open-source AI model that handles image and video tasks like understanding, generation, and editing all in one place. It was trained entirely from scratch with only […]

May 26, 2026
SenseNova-U1-A3B-MoT: A Unified Vision-Language Powerhouse That Runs Locally

SenseNova-U1-A3B-MoT is a new open-source vision-language model that handles image understanding, generation, and editing through a unified architecture without relying on separate visual encoders. This release belongs to the SenseNova […]

May 23, 2026
Qwen3.6-35B-A3B-uncensored-heretic-Native-MTP-Preserved: Fewer Refusals

Llmfan46 has released Qwen3.6-35B-A3B-uncensored-heretic-Native-MTP-Preserved, a modified version of Qwen3.6-35B-A3B that cuts unwanted refusals by 88% while keeping all 19 multi-token prediction (MTP) layers fully intact. The model uses an abliteration […]

May 19, 2026
Unsloth Drops Qwen3.6-27B-GGUF-MTP For 2x Faster Local AI

Unsloth has released Qwen3.6-27B-GGUF-MTP, a quantized model file that preserves the multi-token prediction (MTP) layers from Qwen’s latest 27-billion-parameter language model. This GGUF format makes it possible to run the […]

May 19, 2026
Ovis2.6-80B-A3B Lands Private Visual AI on a Single GPU

Ovis2.6-80B-A3B is a new multimodal AI that pairs vision and language through a mixture-of-experts design, keeping it fast and efficient. It can examine high-resolution images, long documents, and even videos, […]
