Multimodal

About multimodal releases

Discover new open-source multimodal models. This archive covers models that handle multiple modalities, such as text, images, and audio.

📌 Featured June 15, 2026

NVIDIA Drops Cosmos3-Super to Seed Entire Worlds from a Single Prompt

Cosmos3-Super is a new release from NVIDIA that generates video, images, audio, and even robot action plans from mixed inputs like text, photos, and video clips. It's an omnimodal world…

Latest multimodal models

June 30, 2026

Baidu Introduces Unlimited-OCR To Read Long Documents At Constant Speed

By vramkickedin

Baidu has introduced Unlimited-OCR, a new tool designed to read and transcribe long documents without losing speed. This model processes dozens of pages in a single pass by maintaining a […]

June 30, 2026

SupraLabs Debuts Supra-A2A-Nano-Exp For Unified Media Handling

By vramkickedin

Supra-A2A-Nano-Exp is an experimental proof-of-concept any-to-any model that processes text, images, and video using a single system. It translates visual inputs into a small set of learned codes and treats […]

June 30, 2026

XiaomiMiMo Debuts MiMo-Audio-7B-Instruct For Smart Sound Generation

By vramkickedin

MiMo-Audio-7B-Instruct is a new audio language model designed to understand and generate sound based on simple instructions. It learns from a massive amount of audio data to perform tasks like […]

June 30, 2026

Datalab Introduces Lift To Pull Neat Data From Messy Documents

By vramkickedin

The new release of Lift provides a way to pull organized data out of PDFs and images. Users can provide a standard JSON format, and the model will generate matching […]

June 29, 2026

Llmfan46 Frees Gemma-4-31B-it-qat-q4_0-unquantized-uncensored-heretic

By vramkickedin

Gemma-4-31B-it-qat-q4_0-unquantized-uncensored-heretic is a modified version of the Gemma 4 language model designed to bypass safety filters. It uses quantization-aware training to maintain performance while reducing memory […]

June 28, 2026

InclusionAI Deploys VISTA-9B To Map Text Commands To Screen Clicks

By vramkickedin

VISTA-9B is a visual model that understands screen layouts and translates natural language instructions into precise click coordinates. It looks at a screenshot and figures out exactly where to click […]

June 28, 2026

Unsloth Brings Kimi-K2.7-Code-GGUF Coding Brain To Home Computers

By vramkickedin

Kimi-K2.7-Code-GGUF is a coding-focused AI model designed to handle complex software engineering tasks from start to finish. It is built upon a previous version called Kimi K2.6 and improves token […]

June 24, 2026

Unsloth Polishes Gemma-4-26B-A4B-It-Qat-GGUF With Speed Boosts

By vramkickedin

Unsloth has released gemma-4-26B-A4B-it-qat-GGUF, a new quantized version of Google DeepMind’s Gemma 4 26B Mixture-of-Experts model. It uses Quantization-Aware Training to preserve near-original quality while shrinking the model’s memory footprint. […]

June 24, 2026

Wildly Uncensored Qwen3.6-40B-Claude-4.6-Opus-Deckard-Heretic-Uncensored-Thinking-NEO-CODE-Di-IMatrix-MAX-GGUF

By vramkickedin

A new uncensored model is now available on Hugging Face and it carries the long name Qwen3.6-40B-Claude-4.6-Opus-Deckard-Heretic-Uncensored-Thinking-NEO-CODE-Di-IMatrix-MAX-GGUF. This release is a 40-billion-parameter dense language model that was expanded from Qwen's […]
