Multimodal

June 24, 2026

MiniMax-M3 Handles 1M Tokens Across Text, Images, And Video Natively

By vramkickedin

MiniMax-M3 is a new natively multimodal AI model from MiniMaxAI that processes text, images, and video with a 1-million-token context window. The model contains about 428 billion total […]

June 24, 2026

Moonshot AI Drops Kimi-K2.7-Code with 30% Thinking Token Cut

By vramkickedin

Moonshot AI has released Kimi-K2.7-Code, an open-source agentic coding model that significantly upgrades long-horizon software engineering performance. It builds directly on the Kimi K2.6 architecture while cutting thinking-token usage by […]

June 24, 2026

Google Forges Diffusiongemma-26B-A4B-It, A Diffusion Model That Denoises Entire Text Blocks At Once

By vramkickedin

Google has released a new open-weights AI model called Diffusiongemma-26B-A4B-it, which uses a unique method to generate text significantly faster than traditional models. Instead of creating text one token at […]

June 17, 2026

Run A 31B AI Model On One GPU With Gemma-4-31B-it-qat-w4a16-ct

By vramkickedin

Google has released Gemma-4-31B-it-qat-w4a16-ct, a compressed version of the new Gemma 4 31B instruction-tuned model that uses Quantization-Aware Training (QAT) to dramatically cut memory use while preserving high performance. The […]

June 16, 2026

Hcompany Ships Holo-3.1-0.8B To Put Vision AI Agents Inside Your Pocket

By vramkickedin

Hcompany has released Holo-3.1-0.8B, the smallest model in a fresh family of vision-language models built to drive computer use agents. The release expands automation capabilities beyond web browsers and desktops […]

June 16, 2026

Hcompany Crafts Holo-3.1-35B-A3B for Private On-Device Screen Control

By vramkickedin

Holo-3.1-35B-A3B is the largest model in a new family of vision-language agents that can see, understand, and control computer interfaces across web browsers, desktops, and now mobile devices. It automates […]

June 15, 2026

Bernini-R Renderer Now Open: Turn AI Plans Into Photorealistic Video

By vramkickedin

ByteDance has released Bernini-R, the open-source renderer that powers its Bernini video generation and editing framework. The release provides inference code and model weights, letting users generate and edit videos […]

June 15, 2026

NVIDIA Drops Cosmos3-Super to Seed Entire Worlds from a Single Prompt

By vramkickedin

Cosmos3-Super is a new release from NVIDIA that generates video, images, audio, and even robot action plans from mixed inputs like text, photos, and video clips. It's an omnimodal world […]

June 15, 2026

Unsloth Shrinks Gemma 4 Into Local Gemma-4-12B-It-GGUF Package

By vramkickedin

Google DeepMind recently released Gemma 4, and the Gemma-4-12B-It-GGUF version from Unsloth makes the 12-billion-parameter model ready for local use. This quantized format shrinks the model files dramatically, […]
