Ovis2.6-80B-A3B Lands Private Visual AI on a Single GPU

Ovis2.6-80B-A3B is a new multimodal AI model that pairs vision with a mixture-of-experts (MoE) language backbone, which keeps per-token compute low. It can examine high-resolution images, long documents, and videos, then answer questions about what it sees. The model offers a 64K-token context window and accepts images up to 2880×2880 pixels, so it can handle complex, information-dense files.

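AIDC-AI, the team behind the Marco-Mini-Instruct model, built this release by upgrading the core language model to a mixture of experts. The model has 80 billion total parameters, but only about 3 billion are active for each token, so it runs with roughly the per-token compute of a much smaller model. All 80 billion weights still have to be loaded, though, so running it on a single GPU typically means quantized weights on a high-memory card, or offloading part of the model to system RAM. Within those limits, professionals and serious hobbyists can run high-end document and visual reasoning locally, keeping sensitive data on their own machines.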
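The gap between 80 billion total and roughly 3 billion active parameters comes from routing: for each token, a small gating function scores the experts and only the top few run. The sketch below is a toy, pure-Python illustration of generic top-k routing. The expert count, `k`, the dot-product gate, and the function name `moe_forward` are illustrative assumptions, not Ovis2.6's actual configuration or code.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a short list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(token, gate_vectors, experts, k=2):
    """Toy top-k mixture-of-experts layer for a single token vector.

    Returns the mixed output and the indices of the experts that actually ran.
    """
    # Router: one score per expert (dot product of the token and that expert's gate vector).
    scores = [sum(t * g for t, g in zip(token, gv)) for gv in gate_vectors]
    # Keep the k highest-scoring experts; every other expert is skipped entirely.
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    # Renormalize the chosen experts' scores into mixing weights.
    weights = softmax([scores[i] for i in chosen])
    out = [0.0] * len(token)
    for w, i in zip(weights, chosen):
        y = experts[i](token)  # only these k experts spend compute on this token
        out = [o + w * yi for o, yi in zip(out, y)]
    return out, chosen

# Usage: 8 tiny hypothetical "experts" that just scale their input; only 2 run per token.
experts = [lambda x, s=s: [s * v for v in x] for s in range(1, 9)]
gates = [[float(i), -float(i)] for i in range(8)]
output, used = moe_forward([1.0, 0.5], gates, experts, k=2)
print(used)  # [7, 6]
```

Only the two selected experts are called, yet all eight had to exist in memory to be selectable. That is why the total parameter count, not the active count, drives memory use.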

MoE backbone makes local inference practical

Key Features
  • 80B total parameters, about 3B active per token.
  • 64K token context window handles long documents.
  • Images up to 2880×2880 pixels supported.
  • “Think with Image” enables visual chain-of-thought reasoning.
  • Boosted OCR, document, and chart capabilities.
  • Low serving cost and high token throughput.
  • Works with multi-image, video, and text inputs.
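Before sending a large scan or a long document, a client can check inputs against the advertised limits. This is a minimal sketch with hypothetical helpers (`fit_image`, `fits_context`). It assumes the 2880×2880 limit applies to each side independently and reads "64K" as 65,536 tokens; both assumptions should be confirmed against the model card. It does not estimate how many tokens an image consumes.

```python
MAX_SIDE = 2880          # assumption: the limit applies to each image dimension
MAX_CONTEXT = 64 * 1024  # assumption: "64K" read as 65,536 tokens

def fit_image(width, height, max_side=MAX_SIDE):
    """Return (w, h) scaled down, aspect ratio preserved, so neither side exceeds max_side."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    # Never upscale: a scale factor above 1.0 is clamped to 1.0.
    scale = min(1.0, max_side / max(width, height))
    # Floor to whole pixels, but never below 1.
    return max(1, int(width * scale)), max(1, int(height * scale))

def fits_context(prompt_tokens, max_new_tokens, limit=MAX_CONTEXT):
    """True if the prompt plus the generation budget stays inside the context window."""
    return prompt_tokens + max_new_tokens <= limit

print(fit_image(5760, 4320))        # (2880, 2160)
print(fit_image(1024, 768))         # (1024, 768): already within limits
print(fits_context(60_000, 4_096))  # True: 64,096 <= 65,536
```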

This model suits anyone who wants to run AI on local hardware. The roughly 3 billion active parameters set the per-token compute cost, but every expert must stay loaded (in VRAM or, with offloading, in system RAM), so memory rather than compute is the main constraint. Small agencies can analyze invoices, contracts, or research papers without ever uploading files to the cloud. Privacy-conscious professionals get a capable document Q&A system that runs entirely on their own hardware.
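A back-of-the-envelope calculation shows why memory, not active parameters, sets the hardware bar. This sketch counts weights only, ignoring the KV cache, activations, and runtime overhead, and treats all 80B parameters at a single uniform bit width. The bit widths are generic examples, not a statement about which quantized builds of Ovis2.6 exist.

```python
def weight_memory_gb(params_billion, bits_per_param):
    """Approximate memory for the weights alone, in gigabytes (10**9 bytes)."""
    return params_billion * 1e9 * bits_per_param / 8 / 1e9

# Every expert must be loaded, so the 80B total sets the memory footprint;
# the ~3B active parameters only bound how much of it each token touches.
for label, bits in [("bf16", 16), ("int8", 8), ("4-bit", 4)]:
    total = weight_memory_gb(80, bits)
    active = weight_memory_gb(3, bits)
    print(f"{label}: all weights ~{total:.0f} GB, active per token ~{active:.1f} GB")
# bf16: all weights ~160 GB, active per token ~6.0 GB
# int8: all weights ~80 GB, active per token ~3.0 GB
# 4-bit: all weights ~40 GB, active per token ~1.5 GB
```

Even at 4 bits the weights alone come to about 40 GB, more than most single consumer cards hold, which is why a high-memory GPU or offloading is typically needed.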

Usage notes for reliable outputs

To get clean final answers when the model reasons step by step, the developers recommend appending “End your response with ‘Final answer:’.” to prompts. The default transformers streamer doesn’t work with the model’s two-phase thinking process, so a budget-aware streamer helper is provided instead. The model is released under the Apache 2.0 license. AIDC-AI cautions that, despite compliance filtering, it could still produce copyrighted text or improper content.
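The recommended prompt suffix also makes answers easy to pull out programmatically. A minimal sketch, assuming the model places its answer after the last "Final answer:" marker. `build_prompt` and `extract_final_answer` are hypothetical helpers, the sample reply is invented for illustration, and no model call is made.

```python
FINAL_MARKER = "Final answer:"
SUFFIX = "End your response with 'Final answer:'."

def build_prompt(question):
    """Append the recommended instruction so the model closes with a marked answer."""
    return f"{question.rstrip()} {SUFFIX}"

def extract_final_answer(response):
    """Return the text after the last 'Final answer:' marker, or None if it is missing or empty."""
    # rfind: the reasoning may mention the marker earlier, so take the last one.
    idx = response.rfind(FINAL_MARKER)
    if idx == -1:
        return None
    return response[idx + len(FINAL_MARKER):].strip() or None

print(build_prompt("What is the invoice total?"))
# What is the invoice total? End your response with 'Final answer:'.
reply = "The table lists three line items that sum to the total. Final answer: $1,240.00"
print(extract_final_answer(reply))  # $1,240.00
```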

“Ovis2.6 upgrades the LLM backbone to a Mixture-of-Experts (MoE) architecture, delivering superior multimodal performance at a fraction of the serving cost.” — Source: Hugging Face