DeepSeek OCR 2: Advanced Optical Character Recognition Technology

DeepSeek AI has unveiled DeepSeek OCR 2, a sophisticated optical character recognition (OCR) system that introduces significant improvements in vision token processing and encoder capabilities. In practical terms, the model is meant to process images in a way that is closer to how people read them.
The new model builds upon the previous DeepSeek OCR framework, implementing advanced technical innovations that enhance image text extraction and processing. The research team developed a multi-stage training approach that focuses on encoder pretraining, query enhancement, and decoder specialization to optimize performance.
Technical Architecture and Features
DeepSeek OCR 2 uses an attention mask that combines bidirectional and causal attention for visual token processing. Key features include:
- DeepEncoder V2 with Visual Causal Flow
- Two-stage reasoning loop
- Small 3B parameter size
- Document and layout handling
- Token and resolution strategy
- Multilingual support
- Multi-format input supporting PDFs, images, and more
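To make the idea of mixing bidirectional and causal attention more concrete, here is a minimal Python sketch of such a mask. The token layout is an assumption for illustration only (a block of visual tokens followed by a block of query tokens), and the function name build_mixed_attention_mask is hypothetical. The article does not describe the exact layout DeepSeek OCR 2 uses.

```python
from typing import List


def build_mixed_attention_mask(num_visual: int, num_query: int) -> List[List[bool]]:
    """Return mask[i][j] == True when position i may attend to position j.

    Assumed layout (not confirmed by the article): visual tokens come first,
    followed by query tokens.
    """
    total = num_visual + num_query
    mask = [[False] * total for _ in range(total)]
    for i in range(total):
        for j in range(total):
            if i < num_visual:
                # Visual token: bidirectional attention inside the visual block.
                mask[i][j] = j < num_visual
            else:
                # Query token: sees every visual token, itself, and earlier queries.
                mask[i][j] = j <= i
    return mask


if __name__ == "__main__":
    for row in build_mixed_attention_mask(num_visual=3, num_query=2):
        print("".join("1" if allowed else "0" for allowed in row))
    # Prints:
    # 11100
    # 11100
    # 11100
    # 11110
    # 11111
```

In the printed mask, the top-left block is fully connected (bidirectional), while the bottom rows grow one position at a time (causal).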
Training Methodology and Performance
DeepSeek AI implemented a comprehensive three-stage training pipeline to develop DeepSeek OCR 2. The training process involved:
- Encoder pretraining using language modeling objectives
- Query enhancement with unified data loading
- Continued LLM training with frozen encoder parameters
The research used 160 A100 GPUs across 20 nodes (8 GPUs per node), processing approximately 100 million image-text pair samples. The training configuration enabled advanced feature extraction and token compression capabilities, with a focus on improving visual knowledge representation.
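For a rough sense of scale, the figures above can be turned into per-node and per-GPU numbers. This is back-of-the-envelope arithmetic only: it assumes the samples are split evenly across all GPUs in a single pass, which the article does not state, and the function names are hypothetical.

```python
TOTAL_GPUS = 160               # from the article
NUM_NODES = 20                 # from the article
TOTAL_SAMPLES = 100_000_000    # "approximately 100 million" per the article


def gpus_per_node(total_gpus: int, num_nodes: int) -> int:
    """GPUs on each node, assuming identical nodes."""
    return total_gpus // num_nodes


def samples_per_gpu(total_samples: int, total_gpus: int) -> int:
    """Samples each GPU sees in one pass, assuming an even data-parallel split."""
    return total_samples // total_gpus


if __name__ == "__main__":
    print(gpus_per_node(TOTAL_GPUS, NUM_NODES))        # 8
    print(samples_per_gpu(TOTAL_SAMPLES, TOTAL_GPUS))  # 625000
```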
Model Specifications
DeepSeek OCR 2 maintains the previous model's decoder structure, utilizing a 3B-parameter Mixture of Experts (MoE) framework with approximately 500M active parameters. The model processes a maximum of 1120 visual tokens per image, a budget comparable to that of other advanced vision-language models such as Gemini-3-Pro.
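The gap between total and active parameters comes from how MoE models route each token to only a few experts. The sketch below shows generic top-k routing and the resulting active fraction for the figures quoted above. The expert count, the value of k, and the router scores are made-up illustration values, and the function names are hypothetical; the article does not say how DeepSeek OCR 2 configures its router.

```python
import math
from typing import List, Tuple


def top_k_route(router_logits: List[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    # Subtract the max logit before exponentiating for numerical stability.
    max_logit = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - max_logit) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


def active_fraction(active_params: float, total_params: float) -> float:
    """Share of parameters used for a single token."""
    return active_params / total_params


if __name__ == "__main__":
    # Hypothetical router scores for one token over four experts.
    print(top_k_route([0.1, 2.0, -1.0, 1.5], k=2))
    # Figures from the article: about 500M active out of 3B total.
    print(f"{active_fraction(500e6, 3e9):.1%}")  # 16.7%
```

With these figures, roughly one sixth of the decoder's parameters are used for any given token.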
Learn More About DeepSeek OCR 2
Explore more details about the project through these resources:
- Read the DeepSeek OCR 2 Research Paper
- Download the Model on the Hugging Face Page
- Download the Model from Unsloth
- View the GitHub Repository