
Baidu Introduces Unlimited-OCR To Read Long Documents At Constant Speed


Baidu has introduced Unlimited-OCR, a model designed to read and transcribe long documents without slowing down. It processes dozens of pages in a single pass by keeping its memory cache at a constant size rather than letting it grow as the output gets longer. It achieves this with an attention method that mimics how humans read and copy text efficiently over long periods.

The Baidu team built the system by taking a baseline model and replacing its attention layers so that memory usage stays flat. They paired this memory design with a highly compressed image encoder to handle large documents. The developers also recently added support for faster inference frameworks so that users can run the model more efficiently.
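To make "flat memory usage" concrete, here is a minimal Python sketch that contrasts a cache that grows with every generated token against one with a fixed capacity. The article does not describe how Unlimited-OCR actually bounds its cache, so the eviction rule here (drop the oldest entry) and the names `cache_sizes` and `window` are assumptions chosen purely for illustration.

```python
from collections import deque


def cache_sizes(num_tokens, window=None):
    """Return the cache length after each generated token.

    window=None models a standard cache that grows without bound.
    An integer models a hypothetical fixed-capacity cache that evicts
    its oldest entry (an assumption, not Baidu's documented method).
    """
    cache = deque(maxlen=window)
    sizes = []
    for token_id in range(num_tokens):
        cache.append(token_id)  # stand-in for one key/value entry
        sizes.append(len(cache))
    return sizes


growing = cache_sizes(8)
bounded = cache_sizes(8, window=4)
print(growing)  # [1, 2, 3, 4, 5, 6, 7, 8]
print(bounded)  # [1, 2, 3, 4, 4, 4, 4, 4]
```

The growing cache needs more memory with every token, while the bounded one levels off once it reaches capacity, which is the behavior the article attributes to the new attention layers.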

Features for document processing

Key Features
  • Reads dozens of pages in one pass.
  • Maintains constant memory during text generation.
  • Uses a highly compressed image encoder.
  • Operates within a standard maximum length of 32K.
  • Processes PDFs and multiple page images.
  • Attention design could extend to translation and audio transcription in the future.

This tool is built for users who need to scan and convert massive amounts of text from images or PDFs. People working with extensive technical documents will benefit from the ability to process multiple pages without running out of memory. Anyone needing local text extraction can use the provided methods to run the model on their own hardware.

Project notes and development details

The developers note that while standard models slow down as output sequences lengthen, this approach keeps the memory cache constant. They designed the attention mechanism to be general-purpose, meaning it could also improve audio transcription and translation tasks in the future. Users can deploy the model with the provided container images, which are tailored to specific graphics card setups.
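The slowdown the developers describe can be illustrated with some rough arithmetic. The sketch below counts how many cached entries a decoder reads in total while generating a sequence, first with a cache that grows at every step and then with one capped at a fixed size. The figures used (32,000 output tokens, a cap of 1,000 entries) and the function name `attention_reads` are hypothetical and are not taken from the Unlimited-OCR paper.

```python
def attention_reads(num_tokens, cache_limit=None):
    """Count cache entries read while generating num_tokens tokens.

    At step t a growing cache exposes t entries to attention; a capped
    cache exposes at most cache_limit entries (hypothetical cap).
    """
    total = 0
    for step in range(1, num_tokens + 1):
        visible = step if cache_limit is None else min(step, cache_limit)
        total += visible
    return total


full = attention_reads(32_000)
capped = attention_reads(32_000, cache_limit=1_000)
print(full)    # 512016000
print(capped)  # 31500500
```

With a growing cache the total work scales with the square of the output length, whereas a capped cache makes the cost per token constant once the cap is reached, so total work grows only linearly.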

"Unlimited OCR can transcribe dozens of pages of documents in a single forward pass under a standard maximum length of 32K." Source: arXiv