Kreuzberg v4.5.0 supercharges AI pipelines with layout detection

Kreuzberg is a document intelligence framework that extracts text, structure, and metadata from more than 88 file formats. Written in Rust, it provides native bindings for 12 programming languages, including Python, TypeScript, PHP, Ruby, Java, and Go.
The open-source tool is designed for AI pipelines and large-scale document processing. The v4.5.0 update adds document structure understanding: it can now identify layouts and tables rather than extracting only raw text.
Key features in the Kreuzberg update
- Document layout detection using the RT-DETR v2 model, which classifies 17 element types.
- Table extraction that reconstructs accurate Markdown tables from detected regions.
- Multi-backend OCR pipeline with Tesseract and PaddleOCR v2 support.
- Automatic font CMap table repair to fix garbled PDF text.
- Extraction result caching for all file types.
- Tower service layer with configurable middleware for tracing, metrics, and timeouts.
Developers building AI document processing pipelines can use the release to extract structured data from PDFs, scanned documents, and office files without requiring Python as a dependency.
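To make the table extraction feature more concrete, here is a minimal Python sketch of the last step: turning a grid of already-detected cell strings into a Markdown table. It is an illustration only, not Kreuzberg's implementation or API. The function name `cells_to_markdown` and the sample rows are hypothetical, and the sketch assumes the first row is the header.

```python
from typing import List


def cells_to_markdown(rows: List[List[str]]) -> str:
    """Render a grid of cell strings as a Markdown table (hypothetical helper).

    Assumes the first row is the header row.
    """
    if not rows:
        return ""
    width = max(len(row) for row in rows)
    # Pad short rows so every row has the same number of columns.
    padded = [row + [""] * (width - len(row)) for row in rows]

    def fmt(row: List[str]) -> str:
        # Escape pipes so cell text cannot break the column layout.
        cells = (cell.strip().replace("|", "\\|") for cell in row)
        return "| " + " | ".join(cells) + " |"

    header, *body = padded
    lines = [fmt(header), "| " + " | ".join(["---"] * width) + " |"]
    lines.extend(fmt(row) for row in body)
    return "\n".join(lines)


# Made-up sample data; the last row is deliberately missing a cell.
table = cells_to_markdown([["Item", "Qty"], ["Widget", "2"], ["Gadget"]])
print(table)
```

Padding short rows keeps the output a valid table even when a cell was not detected, which is one plausible way to handle incomplete regions.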
Performance benchmarks against Docling
The development team integrated Docling's layout model directly into a Rust-native pipeline. They benchmarked against Docling on 171 PDF documents, including academic papers, legal documents, invoices, and OCR scans. Kreuzberg achieved a structure F1 score of 42.1% compared with Docling's 41.7%, and a text F1 score of 88.9% versus 86.7%. Kreuzberg averaged 1,032 milliseconds per document, while Docling averaged 2,894 milliseconds.
The team noted: 'It's 2.8x faster on average, with a fraction of the memory overhead, and without Python as a dependency.'
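The 2.8x figure follows directly from the reported averages. The short Python check below uses only the numbers quoted in the benchmark; the variable names are ours.

```python
# Average per-document latencies reported in the benchmark, in milliseconds.
kreuzberg_ms = 1032
docling_ms = 2894

# Speedup is the ratio of the slower time to the faster time.
speedup = docling_ms / kreuzberg_ms

# F1 gaps in percentage points (Kreuzberg minus Docling).
text_f1_gap = 88.9 - 86.7
structure_f1_gap = 42.1 - 41.7

print(f"speedup: {speedup:.1f}x")
print(f"text F1 gap: {text_f1_gap:.1f} points")
print(f"structure F1 gap: {structure_f1_gap:.1f} points")
```

The speed difference is large, while the accuracy gaps are small: 2.2 points on text F1 and 0.4 points on structure F1.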
Speed improvements come from Rust's native memory management, character-level PDFium text extraction, ONNX Runtime inference, and parallel processing across pages. Users already running Docling in production are encouraged to benchmark Kreuzberg against their current setup.
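Page-level parallelism, one of the factors listed above, can be sketched in a few lines of Python. This shows the general pattern only and is not Kreuzberg's Rust implementation. `extract_page` is a hypothetical stand-in for per-page work such as layout detection or OCR, and the worker count is arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def extract_page(page: str) -> str:
    """Hypothetical stand-in for per-page work (layout detection, OCR, ...)."""
    return page.strip().upper()


def extract_document(pages: List[str], workers: int = 4) -> List[str]:
    # Executor.map yields results in input order, so page order is preserved
    # even when pages finish at different times.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(extract_page, pages))


results = extract_document([" page one ", "page two", " page three"])
print(results)
```

On a standard CPython build, threads do not run CPU-bound Python code in parallel, so this sketch shows the fan-out and ordered-collection pattern rather than a real speedup. A native Rust pipeline does not have that constraint.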
This release represents a significant step forward for document processing at scale. Get Kreuzberg v4.5.0 on GitHub.