Which OCR engine does spacy-layout use? #25

Zaheer-10 · 2025-01-08T07:12:01Z

No description provided.

ines · 2025-02-23T10:14:05Z

spacy-layout uses Docling under the hood, and Docling also handles the OCR. See the Docling technical report for more details:

Docling provides optional support for OCR, for example to cover scanned PDFs or content in bitmap images embedded on a page. In our initial release, we rely on EasyOCR [1], a popular third-party OCR library with support for many languages. Docling, by default, feeds a high-resolution page image (216 dpi) to the OCR engine, to allow capturing small print detail in decent quality. While EasyOCR delivers reasonable transcription quality, we observe that it runs fairly slowly on CPU (upwards of 30 seconds per page).

We are actively seeking collaboration from the open-source community to extend Docling with additional OCR backends and speed improvements.
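To put the 216 dpi default and the 30 s/page figure in perspective, here is a rough back-of-the-envelope sketch. It assumes page sizes are given in PDF points (1/72 inch, the PDF default unit), so 216 dpi is a 3x scale, and it uses a US Letter page (612 x 792 pt) as the example. The helpers `page_pixels` and `min_ocr_seconds` are hypothetical illustrations, not part of Docling or spacy-layout, and the time estimate only uses the lower bound quoted above.

```python
# Hypothetical helpers for a back-of-the-envelope estimate; not Docling APIs.
# Assumption: page sizes are in PDF points (1 pt = 1/72 inch).

PDF_POINTS_PER_INCH = 72
OCR_DPI = 216  # default OCR page-image resolution quoted in the Docling report


def page_pixels(width_pt: float, height_pt: float, dpi: int = OCR_DPI) -> tuple[int, int]:
    """Return the (width, height) in pixels of a page rendered at `dpi`."""
    scale = dpi / PDF_POINTS_PER_INCH  # 216 / 72 = 3.0
    return round(width_pt * scale), round(height_pt * scale)


def min_ocr_seconds(num_pages: int, seconds_per_page: float = 30.0) -> float:
    """Lower-bound CPU time, based on the report's 'upwards of 30 s/page'."""
    return num_pages * seconds_per_page


if __name__ == "__main__":
    # US Letter is 8.5 x 11 inches, i.e. 612 x 792 points.
    w, h = page_pixels(612, 792)
    print(f"216 dpi page image: {w} x {h} px ({w * h / 1e6:.2f} MP)")

    # Compare against a plain 72 dpi render of the same page.
    w72, h72 = page_pixels(612, 792, dpi=72)
    print(f"Pixel ratio vs. 72 dpi: {(w * h) / (w72 * h72):.0f}x")

    print(f"100 pages on CPU: at least {min_ocr_seconds(100) / 60:.0f} minutes")
```

For a US Letter page this works out to a 1836 x 2376 px image, nine times as many pixels as a 72 dpi render, and a 100-page scanned document comes to at least 50 minutes of OCR on CPU at the quoted rate.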

ines added the "docling" label (Related to Docling library and models) on Feb 23, 2025
