feat(ocr): add support for Docling OCR engine and language configuration

This commit adds support for configuring the OCR engine and language(s) for Docling.
Configuration can be set via the environment variables `DOCLING_OCR_ENGINE` and `DOCLING_OCR_LANG`, or through the UI.

Fixes #13133
This commit is contained in:
Athanasios Oikonomou
2025-05-03 00:31:00 +03:00
committed by Athanasios Oikonomou
parent 7d184c3a14
commit 657162e96d
5 changed files with 67 additions and 2 deletions

View File

@@ -1839,6 +1839,18 @@ DOCLING_SERVER_URL = PersistentConfig(
os.getenv("DOCLING_SERVER_URL", "http://docling:5001"),
)
DOCLING_OCR_ENGINE = PersistentConfig(
"DOCLING_OCR_ENGINE",
"rag.docling_ocr_engine",
os.getenv("DOCLING_OCR_ENGINE", "tesseract"),
)
DOCLING_OCR_LANG = PersistentConfig(
"DOCLING_OCR_LANG",
"rag.docling_ocr_lang",
os.getenv("DOCLING_OCR_LANG", "eng,fra,deu,spa"),
)
DOCUMENT_INTELLIGENCE_ENDPOINT = PersistentConfig(
"DOCUMENT_INTELLIGENCE_ENDPOINT",
"rag.document_intelligence_endpoint",