Docling Settings¶
Overview¶
Docling settings control how documents are converted before extraction. Docling Graph uses the Docling library to convert PDFs and images into structured formats (markdown or JSON) that can be processed by LLMs or VLMs.
In this guide:
- OCR vs Vision pipeline
- Export options
- Pipeline selection
- Performance considerations
- Troubleshooting
Docling Pipeline Types¶
Quick Comparison¶
| Aspect | OCR Pipeline | Vision Pipeline |
|---|---|---|
| Method | Traditional OCR | Vision-Language Model |
| Speed | Fast | Slower |
| Accuracy | Good for standard docs | Best for complex layouts |
| GPU Required | No | Yes |
| Best For | Text-heavy documents | Complex visual layouts |
| Default | Yes | No |
OCR Pipeline¶
What is OCR Pipeline?¶
The OCR pipeline uses traditional Optical Character Recognition to extract text from documents. It's fast, accurate for standard documents, and doesn't require a GPU.
Configuration¶
from docling_graph import PipelineConfig

config = PipelineConfig(
    source="document.pdf",
    template="templates.BillingDocument",
    docling_config="ocr"  # OCR pipeline (default)
)
How It Works¶
%%{init: {'theme': 'redux-dark', 'look': 'default', 'layout': 'elk'}}%%
flowchart LR
%% 1. Define Classes
classDef input fill:#E3F2FD,stroke:#90CAF9,color:#0D47A1
classDef config fill:#FFF8E1,stroke:#FFECB3,color:#5D4037
classDef output fill:#E8F5E9,stroke:#A5D6A7,color:#1B5E20
classDef decision fill:#FFE0B2,stroke:#FFB74D,color:#E65100
classDef data fill:#EDE7F6,stroke:#B39DDB,color:#4527A0
classDef operator fill:#F3E5F5,stroke:#CE93D8,color:#6A1B9A
classDef process fill:#ECEFF1,stroke:#B0BEC5,color:#263238
%% 2. Define Nodes
A@{ shape: terminal, label: "Image / PDF Document" }
B@{ shape: procs, label: "OCR Engine" }
C@{ shape: lin-proc, label: "Text Extraction" }
D@{ shape: lin-proc, label: "Layout Analysis" }
E@{ shape: doc, label: "Markdown" }
%% 3. Define Connections
A --> B
B --> C
C --> D
D --> E
%% 4. Apply Classes
class A input
class B,C,D process
class E output
When to Use OCR¶
✅ Use OCR when:
- Documents are text-heavy
- Layout is standard (invoices, contracts, reports)
- Speed is important
- GPU is not available
- Documents are high-quality scans
- Cost efficiency is a priority

❌ Don't use OCR when:
- Documents have complex visual layouts
- Tables have intricate structures
- Handwriting needs processing
- Images contain critical information
- Document quality is poor
OCR Advantages¶
- Fast Processing
  - Quick text extraction
  - No GPU required
  - Efficient for batch processing
- Good Accuracy
  - Excellent for standard documents
  - Reliable text extraction
  - Handles most layouts well
- Low Resource Usage
  - CPU-only processing
  - Lower memory requirements
  - No special hardware needed
OCR Limitations¶
- Layout Challenges
  - May struggle with complex tables
  - Can miss visual relationships
  - Limited understanding of structure
- Quality Dependent
  - Poor scans reduce accuracy
  - Handwriting is not well supported
  - Image quality matters
Vision Pipeline¶
What is Vision Pipeline?¶
The Vision pipeline uses Vision-Language Models (VLMs) to understand documents visually. It processes layout, structure, and visual relationships alongside text.
Configuration¶
from docling_graph import PipelineConfig

config = PipelineConfig(
    source="document.pdf",
    template="templates.BillingDocument",
    docling_config="vision"  # Vision pipeline
)
How It Works¶
%%{init: {'theme': 'redux-dark', 'look': 'default', 'layout': 'elk'}}%%
flowchart LR
%% 1. Define Classes
classDef input fill:#E3F2FD,stroke:#90CAF9,color:#0D47A1
classDef config fill:#FFF8E1,stroke:#FFECB3,color:#5D4037
classDef output fill:#E8F5E9,stroke:#A5D6A7,color:#1B5E20
classDef decision fill:#FFE0B2,stroke:#FFB74D,color:#E65100
classDef data fill:#EDE7F6,stroke:#B39DDB,color:#4527A0
classDef operator fill:#F3E5F5,stroke:#CE93D8,color:#6A1B9A
classDef process fill:#ECEFF1,stroke:#B0BEC5,color:#263238
%% 2. Define Nodes
A@{ shape: terminal, label: "Images / PDF Document" }
B@{ shape: doc, label: "Page Images" }
C@{ shape: procs, label: "VLM Processing" }
D@{ shape: lin-proc, label: "Visual Understanding" }
E@{ shape: doc, label: "Structured Output" }
%% 3. Define Connections
A --> B
B --> C
C --> D
D --> E
%% 4. Apply Classes
class A input
class B data
class C,D process
class E output
When to Use Vision¶
✅ Use Vision when:
- Documents have complex layouts
- Tables have intricate structures
- Visual relationships are important
- Forms have specific patterns
- Highest accuracy is required
- GPU is available

❌ Don't use Vision when:
- Documents are simple text
- Speed is critical
- GPU is not available
- Cost is a major concern
- Processing large batches
Vision Advantages¶
- Visual Understanding
  - Processes layout and structure
  - Understands visual relationships
  - Handles complex tables
  - Better with forms
- Higher Accuracy
  - Best for complex documents
  - Understands context visually
  - Fewer extraction errors
  - Better table handling
- Robust to Quality
  - Handles poor scans better
  - Works with handwriting
  - Processes images directly
Vision Limitations¶
- Resource Intensive
  - Requires GPU
  - Higher memory usage
  - Slower processing
  - More expensive hardware
- Setup Complexity
  - GPU drivers required
  - Model downloads needed
  - More configuration
Export Options¶
Docling Document Export¶
config = PipelineConfig(
    source="document.pdf",
    template="templates.BillingDocument",
    # Docling export settings
    export_docling=True,            # Export Docling document (default)
    export_docling_json=True,       # Export as JSON (default)
    export_markdown=True,           # Export as markdown (default)
    export_per_page_markdown=False  # Export per-page markdown
)
Export Options Explained¶
1. export_docling¶
Controls whether to export the Docling document object.
Output: outputs/docling_document.pkl (Python pickle)
2. export_docling_json¶
Exports the full Docling document structure as JSON.
Output: outputs/docling_document.json
Contains:
- Document metadata
- Layout information
- Tables and figures
- Text content
- Page structure
3. export_markdown¶
Exports the document as markdown (full document).
Output: outputs/document.md
Best for:
- Human-readable output
- Documentation
- Text analysis
- Debugging
4. export_per_page_markdown¶
Exports markdown for each page separately.
Output: outputs/pages/page_001.md, page_002.md, etc.
Best for:
- Page-by-page analysis
- One-to-one processing
- Page-level debugging
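To make the mapping from export flags to files concrete, here is a minimal sketch that lists the paths each enabled flag is documented to produce. The helper `expected_outputs` is hypothetical and not part of Docling Graph. It assumes each flag acts independently, that the output directory is `outputs`, and that per-page files use three-digit, 1-based numbering as in `page_001.md`.

```python
from pathlib import PurePosixPath


def expected_outputs(
    export_docling: bool = True,
    export_docling_json: bool = True,
    export_markdown: bool = True,
    export_per_page_markdown: bool = False,
    num_pages: int = 1,
    output_dir: str = "outputs",
) -> list[str]:
    """List the files that the enabled export flags are documented to produce.

    Hypothetical helper for illustration; it only builds path strings and
    does not touch the filesystem or call Docling Graph.
    """
    root = PurePosixPath(output_dir)
    paths = []
    if export_docling:
        paths.append(root / "docling_document.pkl")
    if export_docling_json:
        paths.append(root / "docling_document.json")
    if export_markdown:
        paths.append(root / "document.md")
    if export_per_page_markdown:
        # Assumption: three-digit, 1-based page numbers (page_001.md, ...).
        for page in range(1, num_pages + 1):
            paths.append(root / "pages" / f"page_{page:03d}.md")
    return [str(p) for p in paths]


for path in expected_outputs(export_per_page_markdown=True, num_pages=2):
    print(path)
```

A checklist like this can be compared against the real output directory after a run to spot exports that were disabled by mistake.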
Complete Configuration Examples¶
📍 OCR with Full Exports¶
config = PipelineConfig(
    source="invoice.pdf",
    template="templates.BillingDocument",
    # OCR pipeline
    docling_config="ocr",
    # Export everything
    export_docling=True,
    export_docling_json=True,
    export_markdown=True,
    export_per_page_markdown=True
)
📍 Vision with Minimal Exports¶
config = PipelineConfig(
    source="complex_form.pdf",
    template="templates.Form",
    # Vision pipeline
    docling_config="vision",
    # Minimal exports (save space)
    export_docling=False,
    export_docling_json=False,
    export_markdown=False,
    export_per_page_markdown=False
)
📍 OCR with Page-Level Exports¶
config = PipelineConfig(
    source="batch_invoices.pdf",
    template="templates.BillingDocument",
    # OCR pipeline
    docling_config="ocr",
    # Page-level exports for one-to-one processing
    processing_mode="one-to-one",
    export_per_page_markdown=True,
    export_markdown=False  # Don't need full document
)
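Once per-page markdown has been exported, downstream code usually needs the pages back in order. The sketch below reads them from the documented `outputs/pages` directory. The function `load_page_markdown` is a hypothetical helper, and it assumes every file follows the `page_NNN.md` naming shown earlier.

```python
from pathlib import Path


def load_page_markdown(pages_dir: str = "outputs/pages") -> list[tuple[int, str]]:
    """Return (page_number, markdown_text) pairs sorted by page number.

    Hypothetical helper; assumes files are named page_NNN.md and raises
    ValueError if a matching file has a non-numeric suffix.
    """
    pages = []
    for path in Path(pages_dir).glob("page_*.md"):
        # "page_007" -> 7
        number = int(path.stem.split("_", 1)[1])
        pages.append((number, path.read_text(encoding="utf-8")))
    return sorted(pages)
```

Sorting on the parsed integer rather than the file name keeps the order correct even if the zero padding ever differs between runs.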
Pipeline Selection Strategy¶
By Document Type¶
| Document Type | Recommended Pipeline | Reason |
|---|---|---|
| Invoices | OCR | Standard layout, text-heavy |
| Contracts | OCR | Text-heavy, standard format |
| Rheology Research | OCR | Text-heavy, standard layout |
| Forms | Vision | Visual structure important |
| ID Cards | Vision | Visual layout critical |
| Complex Tables | Vision | Visual structure needed |
| Handwritten | Vision | Visual processing required |
| Mixed Content | Vision | Images and text combined |
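The table above can be turned into a simple lookup when the document type is known up front. This is a sketch only: `RECOMMENDED_PIPELINE` and `recommend_pipeline` are hypothetical names, the keys are lowercased singular forms of the table rows, and falling back to `"ocr"` for unknown types is an assumption based on OCR being the default pipeline.

```python
# Recommendations copied from the "By Document Type" table.
RECOMMENDED_PIPELINE = {
    "invoice": "ocr",
    "contract": "ocr",
    "rheology research": "ocr",
    "form": "vision",
    "id card": "vision",
    "complex table": "vision",
    "handwritten": "vision",
    "mixed content": "vision",
}


def recommend_pipeline(document_type: str, default: str = "ocr") -> str:
    """Look up the recommended docling_config value for a document type."""
    key = document_type.strip().lower()
    # Unknown types fall back to the default pipeline (an assumption).
    return RECOMMENDED_PIPELINE.get(key, default)


print(recommend_pipeline("ID Card"))
```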
By Quality¶
def get_docling_config(scan_quality: str):
    """Choose pipeline based on scan quality."""
    if scan_quality == "high":
        return "ocr"  # OCR works well
    elif scan_quality == "medium":
        return "ocr"  # OCR still acceptable
    else:
        return "vision"  # Vision better for poor quality
By Infrastructure¶
def get_docling_config(has_gpu: bool):
    """Choose pipeline based on available hardware."""
    if has_gpu:
        return "vision"  # Can use vision
    else:
        return "ocr"  # Must use OCR
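In practice the quality and infrastructure rules interact: a poor scan only benefits from the vision pipeline if a GPU is actually present. One possible way to combine the two checks is sketched below. The function name `choose_docling_config` is hypothetical, and treating a GPU as permission to use vision rather than a reason to always use it is an interpretation of the rules above.

```python
def choose_docling_config(scan_quality: str, has_gpu: bool) -> str:
    """Combine the scan-quality and hardware rules into one decision."""
    # Per the quality rule, only scans below "medium" call for vision.
    wants_vision = scan_quality not in ("high", "medium")
    if wants_vision and has_gpu:
        return "vision"
    # Either OCR is good enough, or there is no GPU to run vision on.
    return "ocr"


print(choose_docling_config("low", has_gpu=True))
print(choose_docling_config("low", has_gpu=False))
```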
Performance Comparison¶
Processing Speed¶
Accuracy Comparison¶
Document Type: Complex invoice with tables
OCR Accuracy: 92% field extraction
Vision Accuracy: 97% field extraction
Document Type: Simple text contract
OCR Accuracy: 98% field extraction
Vision Accuracy: 96% field extraction
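Small differences in accuracy can hide large differences in the number of fields that need manual correction. The sketch below restates the figures above as error rates; `relative_error_reduction` is a hypothetical helper, and the inputs are the percentages quoted in this section.

```python
def relative_error_reduction(baseline_accuracy_pct: int, improved_accuracy_pct: int) -> float:
    """Fraction of the baseline's errors removed by the better pipeline.

    Works on whole-number percentages; a baseline of 100 has no errors
    to remove and would raise ZeroDivisionError.
    """
    baseline_error = 100 - baseline_accuracy_pct
    improved_error = 100 - improved_accuracy_pct
    return (baseline_error - improved_error) / baseline_error


# Complex invoice with tables: OCR 92% -> Vision 97%
print(relative_error_reduction(92, 97))  # 0.625
# Simple text contract: Vision 96% -> OCR 98%
print(relative_error_reduction(96, 98))  # 0.5
```

Read this way, vision removes about five of every eight OCR errors on the complex invoice, while OCR halves the vision error rate on the simple contract.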
Resource Usage¶
OCR Pipeline:
- CPU: 50-70%
- Memory: 2-4GB
- GPU: Not required
Vision Pipeline:
- CPU: 30-40%
- Memory: 6-8GB
- GPU: Required (4-8GB VRAM)
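These figures can serve as a rough pre-flight check before picking a pipeline. The sketch below is illustrative only: `Machine` and `pipelines_that_fit` are hypothetical names, and the thresholds are assumptions derived from the numbers above (the upper end of each memory range and the lower end of the VRAM range).

```python
from dataclasses import dataclass


@dataclass
class Machine:
    memory_gb: float
    vram_gb: float = 0.0  # 0 means no GPU


def pipelines_that_fit(machine: Machine) -> list[str]:
    """Return the docling_config values the machine can plausibly run."""
    fits = []
    # OCR: 2-4GB memory, no GPU. Use the upper bound to be safe.
    if machine.memory_gb >= 4:
        fits.append("ocr")
    # Vision: 6-8GB memory plus a GPU with at least 4GB VRAM.
    if machine.memory_gb >= 8 and machine.vram_gb >= 4:
        fits.append("vision")
    return fits


print(pipelines_that_fit(Machine(memory_gb=16, vram_gb=8)))
print(pipelines_that_fit(Machine(memory_gb=4)))
```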
Combining with Backend Settings¶
OCR + LLM Backend¶
# Most common combination
config = PipelineConfig(
    source="document.pdf",
    template="templates.BillingDocument",
    # OCR for conversion
    docling_config="ocr",
    # LLM for extraction
    backend="llm",
    inference="remote"
)
Vision + VLM Backend¶
# Highest accuracy combination
config = PipelineConfig(
    source="complex_document.pdf",
    template="templates.Form",
    # Vision for conversion
    docling_config="vision",
    # VLM for extraction
    backend="vlm",
    inference="local"
)
OCR + VLM Backend¶
# Mixed approach (less common)
config = PipelineConfig(
    source="document.pdf",
    template="templates.BillingDocument",
    # OCR for conversion (faster)
    docling_config="ocr",
    # VLM for extraction (higher accuracy)
    backend="vlm",
    inference="local"
)
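When the same three combinations are reused across scripts, it can help to keep them in one place. The sketch below stores them as named presets and builds a keyword-argument dictionary. `BACKEND_PRESETS` and `build_config_kwargs` are hypothetical names, and the block deliberately does not import `docling_graph`, so it runs on its own.

```python
# The three combinations shown above, keyed by a short label.
BACKEND_PRESETS = {
    "ocr+llm": {"docling_config": "ocr", "backend": "llm", "inference": "remote"},
    "vision+vlm": {"docling_config": "vision", "backend": "vlm", "inference": "local"},
    "ocr+vlm": {"docling_config": "ocr", "backend": "vlm", "inference": "local"},
}


def build_config_kwargs(preset: str, source: str, template: str) -> dict:
    """Merge a named preset with the per-document arguments."""
    if preset not in BACKEND_PRESETS:
        known = ", ".join(sorted(BACKEND_PRESETS))
        raise ValueError(f"Unknown preset {preset!r}; expected one of: {known}")
    return {"source": source, "template": template, **BACKEND_PRESETS[preset]}


kwargs = build_config_kwargs("ocr+llm", "document.pdf", "templates.BillingDocument")
print(kwargs)
```

The resulting dictionary uses the same keyword names as the examples above, so it could be passed on as `PipelineConfig(**kwargs)`.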
Troubleshooting¶
🐛 Poor OCR Quality¶
Symptoms: Missing text, garbled characters
Solutions:
# 1. Try vision pipeline
config = PipelineConfig(
    source="poor_scan.pdf",
    template="templates.BillingDocument",
    docling_config="vision"  # Better for poor quality
)
# 2. Pre-process document (external tool)
# - Increase resolution
# - Enhance contrast
# - Deskew pages
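Garbled characters can also be detected automatically from the exported markdown, which makes it possible to flag documents that should be retried with the vision pipeline. The heuristic below is a rough sketch and not part of Docling Graph: `garbled_ratio`, `should_retry_with_vision`, and the 0.2 threshold are all assumptions, and symbols outside ASCII punctuation (currency signs, for example) are counted as noise.

```python
import string

ALLOWED_PUNCTUATION = set(string.punctuation)


def garbled_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that look like OCR noise."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 1.0  # Empty output is treated as a failed conversion.
    # Letters and digits (in any script) and ASCII punctuation count as clean.
    noisy = sum(
        1 for c in chars if not (c.isalnum() or c in ALLOWED_PUNCTUATION)
    )
    return noisy / len(chars)


def should_retry_with_vision(markdown: str, threshold: float = 0.2) -> bool:
    """Flag output whose noise ratio exceeds the (assumed) threshold."""
    return garbled_ratio(markdown) > threshold


print(should_retry_with_vision("Invoice total: 42.00"))
print(should_retry_with_vision("\ufffd\ufffd\ufffd ab"))
```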
🐛 Vision Pipeline Too Slow¶
Symptoms: Long processing times
Solutions:
# 1. Use OCR if acceptable
config = PipelineConfig(
    source="document.pdf",
    template="templates.BillingDocument",
    docling_config="ocr"  # Faster
)
# 2. Process fewer pages
# 3. Use more powerful GPU
🐛 Missing Tables¶
Symptoms: Table data not extracted
Solutions:
# Use vision pipeline for better table handling
config = PipelineConfig(
    source="document_with_tables.pdf",
    template="templates.BillingDocument",
    docling_config="vision"  # Better table extraction
)
Best Practices¶
👍 Start with OCR¶
# ✅ Good - Start with faster option
config = PipelineConfig(
    source="document.pdf",
    template="templates.BillingDocument",
    docling_config="ocr"  # Try OCR first
)
# If accuracy insufficient, switch to vision
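The "try OCR, then switch" advice can be written as a small escalation loop. The sketch below is library-agnostic: `extract_with_escalation` is a hypothetical helper, `run` stands in for whatever function builds a config and runs the pipeline for a given `docling_config` value, and `is_good_enough` is your own acceptance check.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def extract_with_escalation(
    run: Callable[[str], T],
    is_good_enough: Callable[[T], bool],
    order: Sequence[str] = ("ocr", "vision"),
) -> tuple[str, T]:
    """Try each pipeline in order and stop at the first acceptable result.

    Returns the pipeline name that was used last and its result. If no
    result passes the check, the final attempt is returned anyway.
    """
    if not order:
        raise ValueError("order must contain at least one pipeline name")
    for docling_config in order:
        result = run(docling_config)
        if is_good_enough(result):
            break
    return docling_config, result


# Stand-in for a real pipeline run: returns a fake accuracy score.
fake_scores = {"ocr": 0.80, "vision": 0.95}
used, score = extract_with_escalation(fake_scores.__getitem__, lambda s: s >= 0.9)
print(used, score)
```

Because OCR is tried first, the slower vision pipeline only runs for documents that actually need it.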
👍 Match Pipeline to Document¶
# ✅ Good - Choose based on document type
if document_has_complex_layout:
    docling_config = "vision"
else:
    docling_config = "ocr"

config = PipelineConfig(
    source="document.pdf",
    template="templates.BillingDocument",
    docling_config=docling_config
)
👍 Enable Appropriate Exports¶
# ✅ Good - Export what you need
config = PipelineConfig(
    source="document.pdf",
    template="templates.BillingDocument",
    docling_config="ocr",
    # Enable useful exports
    export_markdown=True,           # For debugging
    export_docling_json=False,      # Don't need full structure
    export_per_page_markdown=False  # Not doing page-level
)
Next Steps¶
Now that you understand Docling settings:
- Export Configuration - Configure output formats
- Configuration Examples - Complete scenarios
- Model Configuration - Model settings