Examples
Overview
This section provides complete, end-to-end examples organized by both input format and domain/use case. Each example demonstrates how to process different types of documents through the Docling Graph pipeline.
What's Covered:
- Complete Pydantic templates
- CLI and Python API usage
- Expected outputs and graph structures
- Troubleshooting tips
- Best practices
Quick Navigation
By Input Format
| Example | Input Type | Backend |
|---|---|---|
| Quickstart | PDF/Image | VLM/LLM |
| URL Input | URL | LLM |
| Markdown Input | Markdown | LLM |
| DoclingDocument Input | JSON | LLM |
By Domain

| Example | Domain | Input |
|---|---|---|
| Billing Document Extraction | Business | PDF/Image |
| ID Card | Identity | Image |
| Insurance Policy | Legal | |
| Rheology Research | Academic | |

Format Comparison

| Format | OCR Required | Processing Speed | Backend Support | Best For |
|---|---|---|---|---|
| PDF | ✅ Yes | 🐢 Slow | LLM + VLM | Scanned documents, forms |
| Image | ✅ Yes | 🐢 Slow | LLM + VLM | Photos, scans |
| URL | Depends | ⚡ Variable | LLM + VLM | Remote documents |
| Markdown | ❌ No | ⚡ Fast | LLM only | Documentation, notes |
| DoclingDocument | ❌ No | ⚡ Very Fast | LLM only | Reprocessing, experimentation |
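The Backend Support column can double as a preflight check before a long run, for example to catch a VLM backend paired with a Markdown file. The helper below is a hypothetical sketch, not part of docling-graph: it only encodes the table above, and the extension lists (and the assumption that any `.json` source is an exported DoclingDocument) are illustrative choices.

```python
from pathlib import Path

# Backend support per format, transcribed from the comparison table above.
BACKENDS = {
    "pdf": ("llm", "vlm"),
    "image": ("llm", "vlm"),
    "url": ("llm", "vlm"),
    "markdown": ("llm",),
    "docling_json": ("llm",),
}

# Assumed extension lists; adjust them to match your own files.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}
MARKDOWN_SUFFIXES = {".md", ".markdown"}


def detect_format(source: str) -> str:
    """Classify a CLI source argument into one of the table's format rows."""
    if source.startswith(("http://", "https://")):
        return "url"
    suffix = Path(source).suffix.lower()
    if suffix == ".pdf":
        return "pdf"
    if suffix in IMAGE_SUFFIXES:
        return "image"
    if suffix in MARKDOWN_SUFFIXES:
        return "markdown"
    if suffix == ".json":
        return "docling_json"  # assumption: .json sources are exported DoclingDocuments
    raise ValueError(f"Unrecognized input format: {source}")


def check_backend(source: str, backend: str) -> None:
    """Raise early instead of letting an unsupported combination fail mid-run."""
    allowed = BACKENDS[detect_format(source)]
    if backend not in allowed:
        raise ValueError(f"{backend!r} backend is not supported for {source}; use one of {allowed}")


check_backend("billing_doc.pdf", "vlm")  # passes: PDF supports LLM + VLM
```

Calling `check_backend("notes.md", "vlm")` raises a `ValueError`, the same combination that produces the "VLM backend does not support text-only inputs" error listed under Common Issues.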
Choosing the Right Example
New to Docling Graph? → Quickstart

By Input Format:
- Web documents → URL Input
- Documentation → Markdown Input
- Reprocessing → DoclingDocument Input

By Domain:
- Business → Billing Document Extraction
- Identity → ID Card
- Legal → Insurance Policy
- Academic → Rheology Research
Workflow 1: URL → Extract → Visualize

```bash
# Download and process in one step
uv run docling-graph convert "https://arxiv.org/pdf/2207.02720" \
  --template "docs.examples.templates.rheology_research.ScholarlyRheologyPaper" \
  --processing-mode "many-to-one"

# Visualize results
uv run docling-graph inspect outputs
```
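If the one-step download hits the "URL download timeout" issue listed under Common Issues, one workaround is to fetch the file yourself and pass the local path to `convert`. This is a minimal standard-library sketch; the `downloads/` directory, the 120-second timeout, and the `.pdf` fallback suffix (arXiv links such as the one above have no extension) are assumptions, not docling-graph defaults.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen

# Suffixes that are kept as-is; anything else gets the fallback suffix appended.
KNOWN_SUFFIXES = (".pdf", ".png", ".jpg", ".jpeg", ".md")


def filename_for(url: str, default_suffix: str = ".pdf") -> str:
    """Derive a local file name from the last path segment of a URL."""
    name = Path(urlparse(url).path).name or "download"
    if not name.lower().endswith(KNOWN_SUFFIXES):
        name += default_suffix
    return name


def download(url: str, dest_dir: str = "downloads", timeout: float = 120.0) -> Path:
    """Fetch a URL with an explicit timeout (seconds) and save it under dest_dir."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    target = dest / filename_for(url)
    with urlopen(url, timeout=timeout) as response:
        target.write_bytes(response.read())
    return target
```

After `download("https://arxiv.org/pdf/2207.02720")` returns `downloads/2207.02720.pdf`, run `uv run docling-graph convert downloads/2207.02720.pdf` with the same `--template` and `--processing-mode` options as in Workflow 1.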
Workflow 2: PDF → DoclingDocument → Reprocess

```bash
# Step 1: Initial processing with DoclingDocument export
uv run docling-graph convert billing_doc.pdf \
  --template "templates.billing_document.BasicBillingDocument" \
  --export-docling-json

# Step 2: Reprocess with a different template (no OCR)
uv run docling-graph convert outputs/billing_doc_docling.json \
  --template "templates.billing_document.DetailedBillingDocument"
```
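Workflow 3: Batch Markdown Processing

```bash
# Process all Markdown files under docs/, including subdirectories.
# In bash, ** only recurses when globstar is enabled; nullglob skips the loop if nothing matches.
shopt -s globstar nullglob
for file in docs/**/*.md; do
  uv run docling-graph convert "$file" \
    --template "templates.documentation.Documentation" \
    --backend llm \
    --output-dir "outputs/$(basename "$file" .md)"
done
```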
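The same batch run can be scripted in Python when you want a dry run first or need to keep same-named files apart. This sketch reuses only the CLI flags shown in Workflow 3; the helper names `build_command` and `convert_all` and the mirrored `outputs/` layout are illustrative choices, not part of docling-graph.

```python
import shlex
import subprocess
from pathlib import Path
from typing import List

TEMPLATE = "templates.documentation.Documentation"


def build_command(md_file: Path, root: Path, out_root: Path = Path("outputs")) -> List[str]:
    """Build the convert command for one file, mirroring its path under out_root."""
    # docs/guide/intro.md -> outputs/guide/intro
    rel = md_file.relative_to(root).with_suffix("")
    return [
        "uv", "run", "docling-graph", "convert", str(md_file),
        "--template", TEMPLATE,
        "--backend", "llm",
        "--output-dir", str(out_root / rel),
    ]


def convert_all(root: Path = Path("docs"), dry_run: bool = True) -> List[List[str]]:
    """Print (dry run) or execute one convert command per Markdown file under root."""
    commands = [build_command(path, root) for path in sorted(root.rglob("*.md"))]
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop at the first failing conversion
    return commands
```

Mirroring the relative path avoids a collision in the shell loop, where `docs/a/index.md` and `docs/b/index.md` both map to `outputs/index`. Run `convert_all()` once to review the commands, then `convert_all(dry_run=False)` to execute them.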
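Template Examples
Simple Entity

```python
from pydantic import BaseModel, Field


class Person(BaseModel):
    """Person entity."""

    # is_entity marks the model as a graph entity (node);
    # graph_id_fields names the field(s) that identify it.
    model_config = {"is_entity": True, "graph_id_fields": ["name"]}

    name: str = Field(description="Person's name")
```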
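To illustrate what an identity field is for, here is a hypothetical sketch in plain Pydantic v2 that builds a key from `graph_id_fields` and merges repeated mentions of the same person. It is illustrative only: docling-graph's own node-ID scheme may differ, and `node_key`, `merge_duplicates`, and the `role` field are invented for this example.

```python
from typing import Hashable, Iterable, List, Optional, Tuple

from pydantic import BaseModel, Field


class Person(BaseModel):
    """Person entity, as in the template above plus one invented field."""

    model_config = {"is_entity": True, "graph_id_fields": ["name"]}

    name: str = Field(description="Person's name")
    role: Optional[str] = None  # hypothetical field that is NOT part of the identity


def node_key(entity: BaseModel) -> Tuple[Hashable, ...]:
    """Hypothetical key: model name plus the values of the fields in graph_id_fields."""
    id_fields = type(entity).model_config.get("graph_id_fields", [])
    return (type(entity).__name__, *(getattr(entity, f) for f in id_fields))


def merge_duplicates(entities: Iterable[BaseModel]) -> List[BaseModel]:
    """Keep the first entity seen for each key; later mentions with the same key are dropped."""
    nodes: dict = {}
    for entity in entities:
        nodes.setdefault(node_key(entity), entity)
    return list(nodes.values())


mentions = [Person(name="Ada"), Person(name="Ada", role="CTO"), Person(name="Grace")]
print([p.name for p in merge_duplicates(mentions)])  # ['Ada', 'Grace']
```

Because only `name` is listed in `graph_id_fields`, the two "Ada" mentions share a key even though their `role` differs; a real pipeline might merge their attributes rather than keep the first one.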
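With Relationships

```python
from pydantic import BaseModel

from docling_graph.utils import edge


# Person is the entity defined in the Simple Entity example above.
class Organization(BaseModel):
    name: str
    # Each Person in this list is linked to the organization by an EMPLOYS edge.
    employees: list[Person] = edge("EMPLOYS")
```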
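To show how a relationship field maps to graph edges, the sketch below uses a hypothetical stand-in, `edge_field`, built on Pydantic's `json_schema_extra`. It is not how `docling_graph.utils.edge` is implemented; it is a self-contained model of the idea that every item in a tagged list field becomes one labeled edge. Nodes are identified by `name` purely for brevity.

```python
from typing import Any, Iterator, List, Tuple

from pydantic import BaseModel, Field


def edge_field(label: str) -> Any:
    """Hypothetical stand-in for edge(): a list field tagged with an edge label."""
    return Field(default_factory=list, json_schema_extra={"edge_label": label})


class Person(BaseModel):
    model_config = {"is_entity": True, "graph_id_fields": ["name"]}
    name: str


class Organization(BaseModel):
    name: str
    employees: List[Person] = edge_field("EMPLOYS")


def edges(source: BaseModel) -> Iterator[Tuple[str, str, str]]:
    """Yield (source, label, target) triples for every edge-tagged field."""
    for field_name, info in type(source).model_fields.items():
        extra = info.json_schema_extra
        if isinstance(extra, dict) and "edge_label" in extra:
            for target in getattr(source, field_name):
                yield (source.name, extra["edge_label"], target.name)


org = Organization(name="Acme", employees=[Person(name="Ada"), Person(name="Grace")])
print(list(edges(org)))
# [('Acme', 'EMPLOYS', 'Ada'), ('Acme', 'EMPLOYS', 'Grace')]
```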
See individual example pages for complete templates.
Additional Resources
Documentation
- Input Formats Guide - Complete input format reference
- Backend Selection - Choose LLM vs VLM
- Processing Modes - One-to-one vs many-to-one
API Reference
- PipelineConfig - Configuration options
- run_pipeline - Pipeline execution
- Batch Processing - Process multiple documents
Advanced Topics
- Performance Tuning - Optimize processing
- Error Handling - Handle failures gracefully
- Custom Backends - Extend functionality
Getting Help
Common Issues

"VLM backend does not support text-only inputs"
→ Use `--backend llm` for Markdown and text files.

"URL download timeout"
→ Increase the timeout or download the file manually first.

"Text input is empty"
→ Check the file's content and encoding.

"Invalid DoclingDocument schema"
→ Verify the `schema_name` and `version` fields.
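For the last two issues, a small preflight check can catch problems before a run. This is a minimal standard-library sketch; `check_text_input` and `check_docling_json` are hypothetical helpers, and the code assumes UTF-8 input and that an exported DoclingDocument JSON carries `schema_name: "DoclingDocument"` plus a string `version`, the layout used by docling-core's DoclingDocument.

```python
import json
from pathlib import Path
from typing import List


def check_text_input(path: Path) -> List[str]:
    """Return problems that would lead to "Text input is empty"."""
    try:
        text = path.read_text(encoding="utf-8")  # assumes UTF-8 input
    except UnicodeDecodeError:
        return [f"{path}: not valid UTF-8; re-save the file with UTF-8 encoding"]
    if not text.strip():
        return [f"{path}: file is empty or contains only whitespace"]
    return []


def check_docling_json(path: Path) -> List[str]:
    """Return problems that would lead to "Invalid DoclingDocument schema"."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        return [f"{path}: not readable as JSON ({exc})"]
    if not isinstance(data, dict):
        return [f"{path}: top-level JSON value is not an object"]
    problems = []
    if data.get("schema_name") != "DoclingDocument":
        problems.append(f"{path}: schema_name is {data.get('schema_name')!r}, expected 'DoclingDocument'")
    if not isinstance(data.get("version"), str):
        problems.append(f"{path}: missing or non-string 'version' field")
    return problems
```

Both functions return an empty list when the file looks usable, so they can be chained over a whole input folder before starting a batch.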
Support
- Documentation: https://ibm.github.io/docling-graph
- GitHub Issues: https://github.com/IBM/docling-graph/issues
- Discussions: https://github.com/IBM/docling-graph/discussions
Next Steps
- Explore Input Formats - Learn about all supported formats
- Read Advanced Topics - Optimize your workflows