Examples¶

Overview¶

This section provides complete, end-to-end examples, organized by input format and by domain. Each example shows how to process one type of document through the Docling Graph pipeline.

What's Covered:

- Complete Pydantic templates
- CLI and Python API usage
- Expected outputs and graph structures
- Troubleshooting tips
- Best practices

By Input Format¶

Example	Input Type	Backend
Quickstart	PDF/Image	VLM/LLM
URL Input	URL	LLM
Markdown Input	Markdown	LLM
DoclingDocument Input	JSON	LLM

By Domain¶

Example	Domain	Input
Billing Document Extraction	Business	PDF/Image
ID Card	Identity	Image
Insurance Policy	Legal	PDF
Rheology Research	Academic	PDF

Format	OCR Required	Processing Speed	Backend Support	Best For
PDF	✅ Yes	🐢 Slow	LLM + VLM	Scanned documents, forms
Image	✅ Yes	🐢 Slow	LLM + VLM	Photos, scans
URL	Depends	⚡ Variable	LLM + VLM	Remote documents
Markdown	❌ No	⚡ Fast	LLM only	Documentation, notes
DoclingDocument	❌ No	⚡ Very Fast	LLM only	Reprocessing, experimentation
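
The comparison table above can be read as a lookup from input type to the backends that accept it. The sketch below encodes that lookup in Python. The names `detect_format` and `supported_backends` and the file-extension mapping are illustrative assumptions, not part of the Docling Graph API.

```python
from pathlib import Path

# Backend support per input format, transcribed from the comparison table.
BACKENDS_BY_FORMAT = {
    "pdf": ("llm", "vlm"),
    "image": ("llm", "vlm"),
    "url": ("llm", "vlm"),
    "markdown": ("llm",),
    "docling_document": ("llm",),
}

# Assumed extension-to-format mapping (hypothetical; extend as needed).
FORMAT_BY_SUFFIX = {
    ".pdf": "pdf",
    ".png": "image",
    ".jpg": "image",
    ".jpeg": "image",
    ".md": "markdown",
    ".json": "docling_document",
}


def detect_format(source: str) -> str:
    """Classify a source string as one of the formats in the table."""
    if source.startswith(("http://", "https://")):
        return "url"
    suffix = Path(source).suffix.lower()
    if suffix not in FORMAT_BY_SUFFIX:
        raise ValueError(f"Unrecognized input type: {source!r}")
    return FORMAT_BY_SUFFIX[suffix]


def supported_backends(source: str) -> tuple[str, ...]:
    """Return the backends the table lists for this kind of input."""
    return BACKENDS_BY_FORMAT[detect_format(source)]
```

For instance, `supported_backends("notes.md")` yields only `"llm"`, which matches the "LLM only" entry for Markdown.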

Choosing the Right Example¶

New to Docling Graph? → Quickstart

By Input Format:

- Web documents → URL Input
- Documentation → Markdown Input
- Reprocessing → DoclingDocument Input

By Domain:

- Business → Billing Document Extraction
- Identity → ID Card
- Legal → Insurance Policy
- Academic → Rheology Research

Workflow 1: URL → Extract → Visualize¶

# Download and process in one step
uv run docling-graph convert "https://arxiv.org/pdf/2207.02720" \
    --template "docs.examples.templates.rheology_research.ScholarlyRheologyPaper" \
    --processing-mode "many-to-one"

# Visualize results
uv run docling-graph inspect outputs

Workflow 2: PDF → DoclingDocument → Reprocess¶

# Step 1: Initial processing with DoclingDocument export
uv run docling-graph convert billing_doc.pdf \
    --template "templates.billing_document.BasicBillingDocument" \
    --export-docling-json

# Step 2: Reprocess with different template (no OCR)
uv run docling-graph convert outputs/billing_doc_docling.json \
    --template "templates.billing_document.DetailedBillingDocument"
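
The two-step workflow above can also be driven from a script. This is a minimal sketch that uses only the CLI flags shown in this section. The helpers `build_convert_command`, `reprocess_plan`, and `run_plan` are hypothetical, and the `<stem>_docling.json` export name is inferred from the example above rather than confirmed by the CLI reference.

```python
import subprocess
from pathlib import Path
from typing import Optional


def build_convert_command(
    source: str,
    template: str,
    *,
    backend: Optional[str] = None,
    export_docling_json: bool = False,
) -> list[str]:
    """Assemble a `docling-graph convert` invocation as an argument list."""
    cmd = ["uv", "run", "docling-graph", "convert", source, "--template", template]
    if backend is not None:
        cmd += ["--backend", backend]
    if export_docling_json:
        cmd.append("--export-docling-json")
    return cmd


def reprocess_plan(
    pdf: str, first_template: str, second_template: str, output_dir: str = "outputs"
) -> list[list[str]]:
    """Plan step 1 (convert with export) and step 2 (reprocess the export)."""
    # Assumption: the export is written as <output_dir>/<stem>_docling.json.
    exported = Path(output_dir) / f"{Path(pdf).stem}_docling.json"
    return [
        build_convert_command(pdf, first_template, export_docling_json=True),
        build_convert_command(exported.as_posix(), second_template),
    ]


def run_plan(plan: list[list[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for cmd in plan:
        subprocess.run(cmd, check=True)
```

Passing `check=True` matters here because step 2 reads the file that step 1 writes, so there is no point continuing after a failed conversion.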

Workflow 3: Batch Markdown Processing¶

# Process all Markdown files (the recursive ** glob requires Bash's globstar option)
shopt -s globstar
for file in docs/**/*.md; do
    uv run docling-graph convert "$file" \
        --template "templates.documentation.Documentation" \
        --backend llm \
        --output-dir "outputs/$(basename "$file" .md)"
done
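
One caveat with the loop above: `basename` drops the directory, so `docs/a/index.md` and `docs/b/index.md` would both write to `outputs/index`. The sketch below plans the same batch in Python while keeping the relative path in the output directory. `plan_batch` is a hypothetical helper, and only flags shown in this section are used.

```python
from pathlib import Path


def plan_batch(root, template: str, output_root: str = "outputs") -> list[list[str]]:
    """Build one convert command per Markdown file found under `root`."""
    root = Path(root)
    commands = []
    for md_file in sorted(root.rglob("*.md")):
        # Mirror the file's location under output_root so names cannot collide.
        relative = md_file.relative_to(root).with_suffix("")
        out_dir = Path(output_root) / relative
        commands.append([
            "uv", "run", "docling-graph", "convert", md_file.as_posix(),
            "--template", template,
            "--backend", "llm",
            "--output-dir", out_dir.as_posix(),
        ])
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` to execute the conversion.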

Template Examples¶

Simple Entity¶

from pydantic import BaseModel, Field

class Person(BaseModel):
    """Person entity."""
    model_config = {'is_entity': True, 'graph_id_fields': ['name']}
    name: str = Field(description="Person's name")

With Relationships¶

from pydantic import BaseModel

from docling_graph.utils import edge

# Person is the entity defined in the previous example.
class Organization(BaseModel):
    name: str
    employees: list[Person] = edge("EMPLOYS")
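
To make the resulting structure concrete: assuming `edge("EMPLOYS")` marks the field as a labeled relationship from the organization to each person, an extracted organization would flatten to one edge per employee. The following is a plain-Python illustration with a hypothetical `to_triples` helper and made-up sample data. It does not call Docling Graph.

```python
def to_triples(
    source_name: str, edge_label: str, targets: list[dict]
) -> list[tuple[str, str, str]]:
    """Flatten a one-to-many relationship into (source, label, target) triples."""
    return [(source_name, edge_label, target["name"]) for target in targets]


# Sample data shaped like the Organization template (values are invented).
organization = {
    "name": "Acme",
    "employees": [{"name": "Ada"}, {"name": "Alan"}],
}

triples = to_triples(organization["name"], "EMPLOYS", organization["employees"])
for source, label, target in triples:
    print(f"({source})-[{label}]->({target})")
```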

See individual example pages for complete templates.

Additional Resources¶

Documentation¶

Input Formats Guide - Complete input format reference
Backend Selection - Choose LLM vs VLM
Processing Modes - One-to-one vs many-to-one

API Reference¶

PipelineConfig - Configuration options
run_pipeline - Pipeline execution
Batch Processing - Process multiple documents

Advanced Topics¶

Performance Tuning - Optimize processing
Error Handling - Handle failures gracefully
Custom Backends - Extend functionality

Getting Help¶

Common Issues¶

"VLM backend does not support text-only inputs" → Use --backend llm for Markdown and text files

"URL download timeout" → Increase timeout or download manually first

"Text input is empty" → Check file content and encoding

"Invalid DoclingDocument schema" → Verify schema_name and version fields

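Several of the issues above can be caught before the pipeline runs. This is a minimal pre-flight sketch, assuming UTF-8 text inputs and that a DoclingDocument JSON export carries top-level `schema_name` and `version` fields, as the last item implies. `preflight` is a hypothetical helper, not part of Docling Graph.

```python
import json
from pathlib import Path

TEXT_SUFFIXES = {".md", ".txt"}


def preflight(path, backend: str = "llm") -> list[str]:
    """Return a list of problems found in an input file (empty if none)."""
    path = Path(path)
    suffix = path.suffix.lower()
    problems = []
    if suffix in TEXT_SUFFIXES:
        if backend == "vlm":
            problems.append("text input needs --backend llm")
        try:
            text = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            problems.append("file is not valid UTF-8")
        else:
            if not text.strip():
                problems.append("text input is empty")
    elif suffix == ".json":
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except (UnicodeDecodeError, json.JSONDecodeError):
            problems.append("file is not valid JSON")
        else:
            if not isinstance(data, dict):
                problems.append("top-level JSON value is not an object")
            else:
                # Field names taken from the troubleshooting entry above.
                for key in ("schema_name", "version"):
                    if key not in data:
                        problems.append(f"missing DoclingDocument field: {key}")
    return problems
```

An empty result only means these basic checks passed; the pipeline may still reject the input for other reasons.
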
Support¶

Documentation: https://ibm.github.io/docling-graph
GitHub Issues: https://github.com/IBM/docling-graph/issues
Discussions: https://github.com/IBM/docling-graph/discussions

Next Steps¶

Explore Input Formats - Learn about all supported formats
Read Advanced Topics - Optimize your workflows