
Examples

Overview

This section provides complete, end-to-end examples organized by both input format and domain/use case. Each example demonstrates how to process different types of documents through the Docling Graph pipeline.

What's Covered:

- Complete Pydantic templates
- CLI and Python API usage
- Expected outputs and graph structures
- Troubleshooting tips
- Best practices



Quick Navigation

By Input Format

| Example | Input Type | Backend |
| --- | --- | --- |
| Quickstart | PDF/Image | VLM/LLM |
| URL Input | URL | LLM |
| Markdown Input | Markdown | LLM |
| DoclingDocument Input | JSON | LLM |

By Domain

| Example | Domain | Input |
| --- | --- | --- |
| Billing Document Extraction | Business | PDF/Image |
| ID Card | Identity | Image |
| Insurance Policy | Legal | PDF |
| Rheology Research | Academic | PDF |

| Format | OCR Required | Processing Speed | Backend Support | Best For |
| --- | --- | --- | --- | --- |
| PDF | ✅ Yes | 🐢 Slow | LLM + VLM | Scanned documents, forms |
| Image | ✅ Yes | 🐢 Slow | LLM + VLM | Photos, scans |
| URL | Depends | ⚡ Variable | LLM + VLM | Remote documents |
| Markdown | ❌ No | ⚡ Fast | LLM only | Documentation, notes |
| DoclingDocument | ❌ No | ⚡ Very Fast | LLM only | Reprocessing, experimentation |
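
The format comparison can also be read as a lookup from input format to usable backends. The following is a minimal Python sketch of that lookup, not part of Docling Graph: the helper names `detect_format` and `supported_backends` are hypothetical, and it assumes the format can be guessed from the URL scheme or file extension, which may differ from how Docling Graph detects inputs.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Backend support per input format, transcribed from the comparison table.
BACKENDS_BY_FORMAT = {
    "pdf": {"llm", "vlm"},
    "image": {"llm", "vlm"},
    "url": {"llm", "vlm"},
    "markdown": {"llm"},
    "doclingdocument": {"llm"},
}

# Assumption: formats are inferred from file extensions. This mapping is
# illustrative and not taken from Docling Graph itself.
EXTENSION_TO_FORMAT = {
    ".pdf": "pdf",
    ".png": "image",
    ".jpg": "image",
    ".jpeg": "image",
    ".md": "markdown",
    ".json": "doclingdocument",
}


def detect_format(source: str) -> str:
    """Guess the input format of a local path or URL (hypothetical helper)."""
    if urlparse(source).scheme in ("http", "https"):
        return "url"
    suffix = PurePosixPath(source).suffix.lower()
    try:
        return EXTENSION_TO_FORMAT[suffix]
    except KeyError:
        raise ValueError(f"Unrecognized input format: {source!r}") from None


def supported_backends(source: str) -> set[str]:
    """Return the backends the table lists for this source's format."""
    return BACKENDS_BY_FORMAT[detect_format(source)]
```

A check such as `"vlm" in supported_backends("notes.md")` could then be run before a conversion to avoid requesting the VLM backend for a text-only input.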

Choosing the Right Example

New to Docling Graph? → Quickstart

By Input Format:

- Web documents → URL Input
- Documentation → Markdown Input
- Reprocessing → DoclingDocument Input

By Domain:

- Business → Billing Document Extraction
- Identity → ID Card
- Legal → Insurance Policy
- Academic → Rheology Research

Workflow 1: URL → Extract → Visualize

# Download and process in one step
uv run docling-graph convert "https://arxiv.org/pdf/2207.02720" \
    --template "docs.examples.templates.rheology_research.ScholarlyRheologyPaper" \
    --processing-mode "many-to-one"

# Visualize results
uv run docling-graph inspect outputs
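
The same two steps can be scripted from Python by shelling out to the CLI. The sketch below reuses only the commands and flags shown in Workflow 1. `build_convert_command` and `run_workflow` are hypothetical helper names, and the code assumes `uv` and `docling-graph` are available on the PATH.

```python
import subprocess


def build_convert_command(source: str, template: str, processing_mode: str) -> list[str]:
    """Assemble the `docling-graph convert` call used in Workflow 1."""
    return [
        "uv", "run", "docling-graph", "convert", source,
        "--template", template,
        "--processing-mode", processing_mode,
    ]


def run_workflow(source: str, template: str, processing_mode: str = "many-to-one") -> None:
    """Convert a source, then open the results with `docling-graph inspect`."""
    # check=True raises CalledProcessError on a non-zero exit code,
    # so the inspect step never runs after a failed conversion.
    subprocess.run(build_convert_command(source, template, processing_mode), check=True)
    subprocess.run(["uv", "run", "docling-graph", "inspect", "outputs"], check=True)
```

Calling `run_workflow("https://arxiv.org/pdf/2207.02720", "docs.examples.templates.rheology_research.ScholarlyRheologyPaper")` would reproduce the shell commands above.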

Workflow 2: PDF → DoclingDocument → Reprocess

# Step 1: Initial processing with DoclingDocument export
uv run docling-graph convert billing_doc.pdf \
    --template "templates.billing_document.BasicBillingDocument" \
    --export-docling-json

# Step 2: Reprocess with different template (no OCR)
uv run docling-graph convert outputs/billing_doc_docling.json \
    --template "templates.billing_document.DetailedBillingDocument"

Workflow 3: Batch Markdown Processing

# Process all markdown files
shopt -s globstar  # bash 4+: lets ** match nested directories
for file in docs/**/*.md; do
    uv run docling-graph convert "$file" \
        --template "templates.documentation.Documentation" \
        --backend llm \
        --output-dir "outputs/$(basename "$file" .md)"
done
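
One caveat with the loop above is that `basename` maps files with the same name in different directories (for example `docs/a/index.md` and `docs/b/index.md`) to the same output directory. The following Python sketch plans the same batch but mirrors each file's relative path instead. `plan_batch` is a hypothetical helper, not part of Docling Graph, and it only builds the command lists using the flags shown above.

```python
from pathlib import Path


def plan_batch(root: Path, template: str, output_root: Path) -> list[list[str]]:
    """Return one `docling-graph convert` command per Markdown file under root."""
    commands = []
    for md_file in sorted(root.rglob("*.md")):
        # Mirror the relative path (minus the .md suffix) so files that share
        # a basename, e.g. a/index.md and b/index.md, get separate outputs.
        out_dir = output_root / md_file.relative_to(root).with_suffix("")
        commands.append([
            "uv", "run", "docling-graph", "convert", str(md_file),
            "--template", template,
            "--backend", "llm",
            "--output-dir", str(out_dir),
        ])
    return commands
```

Each returned command can then be executed with `subprocess.run(cmd, check=True)`.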

Template Examples

Simple Entity

from pydantic import BaseModel, Field

class Person(BaseModel):
    """Person entity."""
    model_config = {'is_entity': True, 'graph_id_fields': ['name']}
    name: str = Field(description="Person's name")

With Relationships

from pydantic import BaseModel

from docling_graph.utils import edge

# Person is the entity defined in the previous example.
class Organization(BaseModel):
    name: str
    employees: list[Person] = edge("EMPLOYS")

See individual example pages for complete templates.

Additional Resources

Documentation

API Reference

Advanced Topics


Getting Help

Common Issues

"VLM backend does not support text-only inputs" → Use --backend llm for Markdown and text files

"URL download timeout" → Increase timeout or download manually first

"Text input is empty" → Check file content and encoding

"Invalid DoclingDocument schema" → Verify schema_name and version fields

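Two of these issues, empty text input and a DoclingDocument file missing its `schema_name` or `version` field, can be caught before running a conversion. The following is a minimal pre-flight sketch, not Docling Graph's own validation: `preflight` is a hypothetical helper, it assumes inputs are UTF-8 text, and it only checks that the two fields are present, not that their values are accepted.

```python
import json
from pathlib import Path


def preflight(path: Path) -> list[str]:
    """Return a list of problems found before conversion (empty if none)."""
    try:
        text = path.read_text(encoding="utf-8")
    except UnicodeDecodeError:
        return [f"{path}: not valid UTF-8; check the file encoding"]
    if not text.strip():
        return [f"{path}: text input is empty"]

    problems = []
    # Assumption: a .json input is meant to be a DoclingDocument export.
    if path.suffix.lower() == ".json":
        try:
            doc = json.loads(text)
        except json.JSONDecodeError as exc:
            return [f"{path}: invalid JSON ({exc.msg})"]
        if not isinstance(doc, dict):
            return [f"{path}: expected a JSON object"]
        for field in ("schema_name", "version"):
            if field not in doc:
                problems.append(f"{path}: missing '{field}' field")
    return problems
```

An empty return value means none of these checks failed. It does not guarantee that the conversion itself will succeed.
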
Support


Next Steps

  1. Explore Input Formats - Learn about all supported formats
  2. Read Advanced Topics - Optimize your workflows