convert Command¶

Overview¶

The convert command transforms documents into knowledge graphs using configurable extraction pipelines.

Key Features:
- Multiple backend support (LLM/VLM)
- Flexible processing modes
- Configurable chunking
- Multiple export formats
- Batch processing support

Basic Usage¶

uv run docling-graph convert SOURCE --template TEMPLATE [OPTIONS]

Required Arguments¶

Argument	Description
`SOURCE`	Path to document (PDF, JPG, PNG, TXT, MD), URL, or DoclingDocument JSON
`--template`, `-t`	Dotted path to Pydantic template

Examples¶

# PDF document
uv run docling-graph convert invoice.pdf \
    --template "templates.BillingDocument"

# Text file
uv run docling-graph convert notes.txt \
    --template "templates.Report" \
    --backend llm

# URL
uv run docling-graph convert https://example.com/doc.pdf \
    --template "templates.BillingDocument"

# Markdown file
uv run docling-graph convert README.md \
    --template "templates.Documentation" \
    --backend llm

Core Options¶

Debug Mode¶

--debug

Enable debug mode to save all intermediate extraction artifacts for debugging and analysis.

When to use:
- Debugging extraction issues
- Analyzing extraction quality
- Performance profiling
- Development and testing

Output: All debug artifacts are saved to outputs/{document}_{timestamp}/debug/

Example:

# Enable debug mode
uv run docling-graph convert document.pdf \
    --template "templates.BillingDocument" \
    --debug

# Debug artifacts will be in:
# outputs/document_pdf_20260206_094500/debug/

See Debug Mode Documentation for details on debug artifacts.
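Each run writes to its own timestamped directory, so finding the newest debug folder by hand gets tedious after a few runs. The helper below is a minimal sketch that assumes the outputs/{document}_{timestamp}/debug/ layout described above; `latest_debug_dir` is a hypothetical name and not part of docling-graph.

```shell
# Print the most recently modified debug directory under an output root.
# Assumes the outputs/{document}_{timestamp}/debug/ layout described above.
latest_debug_dir() {
    base=${1:-outputs}
    latest=""
    for dir in "$base"/*/debug; do
        [ -d "$dir" ] || continue          # skip the unexpanded glob when nothing matches
        if [ -z "$latest" ] || [ "$dir" -nt "$latest" ]; then
            latest=$dir
        fi
    done
    # Returns non-zero (and prints nothing) when no debug directory exists.
    [ -n "$latest" ] && printf '%s\n' "$latest"
}
```

For example, `ls "$(latest_debug_dir outputs)"` lists the artifacts from the latest debug run.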

Backend Selection¶

--backend {llm|vlm}

LLM (Language Model):
- Best for text-heavy documents
- Supports chunking and consolidation
- Works with local and remote providers

VLM (Vision-Language Model):
- Best for forms and structured layouts
- Processes images directly
- Local inference only

Example:

# Use LLM backend
uv run docling-graph convert document.pdf \
    --template "templates.BillingDocument" \
    --backend llm

# Use VLM backend
uv run docling-graph convert form.jpg \
    --template "templates.IDCard" \
    --backend vlm

Inference Mode¶

--inference {local|remote}

Local:
- Run models on your machine
- Requires GPU for best performance
- No API costs

Remote:
- Use cloud API providers
- Requires API key
- Pay per request

Example:

# Local inference
uv run docling-graph convert document.pdf \
    --template "templates.BillingDocument" \
    --inference local

# Remote inference
uv run docling-graph convert document.pdf \
    --template "templates.BillingDocument" \
    --inference remote

Processing Mode¶

--processing-mode {one-to-one|many-to-one}

many-to-one (recommended):
- Merge all pages into single graph
- Better for multi-page documents
- Enables consolidation

one-to-one:
- Create separate graph per page
- Better for independent pages
- Faster processing

Example:

# Merge all pages
uv run docling-graph convert document.pdf \
    --template "templates.BillingDocument" \
    --processing-mode many-to-one

# Process pages separately
uv run docling-graph convert document.pdf \
    --template "templates.BillingDocument" \
    --processing-mode one-to-one

Model Configuration¶

Provider Override¶

--provider PROVIDER

Available providers:
- Local: vllm, ollama
- Remote: mistral, openai, gemini, watsonx

Model Override¶

--model MODEL

Example:

# Use specific model
uv run docling-graph convert document.pdf \
    --template "templates.BillingDocument" \
    --provider mistral \
    --model mistral-large-latest

Extraction Options¶

Chunking¶

--use-chunking / --no-use-chunking

Enable chunking for:
- Large documents (>5 pages)
- Documents exceeding context limits
- Better extraction accuracy

Disable chunking for:
- Small documents
- When full context is needed
- Faster processing

Example:

# Enable chunking (default)
uv run docling-graph convert large_doc.pdf \
    --template "templates.ScholarlyRheologyPaper" \
    --use-chunking

# Disable chunking
uv run docling-graph convert small_doc.pdf \
    --template "templates.BillingDocument" \
    --no-use-chunking
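The ">5 pages" guideline above can be turned into a small decision helper for scripts. This is only a sketch of that rule of thumb: `chunking_flag` is a hypothetical name, and the threshold is the guideline from this page, not a limit enforced by the tool.

```shell
# Map a page count to the chunking flag, following the ">5 pages" guideline above.
chunking_flag() {
    pages=$1
    if [ "$pages" -gt 5 ]; then
        printf '%s\n' "--use-chunking"
    else
        printf '%s\n' "--no-use-chunking"
    fi
}
```

The result can be spliced into a command line, for example `uv run docling-graph convert report.pdf --template "templates.Report" $(chunking_flag 12)`. Obtaining the page count is left to the caller; a tool such as `pdfinfo` could supply it if installed.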

LLM Consolidation¶

--llm-consolidation / --no-llm-consolidation

Enable for:
- Higher accuracy
- Complex merging scenarios
- When quality > speed

Disable for:
- Faster processing
- Lower API costs
- Simple documents

Example:

# Enable LLM consolidation
uv run docling-graph convert document.pdf \
    --template "templates.ScholarlyRheologyPaper" \
    --llm-consolidation

# Disable (use programmatic merge)
uv run docling-graph convert document.pdf \
    --template "templates.BillingDocument" \
    --no-llm-consolidation

Docling Configuration¶

Pipeline Selection¶

--docling-pipeline {ocr|vision}

OCR Pipeline:
- Traditional OCR approach
- Most accurate for standard documents
- Faster processing

Vision Pipeline:
- Uses Granite-Docling VLM
- Better for complex layouts
- Handles tables and figures better

Example:

# Use OCR pipeline (default)
uv run docling-graph convert document.pdf \
    --template "templates.BillingDocument" \
    --docling-pipeline ocr

# Use vision pipeline
uv run docling-graph convert complex_doc.pdf \
    --template "templates.ScholarlyRheologyPaper" \
    --docling-pipeline vision

Export Options¶

Export Format¶

--export-format {csv|cypher}

CSV:
- For Neo4j import
- Separate nodes.csv and edges.csv
- Easy to analyze

Cypher:
- Direct Neo4j execution
- Single .cypher file
- Ready to import

Example:

# Export as CSV
uv run docling-graph convert document.pdf \
    --template "templates.BillingDocument" \
    --export-format csv

# Export as Cypher
uv run docling-graph convert document.pdf \
    --template "templates.BillingDocument" \
    --export-format cypher

Docling Exports¶

--export-docling-json / --no-docling-json
--export-markdown / --no-markdown
--export-per-page / --no-per-page

Example:

# Export all Docling outputs
uv run docling-graph convert document.pdf \
    --template "templates.BillingDocument" \
    --export-docling-json \
    --export-markdown \
    --export-per-page

# Minimal exports
uv run docling-graph convert document.pdf \
    --template "templates.BillingDocument" \
    --no-docling-json \
    --no-markdown \
    --no-per-page

Graph Options¶

Reverse Edges¶

--reverse-edges

Creates bidirectional relationships in the graph.

Example:

uv run docling-graph convert document.pdf \
    --template "templates.BillingDocument" \
    --reverse-edges
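To make "bidirectional" concrete, the sketch below takes an edge list and emits each edge together with its reverse. It is a conceptual illustration only: the `source,target,label` column layout and the reuse of the same label on the reverse edge are assumptions, and the way docling-graph actually names or stores reverse edges is not specified here.

```shell
# Conceptual illustration only: for each edge row, also emit the reversed edge.
# The column layout (source,target,label) is an assumption for this sketch.
add_reverse_edges() {
    awk -F, '
        NR == 1 { print; next }              # keep the header row unchanged
        { print; print $2 "," $1 "," $3 }    # original edge, then its reverse
    '
}
```

Feeding it a header plus the row `A,B,ISSUED_BY` yields that row followed by `B,A,ISSUED_BY`, so the relationship can be traversed from either node.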

Output Options¶

Output Directory¶

--output-dir PATH

Default: outputs/

Example:

# Custom output directory
uv run docling-graph convert document.pdf \
    --template "templates.BillingDocument" \
    --output-dir "results/invoice_001"

# Organize by date
uv run docling-graph convert document.pdf \
    --template "templates.BillingDocument" \
    --output-dir "outputs/$(date +%Y-%m-%d)"

Complete Examples¶

📍 Simple Invoice (VLM)¶

uv run docling-graph convert invoice.jpg \
    --template "templates.BillingDocument" \
    --backend vlm \
    --processing-mode one-to-one \
    --output-dir "outputs/invoice"

📍 Rheology Research (Remote LLM)¶

export MISTRAL_API_KEY="your-key"

uv run docling-graph convert research.pdf \
    --template "templates.ScholarlyRheologyPaper" \
    --backend llm \
    --inference remote \
    --provider mistral \
    --model mistral-large-latest \
    --processing-mode many-to-one \
    --use-chunking \
    --llm-consolidation \
    --output-dir "outputs/research"

📍 Debug Mode Enabled¶

# Enable debug mode for troubleshooting
uv run docling-graph convert document.pdf \
    --template "templates.BillingDocument" \
    --backend llm \
    --debug \
    --output-dir "outputs/debug_run"

# Debug artifacts will be saved to:
# outputs/debug_run/document_pdf_20260206_094500/debug/

📍 Local Processing (Ollama)¶

# Start Ollama server first
ollama serve

uv run docling-graph convert document.pdf \
    --template "templates.BillingDocument" \
    --backend llm \
    --inference local \
    --provider ollama \
    --model llama3:8b \
    --processing-mode many-to-one \
    --use-chunking \
    --output-dir "outputs/local"

📍 Cypher Export for Neo4j¶

uv run docling-graph convert document.pdf \
    --template "templates.BillingDocument" \
    --backend llm \
    --inference remote \
    --export-format cypher \
    --output-dir "outputs/neo4j"

# Import to Neo4j
cat outputs/neo4j/graph.cypher | cypher-shell

📍 Minimal Processing¶

uv run docling-graph convert small_doc.pdf \
    --template "templates.BillingDocument" \
    --backend llm \
    --inference local \
    --no-use-chunking \
    --no-llm-consolidation \
    --no-docling-json \
    --no-markdown \
    --output-dir "outputs/minimal"

Batch Processing¶

Process Multiple Files¶

# Bash loop
for pdf in documents/*.pdf; do
    uv run docling-graph convert "$pdf" \
        --template "templates.BillingDocument" \
        --output-dir "outputs/$(basename "$pdf" .pdf)"
done

Parallel Processing¶

# Using GNU parallel (pass files with ::: instead of parsing ls output)
parallel -j 4 \
    uv run docling-graph convert {} \
        --template "templates.BillingDocument" \
        --output-dir "outputs/{/.}" \
    ::: documents/*.pdf

Batch Script¶

#!/bin/bash
# batch_convert.sh

TEMPLATE="templates.BillingDocument"
INPUT_DIR="documents"
OUTPUT_BASE="outputs"

for file in "$INPUT_DIR"/*.pdf; do
    filename=$(basename "$file" .pdf)
    echo "Processing: $filename"

    uv run docling-graph convert "$file" \
        --template "$TEMPLATE" \
        --output-dir "$OUTPUT_BASE/$filename" \
        --backend llm \
        --inference remote

    echo "Completed: $filename"
done
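The script above does not distinguish successful conversions from failed ones. A variant that keeps going after a failure and reports a summary might look like the sketch below. `batch_convert` and the `DOCLING_GRAPH` override are hypothetical names introduced here; the override only exists so the command prefix can be swapped for a dry run.

```shell
# Convert every PDF in a directory, continuing past failures and reporting them.
# DOCLING_GRAPH is a hypothetical override for the command prefix (handy for dry runs).
batch_convert() {
    input_dir=$1
    output_base=$2
    template=${3:-templates.BillingDocument}
    ok=0
    failed=0
    for file in "$input_dir"/*.pdf; do
        [ -e "$file" ] || continue           # no PDFs matched the glob
        name=$(basename "$file" .pdf)
        # Left unquoted on purpose so "uv run docling-graph" splits into words.
        if ${DOCLING_GRAPH:-uv run docling-graph} convert "$file" \
            --template "$template" \
            --output-dir "$output_base/$name"; then
            ok=$((ok + 1))
        else
            failed=$((failed + 1))
            echo "FAILED: $file" >&2
        fi
    done
    echo "$ok succeeded, $failed failed"
    [ "$failed" -eq 0 ]                      # non-zero exit status if anything failed
}
```

Calling `batch_convert documents outputs` processes documents/*.pdf, and setting `DOCLING_GRAPH=echo` first prints the commands instead of running them.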

Configuration Priority¶

Options are resolved in this order (highest to lowest):

1. Command-line arguments
2. config.yaml (from init)
3. Built-in defaults

Example¶

# config.yaml
defaults:
  backend: llm
  inference: local
  processing_mode: many-to-one

# This uses remote inference (CLI overrides config)
uv run docling-graph convert document.pdf \
    --template "templates.BillingDocument" \
    --inference remote
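The precedence rule amounts to "take the first value that is set". The sketch below illustrates that logic with shell parameter expansion; it is not the tool's implementation, and `resolve_option` is a hypothetical name.

```shell
# Return the first non-empty value among: CLI argument, config.yaml value, built-in default.
resolve_option() {
    cli_value=$1
    config_value=$2
    default_value=$3
    printf '%s\n' "${cli_value:-${config_value:-$default_value}}"
}
```

With the config above, `resolve_option remote local local` prints `remote`, mirroring how `--inference remote` wins over `inference: local` in config.yaml.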

Output Structure¶

outputs/
├── metadata.json                # Pipeline metadata
├── docling/                     # Docling conversion output
│   ├── document.json            # Docling format
│   └── document.md              # Markdown export
└── docling_graph/               # Graph outputs
    ├── graph.json               # Complete graph
    ├── nodes.csv                # Node data
    ├── edges.csv                # Edge data
    ├── graph.html               # Interactive visualization
    ├── report.md                # Summary report
    └── ...
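In scripted pipelines it can help to confirm that a run produced the files listed above before moving on. The check below is a sketch that assumes the default CSV export and the docling_graph/ layout shown in the tree; `check_outputs` is a hypothetical helper, and the file list would need adjusting for Cypher exports.

```shell
# Report which of the graph files listed above are missing from a run directory.
check_outputs() {
    run_dir=${1:-outputs}
    missing=0
    for name in graph.json nodes.csv edges.csv graph.html report.md; do
        if [ ! -f "$run_dir/docling_graph/$name" ]; then
            echo "missing: docling_graph/$name" >&2
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]                     # succeeds only when every file is present
}
```

A typical use is `check_outputs outputs/invoice_001 || exit 1` right after the convert step.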

Error Handling¶

Configuration Errors¶

[red]Configuration Error:[/red] Invalid backend type: 'invalid'

Solution: Use llm or vlm

Extraction Errors¶

[red]Extraction Error:[/red] Template not found: 'templates.Missing'

Solution: Check template path and ensure it's importable

API Errors¶

[red]Pipeline Error:[/red] API key not found for provider: mistral

Solution:

export MISTRAL_API_KEY="your-key"
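A missing key can be caught before any document is processed. The guard below is a minimal sketch; `require_env` is a hypothetical helper, and `MISTRAL_API_KEY` is the only variable name taken from this page.

```shell
# Fail early when a required environment variable is unset or empty.
require_env() {
    var_name=$1
    eval "value=\${$var_name:-}"             # indirect lookup; pass trusted names only
    if [ -z "$value" ]; then
        echo "Missing environment variable: $var_name" >&2
        return 1
    fi
}
```

For example, `require_env MISTRAL_API_KEY && uv run docling-graph convert document.pdf --template "templates.BillingDocument" --inference remote --provider mistral` only starts the conversion when the key is present.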

Troubleshooting¶

🐛 Template Not Found¶

Error:

ModuleNotFoundError: No module named 'templates'

Solution:

# Ensure template is in Python path
export PYTHONPATH="${PYTHONPATH}:$(pwd)"

# Or use the fully qualified dotted path
uv run docling-graph convert document.pdf \
    --template "my_project.templates.BillingDocument"
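Before re-running a conversion, the dotted path can be checked with a plain Python import. The sketch below assumes the last component of the path is the class name and everything before it is the module; `template_module`, `template_class`, `check_template`, and the `PYTHON` override are hypothetical names. A successful import here does not guarantee the CLI resolves the template the same way, since `python -c` also searches the current directory.

```shell
# Split a dotted template path into its module and class parts.
template_module() { printf '%s\n' "${1%.*}"; }
template_class()  { printf '%s\n' "${1##*.}"; }

# Try importing the template with Python; PYTHON is a hypothetical override.
check_template() {
    module=$(template_module "$1")
    class=$(template_class "$1")
    # Left unquoted on purpose so "uv run python" splits into words.
    ${PYTHON:-uv run python} -c \
        "import importlib, sys; m = importlib.import_module(sys.argv[1]); getattr(m, sys.argv[2])" \
        "$module" "$class"
}
```

Running `check_template my_project.templates.BillingDocument` exits non-zero with a Python traceback if either the module or the class cannot be found.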

🐛 Out of Memory¶

Error:

CUDA out of memory

Solution:

# Enable chunking
uv run docling-graph convert document.pdf \
    --template "templates.BillingDocument" \
    --use-chunking

# Or use smaller model
uv run docling-graph convert document.pdf \
    --template "templates.BillingDocument" \
    --model "ibm-granite/granite-4.0-1b"

🐛 Slow Processing¶

Solution:

# Disable LLM consolidation
uv run docling-graph convert document.pdf \
    --template "templates.BillingDocument" \
    --no-llm-consolidation

# Or disable chunking for small docs
uv run docling-graph convert document.pdf \
    --template "templates.BillingDocument" \
    --no-use-chunking

Best Practices¶

👍 Use Configuration Files¶

# ✅ Good - Reusable configuration
uv run docling-graph init
uv run docling-graph convert document.pdf -t "templates.BillingDocument"

# ❌ Avoid - Repeating options
uv run docling-graph convert document.pdf \
    --template "templates.BillingDocument" \
    --backend llm --inference remote --provider mistral

👍 Organize Outputs¶

# ✅ Good - Organized by document
uv run docling-graph convert invoice_001.pdf \
    --template "templates.BillingDocument" \
    --output-dir "outputs/invoice_001"

# ❌ Avoid - Overwriting outputs
uv run docling-graph convert invoice_001.pdf \
    --template "templates.BillingDocument"

👍 Use Appropriate Backend¶

# ✅ Good - VLM for forms
uv run docling-graph convert id_card.jpg \
    --template "templates.IDCard" \
    --backend vlm

# ✅ Good - LLM for documents
uv run docling-graph convert research.pdf \
    --template "templates.ScholarlyRheologyPaper" \
    --backend llm

Next Steps¶

- inspect Command: Visualize results
- CLI Recipes: Common patterns
- Examples: Real-world examples