convert Command
Overview
The convert command transforms documents into knowledge graphs using configurable extraction pipelines.
Key Features:
- Multiple backend support (LLM/VLM)
- Flexible processing modes
- Configurable chunking
- Multiple export formats
- Batch processing support
Basic Usage
Required Arguments
| Argument | Description |
|---|---|
| SOURCE | Path to document (PDF, JPG, PNG, TXT, MD), URL, or DoclingDocument JSON |
| --template, -t | Dotted path to Pydantic template |
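The accepted SOURCE types can be checked before invoking the command. The following is a minimal sketch: `is_supported_source` is a hypothetical helper, not part of docling-graph, and it assumes that the input type can be judged from the file extension, which may differ from how the tool actually detects inputs.

```shell
# Hypothetical pre-flight helper (not part of docling-graph).
# Returns 0 if SOURCE looks like one of the documented input types.
is_supported_source() {
    # Lowercase the argument so that .PDF and .pdf are treated alike
    src=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$src" in
        http://*|https://*) return 0 ;;            # URL
        *.pdf|*.jpg|*.png|*.txt|*.md) return 0 ;;  # documents and images
        *.json) return 0 ;;                        # assumed: DoclingDocument JSON
        *) return 1 ;;
    esac
}

# Usage: only convert when the source type is recognised
# is_supported_source invoice.pdf && \
#     uv run docling-graph convert invoice.pdf -t "templates.BillingDocument"
```

The check is deliberately loose: it rejects obviously unsupported inputs early but does not verify that the file exists or is well formed.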
Examples
# PDF document
uv run docling-graph convert invoice.pdf \
--template "templates.BillingDocument"
# Text file
uv run docling-graph convert notes.txt \
--template "templates.Report" \
--backend llm
# URL
uv run docling-graph convert https://example.com/doc.pdf \
--template "templates.BillingDocument"
# Markdown file
uv run docling-graph convert README.md \
--template "templates.Documentation" \
--backend llm
Core Options
Debug Mode
Enable debug mode to save all intermediate extraction artifacts for debugging and analysis.
When to use:
- Debugging extraction issues
- Analyzing extraction quality
- Performance profiling
- Development and testing
Output: All debug artifacts are saved to outputs/{document}_{timestamp}/debug/
Example:
# Enable debug mode
uv run docling-graph convert document.pdf \
--template "templates.BillingDocument" \
--debug
# Debug artifacts will be in:
# outputs/document_pdf_20260206_094500/debug/
See Debug Mode Documentation for details on debug artifacts.
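Because each debug run gets its own timestamped directory, it can help to locate the most recent one. The sketch below uses a hypothetical helper, `latest_debug_dir`, and assumes the directory naming shown in the example above (a prefix such as `document_pdf` followed by a `YYYYMMDD_HHMMSS` timestamp); the actual naming may vary between versions.

```shell
# Hypothetical helper: print the newest debug directory for a document.
# Assumes timestamps of the form YYYYMMDD_HHMMSS, which sort
# chronologically when sorted as plain text.
latest_debug_dir() {
    # $1 = base output directory, $2 = document prefix (for example document_pdf)
    for d in "$1"/"$2"_*/debug; do
        if [ -d "$d" ]; then
            printf '%s\n' "$d"
        fi
    done | sort | tail -n 1
}

# Usage:
# ls "$(latest_debug_dir outputs document_pdf)"
```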
Backend Selection
LLM (Language Model):
- Best for text-heavy documents
- Supports chunking and consolidation
- Works with local and remote providers
VLM (Vision-Language Model):
- Best for forms and structured layouts
- Processes images directly
- Local inference only
Example:
# Use LLM backend
uv run docling-graph convert document.pdf \
--template "templates.BillingDocument" \
--backend llm
# Use VLM backend
uv run docling-graph convert form.jpg \
--template "templates.IDCard" \
--backend vlm
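In scripts, the backend choice can be derived from the input file. The following sketch is only a rule of thumb: `pick_backend` is a hypothetical helper, and it assumes that image inputs are forms or scans suited to the VLM backend while everything else is text-heavy. A scanned PDF form, for example, would be misclassified by this rule.

```shell
# Hypothetical helper: suggest a backend from the file extension.
# Assumption: images go to the VLM backend, all other inputs to the LLM backend.
pick_backend() {
    case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
        *.jpg|*.png) echo vlm ;;
        *) echo llm ;;
    esac
}

# Usage:
# uv run docling-graph convert form.jpg \
#     --template "templates.IDCard" \
#     --backend "$(pick_backend form.jpg)"
```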
Inference Mode
Local:
- Run models on your machine
- Requires GPU for best performance
- No API costs
Remote:
- Use cloud API providers
- Requires API key
- Pay per request
Example:
# Local inference
uv run docling-graph convert document.pdf \
--template "templates.BillingDocument" \
--inference local
# Remote inference
uv run docling-graph convert document.pdf \
--template "templates.BillingDocument" \
--inference remote
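Since remote inference requires an API key, a script can fail early when the key is missing instead of waiting for the provider to reject the request. This is a sketch with a hypothetical helper, `require_api_key`; the variable name is supplied by the caller (for example `MISTRAL_API_KEY`, as used in the Mistral example on this page).

```shell
# Hypothetical guard: fail early when the named API key variable is empty or unset.
require_api_key() {
    # Indirect lookup of the variable named in $1 (pass trusted names only)
    eval "key_value=\${$1:-}"
    if [ -z "$key_value" ]; then
        echo "error: $1 is not set; remote inference needs an API key" >&2
        return 1
    fi
}

# Usage:
# require_api_key MISTRAL_API_KEY && \
#     uv run docling-graph convert document.pdf \
#         --template "templates.BillingDocument" \
#         --inference remote
```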
Processing Mode
many-to-one (recommended):
- Merge all pages into single graph
- Better for multi-page documents
- Enables consolidation
one-to-one:
- Create separate graph per page
- Better for independent pages
- Faster processing
Example:
# Merge all pages
uv run docling-graph convert document.pdf \
--template "templates.BillingDocument" \
--processing-mode many-to-one
# Process pages separately
uv run docling-graph convert document.pdf \
--template "templates.BillingDocument" \
--processing-mode one-to-one
Model Configuration
Provider Override
Available providers:
- Local: vllm, ollama
- Remote: mistral, openai, gemini, watsonx
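The provider list above implies an inference mode for each provider. The sketch below encodes that mapping in a hypothetical helper, `inference_for_provider`, so a script can keep `--provider` and `--inference` consistent. It only reflects the providers listed here and is not how docling-graph itself resolves the setting.

```shell
# Hypothetical helper: map a provider name to its inference mode,
# based solely on the provider list in this section.
inference_for_provider() {
    case "$1" in
        vllm|ollama) echo local ;;
        mistral|openai|gemini|watsonx) echo remote ;;
        *) echo "unknown provider: $1" >&2; return 1 ;;
    esac
}

# Usage:
# provider=mistral
# uv run docling-graph convert document.pdf \
#     --template "templates.BillingDocument" \
#     --provider "$provider" \
#     --inference "$(inference_for_provider "$provider")"
```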
Model Override
Example:
# Use specific model
uv run docling-graph convert document.pdf \
--template "templates.BillingDocument" \
--provider mistral \
--model mistral-large-latest
Extraction Options
Chunking
Enable chunking for:
- Large documents (>5 pages)
- Documents exceeding context limits
- Better extraction accuracy
Disable chunking for:
- Small documents
- When full context is needed
- Faster processing
Example:
# Enable chunking (default)
uv run docling-graph convert large_doc.pdf \
--template "templates.ScholarlyRheologyPaper" \
--use-chunking
# Disable chunking
uv run docling-graph convert small_doc.pdf \
--template "templates.BillingDocument" \
--no-use-chunking
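The page-count guideline above can be turned into a small decision helper. This is a sketch under stated assumptions: `chunking_flag` is hypothetical, it applies only the "more than 5 pages" rule, and the page count must be supplied by the caller (one option, if poppler is installed, is `pdfinfo`).

```shell
# Hypothetical helper: choose the chunking flag from a page count,
# using the ">5 pages" guideline from this section.
chunking_flag() {
    if [ "$1" -gt 5 ]; then
        echo "--use-chunking"
    else
        echo "--no-use-chunking"
    fi
}

# Usage (assumes pdfinfo from poppler is available):
# pages=$(pdfinfo large_doc.pdf | awk '/^Pages:/ {print $2}')
# uv run docling-graph convert large_doc.pdf \
#     --template "templates.ScholarlyRheologyPaper" \
#     "$(chunking_flag "$pages")"
```

Page count is only a proxy for context size; a short but dense document may still exceed the model's context limit.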
LLM Consolidation
Enable for:
- Higher accuracy
- Complex merging scenarios
- When quality > speed
Disable for:
- Faster processing
- Lower API costs
- Simple documents
Example:
# Enable LLM consolidation
uv run docling-graph convert document.pdf \
--template "templates.ScholarlyRheologyPaper" \
--llm-consolidation
# Disable (use programmatic merge)
uv run docling-graph convert document.pdf \
--template "templates.BillingDocument" \
--no-llm-consolidation
Docling Configuration
Pipeline Selection
OCR Pipeline:
- Traditional OCR approach
- Most accurate for standard documents
- Faster processing
Vision Pipeline:
- Uses Granite-Docling VLM
- Better for complex layouts
- Handles tables and figures better
Example:
# Use OCR pipeline (default)
uv run docling-graph convert document.pdf \
--template "templates.BillingDocument" \
--docling-pipeline ocr
# Use vision pipeline
uv run docling-graph convert complex_doc.pdf \
--template "templates.ScholarlyRheologyPaper" \
--docling-pipeline vision
Export Options
Export Format
CSV:
- For Neo4j import
- Separate nodes.csv and edges.csv
- Easy to analyze
Cypher:
- Direct Neo4j execution
- Single .cypher file
- Ready to import
Example:
# Export as CSV
uv run docling-graph convert document.pdf \
--template "templates.BillingDocument" \
--export-format csv
# Export as Cypher
uv run docling-graph convert document.pdf \
--template "templates.BillingDocument" \
--export-format cypher
Docling Exports
--export-docling-json / --no-docling-json
--export-markdown / --no-markdown
--export-per-page / --no-per-page
Example:
# Export all Docling outputs
uv run docling-graph convert document.pdf \
--template "templates.BillingDocument" \
--export-docling-json \
--export-markdown \
--export-per-page
# Minimal exports
uv run docling-graph convert document.pdf \
--template "templates.BillingDocument" \
--no-docling-json \
--no-markdown \
--no-per-page
Graph Options
Reverse Edges
Creates bidirectional relationships in the graph.
Example:
uv run docling-graph convert document.pdf \
--template "templates.BillingDocument" \
--reverse-edges
Output Options
Output Directory
Default: outputs/
Example:
# Custom output directory
uv run docling-graph convert document.pdf \
--template "templates.BillingDocument" \
--output-dir "results/invoice_001"
# Organize by date
uv run docling-graph convert document.pdf \
--template "templates.BillingDocument" \
--output-dir "outputs/$(date +%Y-%m-%d)"
Complete Examples
📍 Simple Invoice (VLM)
uv run docling-graph convert invoice.jpg \
--template "templates.BillingDocument" \
--backend vlm \
--processing-mode one-to-one \
--output-dir "outputs/invoice"
📍 Rheology Research (Remote LLM)
export MISTRAL_API_KEY="your-key"
uv run docling-graph convert research.pdf \
--template "templates.ScholarlyRheologyPaper" \
--backend llm \
--inference remote \
--provider mistral \
--model mistral-large-latest \
--processing-mode many-to-one \
--use-chunking \
--llm-consolidation \
--output-dir "outputs/research"
📍 Debug Mode Enabled
# Enable debug mode for troubleshooting
uv run docling-graph convert document.pdf \
--template "templates.BillingDocument" \
--backend llm \
--debug \
--output-dir "outputs/debug_run"
# Debug artifacts will be saved to:
# outputs/debug_run/document_pdf_20260206_094500/debug/
📍 Local Processing (Ollama)
# Start the Ollama server first (in a separate terminal; this command runs in the foreground)
ollama serve
uv run docling-graph convert document.pdf \
--template "templates.BillingDocument" \
--backend llm \
--inference local \
--provider ollama \
--model llama3:8b \
--processing-mode many-to-one \
--use-chunking \
--output-dir "outputs/local"
📍 Cypher Export for Neo4j
uv run docling-graph convert document.pdf \
--template "templates.BillingDocument" \
--backend llm \
--inference remote \
--export-format cypher \
--output-dir "outputs/neo4j"
# Import to Neo4j
cat outputs/neo4j/graph.cypher | cypher-shell
📍 Minimal Processing
uv run docling-graph convert small_doc.pdf \
--template "templates.BillingDocument" \
--backend llm \
--inference local \
--no-use-chunking \
--no-llm-consolidation \
--no-docling-json \
--no-markdown \
--output-dir "outputs/minimal"
Batch Processing
Process Multiple Files
# Bash loop (quote "$pdf" so paths containing spaces are handled correctly)
for pdf in documents/*.pdf; do
    uv run docling-graph convert "$pdf" \
        --template "templates.BillingDocument" \
        --output-dir "outputs/$(basename "$pdf" .pdf)"
done
Parallel Processing
# Using GNU parallel (passing the glob after ::: avoids parsing ls output)
parallel -j 4 \
uv run docling-graph convert {} \
--template "templates.BillingDocument" \
--output-dir "outputs/{/.}" \
::: documents/*.pdf
Batch Script
#!/bin/bash
# batch_convert.sh
TEMPLATE="templates.BillingDocument"
INPUT_DIR="documents"
OUTPUT_BASE="outputs"
for file in "$INPUT_DIR"/*.pdf; do
    filename=$(basename "$file" .pdf)
    echo "Processing: $filename"
    uv run docling-graph convert "$file" \
        --template "$TEMPLATE" \
        --output-dir "$OUTPUT_BASE/$filename" \
        --backend llm \
        --inference remote
    echo "Completed: $filename"
done
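A batch script like the one above reports each file as completed whether or not the conversion succeeded. The variant below is a sketch that counts failures and skips the loop body when no PDFs match. The `CONVERT` variable and the `batch_convert` function are hypothetical names introduced for illustration; `CONVERT` can be overridden to do a dry run.

```shell
# Command to run per file; override for a dry run (for example CONVERT=echo).
CONVERT=${CONVERT:-"uv run docling-graph convert"}

# Hypothetical batch helper.
# $1 = input directory, $2 = output base directory, $3 = template
batch_convert() {
    failed=0
    for file in "$1"/*.pdf; do
        [ -e "$file" ] || continue   # the glob matched nothing
        name=$(basename "$file" .pdf)
        # $CONVERT is intentionally unquoted so it splits into command and arguments
        if ! $CONVERT "$file" --template "$3" --output-dir "$2/$name"; then
            echo "failed: $file" >&2
            failed=$((failed + 1))
        fi
    done
    echo "batch finished with $failed failure(s)" >&2
    [ "$failed" -eq 0 ]
}

# Usage:
# batch_convert documents outputs "templates.BillingDocument"
```

The function returns a non-zero status if any file failed, so it can be chained in CI or cron jobs.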
Configuration Priority
Options are resolved in this order (highest to lowest):
- Command-line arguments
- config.yaml (from init)
- Built-in defaults
Example
# This uses remote inference even if config.yaml selects local (CLI overrides config)
uv run docling-graph convert document.pdf \
--template "templates.BillingDocument" \
--inference remote
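The precedence rule can be illustrated with a few lines of shell. This is a sketch of the ordering only, not of how docling-graph implements it; `resolve_option` is a hypothetical function and an empty string stands for "not set at this level".

```shell
# Hypothetical illustration of option precedence.
# $1 = command-line value, $2 = config.yaml value, $3 = built-in default
resolve_option() {
    if [ -n "$1" ]; then
        echo "$1"
    elif [ -n "$2" ]; then
        echo "$2"
    else
        echo "$3"
    fi
}

# The CLI value wins over the config file and the default
resolve_option "remote" "local" "local"
# With no CLI value, the config file value is used
resolve_option "" "local" "remote"
```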
Output Structure
outputs/
├── metadata.json          # Pipeline metadata
├── docling/               # Docling conversion output
│   ├── document.json      # Docling format
│   └── document.md        # Markdown export
└── docling_graph/         # Graph outputs
    ├── graph.json         # Complete graph
    ├── nodes.csv          # Node data
    ├── edges.csv          # Edge data
    ├── graph.html         # Interactive visualization
    ├── report.md          # Summary report
    └── ...
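After a run, a script can confirm that the main files from the layout above were written. The sketch below uses a hypothetical helper, `check_outputs`, and assumes a run that produced the CSV graph files shown in the tree; adjust the file list for other export formats or when Docling exports are disabled.

```shell
# Hypothetical post-run check: verify that key output files exist.
# $1 = output directory of a single conversion run
check_outputs() {
    missing=0
    for f in metadata.json \
             docling_graph/graph.json \
             docling_graph/nodes.csv \
             docling_graph/edges.csv; do
        if [ ! -f "$1/$f" ]; then
            echo "missing: $1/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage:
# check_outputs outputs/invoice_001 || echo "conversion output is incomplete"
```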
Error Handling
Configuration Errors
Solution: Pass llm or vlm to --backend
Extraction Errors
Solution: Check template path and ensure it's importable
API Errors
Solution:
Troubleshooting
🐛 Template Not Found
Error:
Solution:
# Ensure template is in Python path
export PYTHONPATH="${PYTHONPATH}:$(pwd)"
# Or use the fully qualified dotted path
uv run docling-graph convert document.pdf \
--template "my_project.templates.BillingDocument"
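Whether a template is importable can be tested directly before running a conversion. This sketch assumes the template is given as a `module.ClassName` dotted path and uses a hypothetical helper, `template_importable`. It runs whichever `python3` is on the PATH (or the interpreter named in `PYTHON`), which may not be the same environment that `uv run` uses, so treat a failure as a hint rather than proof.

```shell
# Hypothetical check: can the dotted template path be imported?
# Splits "package.module.ClassName" into module and attribute and
# tries to resolve both with the Python interpreter.
template_importable() {
    "${PYTHON:-python3}" -c '
import importlib, sys
module_name, _, attr = sys.argv[1].rpartition(".")
getattr(importlib.import_module(module_name), attr)
' "$1" 2>/dev/null
}

# Usage:
# template_importable "my_project.templates.BillingDocument" \
#     || echo "template not importable; check PYTHONPATH"
```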
🐛 Out of Memory
Error:
Solution:
# Enable chunking
uv run docling-graph convert document.pdf \
--template "templates.BillingDocument" \
--use-chunking
# Or use smaller model
uv run docling-graph convert document.pdf \
--template "templates.BillingDocument" \
--model "ibm-granite/granite-4.0-1b"
🐛 Slow Processing
Solution:
# Disable LLM consolidation
uv run docling-graph convert document.pdf \
--template "templates.BillingDocument" \
--no-llm-consolidation
# Or disable chunking for small docs
uv run docling-graph convert document.pdf \
--template "templates.BillingDocument" \
--no-use-chunking
Best Practices
👍 Use Configuration Files
# ✅ Good - Reusable configuration
uv run docling-graph init
uv run docling-graph convert document.pdf -t "templates.BillingDocument"
# ❌ Avoid - Repeating options
uv run docling-graph convert document.pdf \
--template "templates.BillingDocument" \
--backend llm --inference remote --provider mistral
👍 Organize Outputs
# ✅ Good - Organized by document
uv run docling-graph convert invoice_001.pdf \
--template "templates.BillingDocument" \
--output-dir "outputs/invoice_001"
# ❌ Avoid - Overwriting outputs
uv run docling-graph convert invoice_001.pdf \
--template "templates.BillingDocument"
👍 Use Appropriate Backend
# ✅ Good - VLM for forms
uv run docling-graph convert id_card.jpg \
--template "templates.IDCard" \
--backend vlm
# ✅ Good - LLM for documents
uv run docling-graph convert research.pdf \
--template "templates.ScholarlyRheologyPaper" \
--backend llm
Next Steps
- inspect Command: Visualize results
- CLI Recipes: Common patterns
- Examples: Real-world examples