Schema Definition: Pydantic Templates
Note: The examples in this document use simplified field names and structures for teaching purposes. The actual BillingDocument schema at docs/examples/templates/billing_document.py is more comprehensive, with 30+ classes and EN 16931/Peppol BIS compliance, and it uses CONTAINS_LINE for line items.
Overview
Pydantic templates are the foundation of knowledge graph extraction in Docling Graph. They serve three critical purposes:
- LLM Guidance - Field descriptions and examples guide the language model to extract accurate, structured data
- Data Validation - Field validators ensure data quality and consistency
- Graph Structure - Models define nodes, edges, and relationships for the knowledge graph
What You'll Learn
This section provides a complete guide to creating Pydantic templates optimized for LLM-based document extraction and automatic conversion to knowledge graphs.
Quick Example
Here's a minimal template showing the key concepts:
"""BillingDocument extraction template."""

from typing import Any, List

from pydantic import BaseModel, ConfigDict, Field


# Required: Edge helper function
def edge(label: str, **kwargs: Any) -> Any:
    """Helper to create graph edges."""
    return Field(..., json_schema_extra={"edge_label": label}, **kwargs)


# Component: Deduplicated by content
class Address(BaseModel):
    """Physical address (value object)."""

    model_config = ConfigDict(is_entity=False)

    street: str = Field(
        description="Street name and number",
        examples=["123 Main St", "45 Avenue des Champs-Élysées"]
    )
    city: str = Field(
        description="City name",
        examples=["Paris", "London"]
    )


# Entity: Unique by name
class Organization(BaseModel):
    """Organization entity."""

    model_config = ConfigDict(graph_id_fields=["name"])

    name: str = Field(
        description="Legal organization name",
        examples=["Acme Corp", "Tech Solutions Ltd"]
    )
    # Edge to Address component
    located_at: Address = edge(
        label="LOCATED_AT",
        description="Organization's physical address"
    )


# Root document
class BillingDocument(BaseModel):
    """BillingDocument document."""

    model_config = ConfigDict(graph_id_fields=["document_no"])

    document_no: str = Field(
        description="Unique invoice identifier",
        examples=["INV-2024-001", "12345"]
    )
    # Edge to Organization entity
    issued_by: Organization = edge(
        label="ISSUED_BY",
        description="Organization that issued this invoice"
    )
Key Concepts Shown:
- edge() helper function for relationships
- Component with is_entity=False (Address)
- Entity with graph_id_fields (Organization, BillingDocument)
- Clear field descriptions and examples
- Graph relationships via edge() calls
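To make these concepts concrete, the sketch below validates a plain dictionary (shaped like the structured output an LLM might return) against a trimmed-down copy of the template, then reads back the metadata that edge() and model_config attach. The payload values are invented for illustration, and the sketch only inspects the metadata; it does not show how Docling Graph itself consumes it.

```python
from typing import Any

from pydantic import BaseModel, ConfigDict, Field


def edge(label: str, **kwargs: Any) -> Any:
    """Helper to create graph edges (same helper as in the template)."""
    return Field(..., json_schema_extra={"edge_label": label}, **kwargs)


class Address(BaseModel):
    model_config = ConfigDict(is_entity=False)
    street: str
    city: str


class Organization(BaseModel):
    model_config = ConfigDict(graph_id_fields=["name"])
    name: str
    located_at: Address = edge(label="LOCATED_AT")


class BillingDocument(BaseModel):
    model_config = ConfigDict(graph_id_fields=["document_no"])
    document_no: str
    issued_by: Organization = edge(label="ISSUED_BY")


# Illustrative payload (assumption: made-up values, not real extraction output).
payload = {
    "document_no": "INV-2024-001",
    "issued_by": {
        "name": "Acme Corp",
        "located_at": {"street": "123 Main St", "city": "Paris"},
    },
}

doc = BillingDocument.model_validate(payload)

# The edge label is stored in the field's JSON-schema extras.
issued_by_label = BillingDocument.model_fields["issued_by"].json_schema_extra["edge_label"]
# Graph settings are ordinary entries in model_config.
id_fields = BillingDocument.model_config.get("graph_id_fields")
address_is_entity = Address.model_config.get("is_entity")

print(issued_by_label, id_fields, address_is_entity)
```

Nested dictionaries are validated into Address and Organization instances automatically, so doc.issued_by.located_at.city is available as a typed attribute.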
Why Pydantic for Knowledge Graphs?
1. Type Safety and Validation
Pydantic provides automatic type checking and validation:
from pydantic import BaseModel, Field, field_validator


class MonetaryAmount(BaseModel):
    value: float = Field(...)
    currency: str = Field(...)

    @field_validator("value")
    @classmethod
    def validate_non_negative(cls, v: float) -> float:
        # Reject negative amounts; zero is allowed.
        if v < 0:
            raise ValueError("Amount must be non-negative")
        return v
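As a quick check of how such a validator behaves, the following sketch redefines MonetaryAmount and feeds it one valid and one invalid value. The sample amounts are illustrative only.

```python
from pydantic import BaseModel, Field, ValidationError, field_validator


class MonetaryAmount(BaseModel):
    value: float = Field(...)
    currency: str = Field(...)

    @field_validator("value")
    @classmethod
    def validate_non_negative(cls, v: float) -> float:
        if v < 0:
            raise ValueError("Amount must be non-negative")
        return v


# A numeric string is coerced to float by Pydantic's default (lax) mode.
ok = MonetaryAmount(value="19.99", currency="EUR")

# A negative amount raises ValidationError instead of producing a model.
try:
    MonetaryAmount(value=-5, currency="EUR")
    rejected = False
    error_message = ""
except ValidationError as exc:
    rejected = True
    error_message = exc.errors()[0]["msg"]

print(ok.value, rejected, error_message)
```

Because the error surfaces at model construction, malformed values are caught before any graph node is built from them.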
2. LLM-Friendly Schema
Field descriptions and examples guide the LLM:
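from datetime import date

from pydantic import BaseModel, Field


# Illustrative wrapper class so the field definition runs on its own.
class Person(BaseModel):
    date_of_birth: date = Field(
        description=(
            "Person's date of birth. "
            "Look for 'Date of birth', 'Date de naiss.', or 'Born on'. "
            "Parse formats like 'DD MM YYYY' and normalize to YYYY-MM-DD."
        ),
        examples=["1990-05-15", "1985-12-20"]
    )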
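One way to see why descriptions and examples matter is to look at the JSON schema Pydantic generates, since that kind of schema is what a model can be shown as guidance. The sketch below uses a stand-in Person model with a shortened description; whether Docling Graph sends exactly this schema to the LLM is an assumption not confirmed here.

```python
from datetime import date

from pydantic import BaseModel, Field


class Person(BaseModel):
    date_of_birth: date = Field(
        description="Person's date of birth, normalized to YYYY-MM-DD.",
        examples=["1990-05-15", "1985-12-20"],
    )


schema = Person.model_json_schema()
field_schema = schema["properties"]["date_of_birth"]

# Description, examples, and the date format all travel with the schema.
print(field_schema["description"])
print(field_schema["examples"])
print(field_schema["format"])
```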
3. Automatic Graph Conversion
The pipeline automatically converts Pydantic models to knowledge graphs:
BillingDocument (node)
├─ ISSUED_BY → Organization (node)
│  └─ LOCATED_AT → Address (node)
└─ SENT_TO → Client (node)
   └─ LIVES_AT → Address (node)
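The idea behind this conversion can be sketched with a small recursive walk that follows fields carrying an edge_label. The collect_edges function below is a hypothetical helper written for this illustration, not the pipeline's actual converter, and it uses trimmed-down models with only one branch of the tree.

```python
from typing import Any

from pydantic import BaseModel, Field


def edge(label: str, **kwargs: Any) -> Any:
    return Field(..., json_schema_extra={"edge_label": label}, **kwargs)


class Address(BaseModel):
    city: str


class Organization(BaseModel):
    name: str
    located_at: Address = edge(label="LOCATED_AT")


class BillingDocument(BaseModel):
    document_no: str
    issued_by: Organization = edge(label="ISSUED_BY")


def collect_edges(model: BaseModel) -> list[tuple[str, str, str]]:
    """Hypothetical walker: return (source, label, target) class-name triples."""
    triples = []
    source = type(model).__name__
    for name, info in type(model).model_fields.items():
        extra = info.json_schema_extra
        if not isinstance(extra, dict) or "edge_label" not in extra:
            continue  # plain attribute, not a relationship
        value = getattr(model, name)
        targets = value if isinstance(value, list) else [value]
        for target in targets:
            triples.append((source, extra["edge_label"], type(target).__name__))
            triples.extend(collect_edges(target))  # follow nested edges
    return triples


doc = BillingDocument(
    document_no="INV-2024-001",
    issued_by=Organization(name="Acme Corp", located_at=Address(city="Paris")),
)
triples = collect_edges(doc)
for source, label, target in triples:
    print(f"{source} -[{label}]-> {target}")
```

Fields without an edge_label, such as document_no and name, are skipped by the walk; in a graph they would typically end up as node properties rather than edges.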
Core Terminology
| Term | Definition | Example |
|---|---|---|
| Entity | Unique, identifiable object tracked individually | Person, Organization, Document |
| Component | Value object deduplicated by content | Address, MonetaryAmount, Measurement |
| Node | Any Pydantic model that becomes a graph node | All BaseModel subclasses |
| Edge | Relationship between nodes | ISSUED_BY, LOCATED_AT, CONTAINS_LINE |
| graph_id_fields | Fields used to create stable, unique node IDs | ["name"], ["first_name", "last_name"] |
Template Examples by Domain
Docling Graph includes production-ready templates for various domains:
Invoice Template
- Entities: Invoice, Organization, Client
- Components: Address, LineItem
- Use Case: Financial document processing
ID Card Template
- Entities: IDCard, Person
- Components: Address
- Use Case: Identity document extraction
Rheology Research Template
- Entities: Research, Experiment, Material
- Components: Measurement, VibrationParameter
- Use Case: Scientific literature mining
Insurance Template
- Entities: InsuranceTerms, InsurancePlan, Guarantee
- Components: MonetaryAmount, Address
- Use Case: Insurance document analysis
Location: docs/examples/templates/
Prerequisites
Before creating templates, ensure you have:
- Python 3.10+ installed
- Docling Graph installed (uv sync)
- Basic Pydantic knowledge (recommended but not required)
- Understanding of your domain (document types, entities, relationships)
Learning Path
Beginner Path (Start Here)
- Template Basics - Learn file structure and imports
- Entities vs Components - Understand the critical distinction
- Field Definitions - Master field descriptions and examples
- Best Practices - Follow the checklist
Advanced Path
- Relationships - Complex edge patterns
- Validation - Custom validators and normalization
- Advanced Patterns - Reusable components and complex structures
Common Questions
Q: Do I need to know Pydantic?
A: Basic knowledge helps, but this guide covers everything you need. Pydantic is intuitive and well-documented.
Q: Can I use existing Pydantic models?
A: Yes! Add graph_id_fields or is_entity=False to model_config, and use the edge() helper for relationships.
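As a rough sketch of that adaptation, the example below takes two pre-existing models and makes them graph-aware by subclassing. Subclassing is only one option, chosen here so the originals stay untouched; editing the models in place works the same way. Department, Employee, and the WORKS_IN label are hypothetical names invented for this illustration.

```python
from typing import Any

from pydantic import BaseModel, ConfigDict, Field


def edge(label: str, **kwargs: Any) -> Any:
    return Field(..., json_schema_extra={"edge_label": label}, **kwargs)


# Existing models, written without any graph awareness.
class Department(BaseModel):
    name: str


class Employee(BaseModel):
    email: str
    department: Department


# Graph-aware versions: add identity fields and redeclare the relationship.
class DepartmentNode(Department):
    model_config = ConfigDict(graph_id_fields=["name"])


class EmployeeNode(Employee):
    model_config = ConfigDict(graph_id_fields=["email"])
    department: DepartmentNode = edge(label="WORKS_IN")


emp = EmployeeNode.model_validate(
    {"email": "ada@example.com", "department": {"name": "R&D"}}
)
label = EmployeeNode.model_fields["department"].json_schema_extra["edge_label"]

print(label, type(emp.department).__name__)
```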
Q: How do I choose between Entity and Component?
A: Ask: "Should this be tracked individually?" If yes → Entity. If it's a shared value → Component. See Entities vs Components.
Q: What if my domain is complex?
A: Start simple with core entities, then add complexity. See Advanced Patterns for nested structures.
Next Steps
Ready to create your first template?
- Template Basics → - Learn the required structure
- Examples - See complete working examples
- Pipeline Configuration - Configure extraction after creating templates
Additional Resources
- Pydantic Documentation - Official Pydantic docs
- Example Templates - Production-ready templates
- API Reference - PipelineConfig and model details