API Reference¶
Overview¶
Complete API reference for docling-graph modules, classes, and functions.
What's Included:
- Pipeline API
- Configuration classes
- Protocol definitions
- Exception hierarchy
- Converter classes
- Extractor classes
- Exporter classes
- LLM client interfaces
Quick Links¶
Core APIs¶
Pipeline API
Main entry point for document processing.
- run_pipeline() - Execute the pipeline
- Pipeline stages and orchestration
Configuration API
Type-safe configuration classes.
- PipelineConfig - Main configuration class
- ModelConfig - Model configuration
- LLMConfig / VLMConfig - Backend configs
Protocols
Protocol definitions for type-safe interfaces.
- ExtractionBackendProtocol - VLM backends
- TextExtractionBackendProtocol - LLM backends
- LLMClientProtocol - LLM clients
- ExtractorProtocol - Extraction strategies
Exceptions
Exception hierarchy and error handling.
- DoclingGraphError - Base exception
- ConfigurationError - Config errors
- ClientError - API errors
- ExtractionError - Extraction failures
- ValidationError - Data validation
- GraphError - Graph operations
- PipelineError - Pipeline execution
Processing APIs¶
Converters
Graph conversion from Pydantic models.
- GraphConverter - Convert models to graphs
- NodeIDRegistry - Stable node IDs
- Graph construction utilities
Extractors
Document extraction strategies.
- OneToOne - Per-page extraction
- ManyToOne - Consolidated extraction
- Backend implementations
- Chunking and batching
Exporters
Graph export formats.
- CSVExporter - Neo4j-compatible CSV
- CypherExporter - Cypher scripts
- JSONExporter - JSON format
- DoclingExporter - Docling documents
LLM Clients
LiteLLM-backed client for all LLM calls.
- LiteLLMClient - Provider-agnostic client
Module Structure¶
docling_graph/
├── __init__.py # Public API exports
├── pipeline.py # run_pipeline()
├── config.py # PipelineConfig
├── protocols.py # Protocol definitions
├── exceptions.py # Exception hierarchy
│
├── core/ # Core processing
│ ├── converters/ # Graph conversion
│ ├── extractors/ # Extraction strategies
│ ├── exporters/ # Export formats
│ └── visualizers/ # Visualization
│
├── llm_clients/ # LLM integrations
│ ├── base.py
│ ├── ollama.py
│ ├── mistral.py
│ ├── openai.py
│ ├── gemini.py
│ └── vllm.py
│
└── pipeline/ # Pipeline orchestration
    ├── context.py
    ├── stages.py
    └── orchestrator.py
Import Patterns¶
Basic Imports¶
# Main API
from docling_graph import run_pipeline, PipelineConfig
# Configuration classes
from docling_graph import (
    LLMConfig,
    VLMConfig,
    ModelConfig,
    ModelsConfig,
)
Advanced Imports¶
# Protocols
from docling_graph.protocols import (
    ExtractionBackendProtocol,
    TextExtractionBackendProtocol,
    LLMClientProtocol,
)
# Exceptions
from docling_graph.exceptions import (
    DoclingGraphError,
    ConfigurationError,
    ClientError,
    ExtractionError,
    ValidationError,
    GraphError,
    PipelineError,
)
# Converters
from docling_graph.core.converters import GraphConverter
# Extractors
from docling_graph.core.extractors import OneToOne, ManyToOne
# Exporters
from docling_graph.core.exporters import (
    CSVExporter,
    CypherExporter,
    JSONExporter,
)
Type Hints¶
Common Types¶
from typing import Any, Dict, List, Type, Union
from pathlib import Path
from pydantic import BaseModel
import networkx as nx
from docling_graph import PipelineConfig
# Configuration
config: PipelineConfig
config_dict: Dict[str, Any]
# Templates
template: Type[BaseModel]
model_instance: BaseModel
models: List[BaseModel]
# Graphs
graph: nx.MultiDiGraph
# Paths
source: Union[str, Path]
output_dir: Path
Version Information¶
import docling_graph
# Get version
print(docling_graph.__version__) # e.g., "v1.2.0"
# Check available exports
print(docling_graph.__all__)
# ['run_pipeline', 'PipelineConfig', 'LLMConfig', ...]
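Code that needs to branch on the installed version can turn the version string into a comparable tuple. The following is a minimal sketch, assuming the string has the shape shown above (an optional leading `v` followed by dot-separated integers). `parse_version` is a hypothetical helper, not part of docling-graph, and it does not handle pre-release suffixes.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn 'v1.2.0' or '1.2.0' into (1, 2, 0) so versions compare numerically."""
    # Drop an optional leading "v", then convert each dotted part to an int.
    return tuple(int(part) for part in version.lstrip("v").split("."))


# Tuples compare element by element, so 1.10.0 sorts after 1.9.3.
is_newer = parse_version("1.10.0") > parse_version("v1.9.3")
```

Comparing the raw strings instead would order "1.10.0" before "1.9.3", which is why the numeric tuple is used here.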
API Stability¶
🟢 Stable APIs¶
These APIs are stable and safe to use:
- run_pipeline()
- PipelineConfig
- All configuration classes
- Exception hierarchy
- Public protocols
🟣 Internal APIs¶
These are internal and may change:
- pipeline.orchestrator internals
- core.extractors.backends internals
- core.utils modules
🟡 Experimental¶
These are experimental:
- Custom stage APIs
- Advanced pipeline customization
Deprecation Policy¶
Deprecated features will:
- Be marked with @deprecated decorator
- Emit DeprecationWarning
- Be documented in CHANGELOG
- Be removed after 2 minor versions
Example:
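import warnings
# Assumption: the source does not show where @deprecated comes from.
# typing_extensions.deprecated (warnings.deprecated on Python 3.13+)
# is one decorator with this call shape.
from typing_extensions import deprecated

@deprecated("Use PipelineConfig instead")
def old_function():
    # Explicit warning; redundant if the decorator already warns on call,
    # as typing_extensions.deprecated does.
    warnings.warn(
        "old_function is deprecated, use PipelineConfig",
        DeprecationWarning,
        stacklevel=2
    )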
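Callers can check whether they still depend on deprecated functions by recording warnings instead of letting them pass silently. The sketch below uses only the standard `warnings` module; `old_function` is a hypothetical stand-in that mirrors the example above rather than a real docling-graph function.

```python
import warnings


def old_function() -> None:
    """Hypothetical deprecated function, standing in for a real API."""
    warnings.warn(
        "old_function is deprecated, use PipelineConfig",
        DeprecationWarning,
        stacklevel=2,
    )


# Record warnings in a list so they can be inspected afterwards.
with warnings.catch_warnings(record=True) as caught:
    # DeprecationWarning is hidden by default in most contexts; show all.
    warnings.simplefilter("always")
    old_function()

# Keep only the deprecation notices among the recorded warnings.
deprecations = [w for w in caught if issubclass(w.category, DeprecationWarning)]
```

In a test suite, running Python with `-W error::DeprecationWarning` turns these warnings into exceptions, which makes leftover calls to deprecated functions fail loudly.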
API Design Principles¶
1. Type Safety¶
All public APIs use type hints:
def run_pipeline(config: Union[PipelineConfig, Dict[str, Any]]) -> None:
    """Type-safe function signature."""
    pass
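A function that accepts either a configuration object or a plain dict typically normalizes its argument first. The following is a rough sketch of that pattern using only the standard library. `SketchConfig` and `normalize_config` are hypothetical names; the real `PipelineConfig` is described as Pydantic-validated and may behave differently.

```python
from dataclasses import dataclass
from typing import Any, Dict, Union


@dataclass
class SketchConfig:
    """Hypothetical stand-in for PipelineConfig, not the real class."""
    source: str
    template: str
    backend: str = "llm"


def normalize_config(config: Union[SketchConfig, Dict[str, Any]]) -> SketchConfig:
    """Return a SketchConfig whether the caller passed an object or a dict."""
    if isinstance(config, SketchConfig):
        return config
    # Unknown keys raise TypeError here, so typos in dict keys surface early.
    return SketchConfig(**config)


cfg = normalize_config({"source": "doc.pdf", "template": "templates.MyTemplate"})
```

After normalization, the rest of the function can rely on attribute access and on the declared field types, regardless of which form the caller used.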
2. Pydantic Validation¶
Configuration uses Pydantic for validation:
config = PipelineConfig(
    source="doc.pdf",
    template="templates.MyTemplate",
    backend="llm"  # Validated at runtime
)
3. Protocol-Based¶
Extensibility through protocols:
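As an illustration of the idea, the sketch below defines a protocol with `typing.Protocol` and a class that satisfies it structurally, without inheriting from it. `SketchExtractor`, its `extract` method, and `UppercaseExtractor` are hypothetical; the protocols in `docling_graph.protocols` may declare different methods.

```python
from typing import Dict, List, Protocol, runtime_checkable


@runtime_checkable
class SketchExtractor(Protocol):
    """Hypothetical protocol: anything with a matching extract() qualifies."""

    def extract(self, source: str) -> List[Dict[str, str]]: ...


class UppercaseExtractor:
    """Satisfies SketchExtractor by shape alone; no base class needed."""

    def extract(self, source: str) -> List[Dict[str, str]]:
        return [{"text": source.upper()}]


def run(extractor: SketchExtractor, source: str) -> List[Dict[str, str]]:
    """Depend on the protocol, not on a concrete extractor class."""
    return extractor.extract(source)


result = run(UppercaseExtractor(), "invoice")
```

Note that `isinstance` checks against a `runtime_checkable` protocol only confirm that the method exists; argument and return types are verified by a static type checker, not at runtime.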
4. Structured Exceptions¶
Clear error hierarchy:
try:
    run_pipeline(config)
except ConfigurationError as e:
    print(f"Config error: {e.message}")
    print(f"Details: {e.details}")
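One way such a hierarchy can carry the `message` and `details` attributes used above is sketched below. The class names are hypothetical stand-ins for `DoclingGraphError` and `ConfigurationError`, the constructor signature is an assumption, and so is the set of allowed backend values.

```python
from typing import Any, Dict, Optional


class SketchError(Exception):
    """Hypothetical base class, standing in for DoclingGraphError."""

    def __init__(self, message: str, details: Optional[Dict[str, Any]] = None) -> None:
        super().__init__(message)
        self.message = message
        self.details = details or {}


class SketchConfigurationError(SketchError):
    """Hypothetical subclass, standing in for ConfigurationError."""


def check_backend(backend: str) -> str:
    allowed = ["llm", "vlm"]  # assumed values, for illustration only
    if backend not in allowed:
        raise SketchConfigurationError(
            "Unknown backend", details={"backend": backend, "allowed": allowed}
        )
    return backend


try:
    check_backend("gpu")
except SketchError as e:
    # Catching the base class also catches every subclass.
    caught = (type(e).__name__, e.message, e.details["backend"])
```

Catching the base class handles any library error in one place, while catching a subclass first allows more specific recovery.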
Usage Examples¶
Basic Usage¶
from docling_graph import run_pipeline, PipelineConfig
config = PipelineConfig(
    source="document.pdf",
    template="templates.MyTemplate",
    backend="llm",
    inference="local"
)
run_pipeline(config)
Advanced Usage¶
from docling_graph import run_pipeline
from docling_graph.exceptions import ExtractionError
config = {
    "source": "document.pdf",
    "template": "templates.MyTemplate",
    "backend": "llm",
    "inference": "remote",
    "model_override": "mistral-small-latest",
    "use_chunking": True,
    "llm_consolidation": True,
    "export_format": "cypher"
}
try:
    run_pipeline(config)
except ExtractionError as e:
    print(f"Extraction failed: {e}")
API Documentation Sections¶
- Pipeline API → Main entry point
- Configuration API → Configuration classes
- Protocols → Protocol definitions
- Exceptions → Exception hierarchy
- Converters → Graph conversion
- Extractors → Extraction strategies
- Exporters → Export formats
- LLM Clients → LLM integrations
Contributing¶
See Development Guide for:
- Adding new APIs
- API design guidelines
- Documentation standards
- Testing requirements