Export Formats¶
Overview¶
Export formats determine how your knowledge graph is saved and shared. Docling Graph supports CSV, Cypher, and JSON formats, each optimized for different use cases.
In this guide:
- CSV format (spreadsheets, analysis)
- Cypher format (Neo4j import)
- JSON format (programmatic access)
- Format selection criteria
- Integration examples
Format Comparison¶
| Format | Best For | Output | Use Case |
|---|---|---|---|
| CSV | Analysis, spreadsheets | nodes.csv, edges.csv | Excel, Pandas, SQL |
| Cypher | Graph databases | graph.cypher | Neo4j import |
| JSON | APIs, processing | graph.json | Python, JavaScript |
CSV Export¶
What is CSV Export?¶
CSV export creates separate files for nodes and edges in comma-separated format, perfect for spreadsheet analysis and SQL databases.
Configuration¶
from docling_graph import run_pipeline, PipelineConfig
config = PipelineConfig(
source="document.pdf",
template="templates.BillingDocument",
export_format="csv", # CSV export (default)
output_dir="outputs"
)
run_pipeline(config)
Output Files¶
outputs/
├── nodes.csv # All nodes with properties
├── edges.csv # All edges with relationships
├── graph_stats.json # Graph statistics
└── visualization.html # Interactive visualization
nodes.csv Format¶
id,label,type,__class__,invoice_number,total,name,street,city
invoice_001,Invoice,entity,Invoice,INV-001,1000,,,
org_acme,Organization,entity,Organization,,,Acme Corp,,
addr_123,Address,entity,Address,,,,123 Main St,Paris
Columns:
- id: Unique node identifier
- label: Node type/class
- type: Always "entity"
- __class__: Python class name
- Additional columns for each property
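Because every property becomes its own column, most cells are empty for nodes of other types. A minimal Pandas sketch (file paths follow the default output layout shown above) that isolates one node type and drops its unused columns:

```python
import pandas as pd

# Load the exported nodes (path assumes the default output layout shown above)
nodes = pd.read_csv("outputs/nodes.csv")

# Keep only Invoice nodes, then drop property columns that are empty for this type
invoices = nodes[nodes["label"] == "Invoice"].dropna(axis=1, how="all")
print(invoices.head())
```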
edges.csv Format¶
source,target,label
invoice_001,org_acme,issued_by
org_acme,addr_123,located_at
invoice_001,item_001,contains_item
Columns:
- source: Source node ID
- target: Target node ID
- label: Relationship type
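With both files loaded, the graph can be rebuilt in memory for analysis. A sketch using NetworkX (not required by Docling Graph; shown here only as one way to work with the exported CSVs):

```python
import pandas as pd
import networkx as nx

nodes = pd.read_csv("outputs/nodes.csv")
edges = pd.read_csv("outputs/edges.csv")

# Build a directed graph from the edge list, keeping the relationship label as an edge attribute
G = nx.from_pandas_edgelist(
    edges, source="source", target="target", edge_attr="label", create_using=nx.DiGraph
)

# Attach node labels and properties as node attributes
nx.set_node_attributes(G, nodes.set_index("id").to_dict(orient="index"))

print(G.number_of_nodes(), "nodes,", G.number_of_edges(), "edges")
```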
Manual CSV Export¶
from docling_graph.core.exporters import CSVExporter
from docling_graph.core.converters import GraphConverter
# Convert models to graph
converter = GraphConverter()
graph, metadata = converter.pydantic_list_to_graph(models)
# Export to CSV
exporter = CSVExporter()
exporter.export(graph, output_dir="csv_output")
print("Exported to csv_output/nodes.csv and csv_output/edges.csv")
Using CSV with Pandas¶
import pandas as pd
# Load CSV files
nodes = pd.read_csv("outputs/nodes.csv")
edges = pd.read_csv("outputs/edges.csv")
# Analyze nodes
print(f"Total nodes: {len(nodes)}")
print(f"Node types:\n{nodes['label'].value_counts()}")
# Analyze edges
print(f"Total edges: {len(edges)}")
print(f"Edge types:\n{edges['label'].value_counts()}")
# Filter specific node type
invoices = nodes[nodes['label'] == 'Invoice']
print(f"Found {len(invoices)} invoices")
Using CSV with SQL¶
import sqlite3
import pandas as pd
# Load CSV
nodes = pd.read_csv("outputs/nodes.csv")
edges = pd.read_csv("outputs/edges.csv")
# Create database
conn = sqlite3.connect("graph.db")
# Import to SQL
nodes.to_sql("nodes", conn, if_exists="replace", index=False)
edges.to_sql("edges", conn, if_exists="replace", index=False)
# Query
result = pd.read_sql("""
SELECT n.label, COUNT(*) as count
FROM nodes n
GROUP BY n.label
""", conn)
print(result)
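A natural follow-up is joining each edge to the labels of its endpoints, which summarises how entity types are connected. A sketch reusing the graph.db database created above:

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect("graph.db")  # database created in the example above

# Join each edge to the labels of its source and target nodes
result = pd.read_sql("""
    SELECT s.label AS source_type,
           e.label AS relationship,
           t.label AS target_type,
           COUNT(*) AS pair_count
    FROM edges e
    JOIN nodes s ON s.id = e.source
    JOIN nodes t ON t.id = e.target
    GROUP BY s.label, e.label, t.label
    ORDER BY pair_count DESC
""", conn)
print(result)
```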
Cypher Export¶
What is Cypher Export?¶
Cypher export generates Cypher statements for direct import into Neo4j graph databases.
Configuration¶
from docling_graph import run_pipeline, PipelineConfig
config = PipelineConfig(
source="document.pdf",
template="templates.BillingDocument",
export_format="cypher", # Cypher export
output_dir="outputs"
)
run_pipeline(config)
Output Files¶
outputs/
├── graph.cypher # Cypher statements
├── graph_stats.json # Graph statistics
└── visualization.html # Interactive visualization
graph.cypher Format¶
// Cypher script generated by docling-graph
// Import this into Neo4j
// --- Create Nodes ---
CREATE (invoice_001:Invoice {invoice_number: "INV-001", total: 1000, node_id: "invoice_001"})
CREATE (org_acme:Organization {name: "Acme Corp", node_id: "org_acme"})
CREATE (addr_123:Address {street: "123 Main St", city: "Paris", node_id: "addr_123"})
// --- Create Relationships ---
MATCH (invoice_001), (org_acme)
CREATE (invoice_001)-[:ISSUED_BY]->(org_acme)
MATCH (org_acme), (addr_123)
CREATE (org_acme)-[:LOCATED_AT]->(addr_123)
Manual Cypher Export¶
from docling_graph.core.exporters import CypherExporter
from docling_graph.core.converters import GraphConverter
from pathlib import Path
# Convert models to graph
converter = GraphConverter()
graph, metadata = converter.pydantic_list_to_graph(models)
# Export to Cypher
exporter = CypherExporter()
exporter.export(graph, Path("outputs/graph.cypher"))
print("Exported to outputs/graph.cypher")
Importing to Neo4j¶
Method 1: cypher-shell¶
# Import using cypher-shell
cat outputs/graph.cypher | cypher-shell -u neo4j -p password
# Or with file
cypher-shell -u neo4j -p password -f outputs/graph.cypher
Method 2: Neo4j Browser¶
- Open Neo4j Browser (http://localhost:7474)
- Copy the contents of graph.cypher
- Paste into the query editor
- Execute
Method 3: Python Driver¶
from neo4j import GraphDatabase
# Connect to Neo4j
driver = GraphDatabase.driver(
"bolt://localhost:7687",
auth=("neo4j", "password")
)
# Read Cypher file
with open("outputs/graph.cypher") as f:
cypher_script = f.read()
# Execute
with driver.session() as session:
session.run(cypher_script)
driver.close()
print("Imported to Neo4j")
JSON Export¶
What is JSON Export?¶
JSON export is automatically generated alongside CSV or Cypher, providing structured data for programmatic access.
Output Files¶
outputs/
├── extracted_data.json # Pydantic models
├── graph_data.json # Graph structure
├── graph_stats.json # Statistics
└── ...
extracted_data.json Format¶
{
"models": [
{
"invoice_number": "INV-001",
"total": 1000,
"issued_by": {
"name": "Acme Corp",
"located_at": {
"street": "123 Main St",
"city": "Paris"
}
}
}
]
}
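Because extracted_data.json mirrors the nested Pydantic models, nested fields are reached with plain dictionary access. A short sketch assuming the structure shown above:

```python
import json

with open("outputs/extracted_data.json") as f:
    data = json.load(f)

# Walk the nested model structure shown above
for model in data["models"]:
    issuer = model.get("issued_by", {})
    address = issuer.get("located_at", {})
    print(model["invoice_number"], "-", issuer.get("name"), "-", address.get("city"))
```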
graph_data.json Format¶
{
"nodes": [
{
"id": "invoice_001",
"label": "Invoice",
"type": "entity",
"properties": {
"invoice_number": "INV-001",
"total": 1000
}
},
{
"id": "org_acme",
"label": "Organization",
"type": "entity",
"properties": {
"name": "Acme Corp"
}
}
],
"edges": [
{
"source": "invoice_001",
"target": "org_acme",
"label": "issued_by"
}
]
}
Manual JSON Export¶
from docling_graph.core.exporters import JSONExporter
from docling_graph.core.converters import GraphConverter
from pathlib import Path
# Convert models to graph
converter = GraphConverter()
graph, metadata = converter.pydantic_list_to_graph(models)
# Export to JSON
exporter = JSONExporter()
exporter.export(graph, Path("outputs/graph.json"))
print("Exported to outputs/graph.json")
Using JSON in Python¶
import json
# Load graph data
with open("outputs/graph_data.json") as f:
graph_data = json.load(f)
# Access nodes
for node in graph_data["nodes"]:
print(f"{node['label']}: {node['id']}")
# Access edges
for edge in graph_data["edges"]:
print(f"{edge['source']} --[{edge['label']}]--> {edge['target']}")
# Filter by type
invoices = [n for n in graph_data["nodes"] if n["label"] == "Invoice"]
print(f"Found {len(invoices)} invoices")
Format Selection¶
Decision Matrix¶
| Use Case | Recommended Format | Reason |
|---|---|---|
| Excel analysis | CSV | Direct import to Excel |
| Neo4j database | Cypher | Direct import |
| Python processing | JSON | Easy to parse |
| SQL database | CSV | Standard import |
| Data science | CSV | Pandas compatible |
| API integration | JSON | Standard format |
| Graph queries | Cypher | Neo4j native |
By Tool¶
| Tool | Format | Import Method |
|---|---|---|
| Excel | CSV | File → Open |
| Neo4j | Cypher | cypher-shell |
| Python | JSON | json.load() |
| Pandas | CSV | pd.read_csv() |
| SQL | CSV | COPY/LOAD DATA |
| Power BI | CSV | Get Data |
| Tableau | CSV | Connect to File |
Complete Examples¶
📍 CSV for Analysis¶
from docling_graph import run_pipeline, PipelineConfig
import pandas as pd
# Extract and export to CSV
config = PipelineConfig(
source="invoices.pdf",
template="templates.BillingDocument",
export_format="csv",
output_dir="analysis"
)
run_pipeline(config)
# Analyze with Pandas
nodes = pd.read_csv("analysis/nodes.csv")
edges = pd.read_csv("analysis/edges.csv")
# Calculate statistics
print(f"Total invoices: {len(nodes[nodes['label'] == 'Invoice'])}")
print(f"Total organizations: {len(nodes[nodes['label'] == 'Organization'])}")
print(f"Total relationships: {len(edges)}")
# Export summary
summary = nodes.groupby('label').size()
summary.to_csv("analysis/summary.csv")
📍 Cypher for Neo4j¶
from docling_graph import run_pipeline, PipelineConfig
import subprocess
# Extract and export to Cypher
config = PipelineConfig(
source="contracts.pdf",
template="templates.Contract",
export_format="cypher",
output_dir="neo4j_import"
)
run_pipeline(config)
# Import to Neo4j
result = subprocess.run([
"cypher-shell",
"-u", "neo4j",
"-p", "password",
"-f", "neo4j_import/graph.cypher"
], capture_output=True, text=True)
if result.returncode == 0:
print("✅ Successfully imported to Neo4j")
else:
print(f"❌ Import failed: {result.stderr}")
📍 JSON for API¶
from docling_graph import run_pipeline, PipelineConfig
import json
import requests
# Extract and export
config = PipelineConfig(
source="document.pdf",
template="templates.BillingDocument",
export_format="csv", # JSON always generated
output_dir="api_data"
)
run_pipeline(config)
# Load JSON
with open("api_data/extracted_data.json") as f:
data = json.load(f)
# Send to API
response = requests.post(
"https://api.example.com/invoices",
json=data,
headers={"Content-Type": "application/json"}
)
print(f"API response: {response.status_code}")
Best Practices¶
👍 Choose Format by Use Case¶
# ✅ Good - Match format to use case
if use_case == "neo4j":
export_format = "cypher"
elif use_case == "analysis":
export_format = "csv"
else:
export_format = "csv" # Default
👍 Organize Output Directories¶
# ✅ Good - Structured outputs
from datetime import datetime
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
output_dir = f"exports/{export_format}/{timestamp}"
config = PipelineConfig(
source="document.pdf",
template="templates.BillingDocument",
export_format=export_format,
output_dir=output_dir
)
👍 Validate Exports¶
# ✅ Good - Check exports exist
import os
run_pipeline(config)
if export_format == "csv":
assert os.path.exists(f"{output_dir}/nodes.csv")
assert os.path.exists(f"{output_dir}/edges.csv")
elif export_format == "cypher":
assert os.path.exists(f"{output_dir}/graph.cypher")
print("✅ Exports validated")
Troubleshooting¶
🐛 Empty CSV Files¶
Solution:
# Check if graph has nodes
import json
with open("outputs/graph_stats.json") as f:
stats = json.load(f)
if stats["node_count"] == 0:
print("No nodes in graph - check extraction")
🐛 Cypher Import Fails¶
Solution:
# Check Cypher syntax
head -20 outputs/graph.cypher
# Test connection
cypher-shell -u neo4j -p password "RETURN 1"
# Import with error logging
cat outputs/graph.cypher | cypher-shell -u neo4j -p password 2>&1 | tee import.log
🐛 JSON Parsing Error¶
Solution:
# Validate JSON
import json
try:
with open("outputs/graph_data.json") as f:
data = json.load(f)
print("✅ Valid JSON")
except json.JSONDecodeError as e:
print(f"❌ Invalid JSON: {e}")
Next Steps¶
Now that you understand export formats:
- Visualization → Visualize your graphs
- Neo4j Integration → Deep dive into Neo4j
- Graph Analysis → Analyze graph structure