Export Formats

Overview

Export formats determine how your knowledge graph is saved and shared. Docling Graph supports CSV, Cypher, and JSON formats, each optimized for different use cases.

In this guide:

- CSV format (spreadsheets, analysis)
- Cypher format (Neo4j import)
- JSON format (programmatic access)
- Format selection criteria
- Integration examples


Format Comparison

| Format | Best For               | Output               | Use Case           |
|--------|------------------------|----------------------|--------------------|
| CSV    | Analysis, spreadsheets | nodes.csv, edges.csv | Excel, Pandas, SQL |
| Cypher | Graph databases        | graph.cypher         | Neo4j import       |
| JSON   | APIs, processing       | graph.json           | Python, JavaScript |

CSV Export

What is CSV Export?

CSV export creates separate files for nodes and edges in comma-separated format, well suited to spreadsheet analysis and SQL databases.

Configuration

from docling_graph import run_pipeline, PipelineConfig

config = PipelineConfig(
    source="document.pdf",
    template="templates.BillingDocument",
    export_format="csv",  # CSV export (default)
    output_dir="outputs"
)

run_pipeline(config)

Output Files

outputs/
├── nodes.csv          # All nodes with properties
├── edges.csv          # All edges with relationships
├── graph_stats.json   # Graph statistics
└── visualization.html # Interactive visualization

nodes.csv Format

id,label,type,__class__,invoice_number,total,name,street,city
invoice_001,Invoice,entity,Invoice,INV-001,1000,,,
org_acme,Organization,entity,Organization,,,Acme Corp,,
addr_123,Address,entity,Address,,,,123 Main St,Paris

Columns:

- id: Unique node identifier
- label: Node type/class
- type: Always "entity"
- __class__: Python class name
- Additional columns, one per model property
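Because every property becomes its own column, nodes of other classes leave columns they don't use empty, and pandas reads those cells as NaN. A minimal sketch using the sample rows above (data inlined for illustration):

```python
import io

import pandas as pd

# The sample nodes.csv rows from above, inlined for illustration
csv_text = """id,label,type,__class__,invoice_number,total,name,street,city
invoice_001,Invoice,entity,Invoice,INV-001,1000,,,
org_acme,Organization,entity,Organization,,,Acme Corp,,
addr_123,Address,entity,Address,,,,123 Main St,Paris
"""

nodes = pd.read_csv(io.StringIO(csv_text))

# Properties that don't apply to a class load as NaN
print(nodes.loc[nodes["label"] == "Organization", "invoice_number"].isna().all())  # True

# Drop rows missing a property before numeric analysis
totals = nodes.dropna(subset=["total"])
print(totals["total"].sum())  # 1000.0
```

Filtering with dropna (or fillna) before aggregating avoids NaN surprises in downstream sums and group-bys.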


edges.csv Format

source,target,label
invoice_001,org_acme,issued_by
org_acme,addr_123,located_at
invoice_001,item_001,contains_item

Columns:

- source: Source node ID
- target: Target node ID
- label: Relationship type


Manual CSV Export

from docling_graph.core.exporters import CSVExporter
from docling_graph.core.converters import GraphConverter

# Convert models to graph
converter = GraphConverter()
graph, metadata = converter.pydantic_list_to_graph(models)

# Export to CSV
exporter = CSVExporter()
exporter.export(graph, output_dir="csv_output")

print("Exported to csv_output/nodes.csv and csv_output/edges.csv")

Using CSV with Pandas

import pandas as pd

# Load CSV files
nodes = pd.read_csv("outputs/nodes.csv")
edges = pd.read_csv("outputs/edges.csv")

# Analyze nodes
print(f"Total nodes: {len(nodes)}")
print(f"Node types:\n{nodes['label'].value_counts()}")

# Analyze edges
print(f"Total edges: {len(edges)}")
print(f"Edge types:\n{edges['label'].value_counts()}")

# Filter specific node type
invoices = nodes[nodes['label'] == 'Invoice']
print(f"Found {len(invoices)} invoices")
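Beyond per-file counts, edges can be joined back to nodes to label both endpoints of each relationship. A sketch assuming the column layout shown above, with small stand-in frames in place of the real CSV files:

```python
import pandas as pd

# Minimal stand-ins for outputs/nodes.csv and outputs/edges.csv (illustrative)
nodes = pd.DataFrame({
    "id": ["invoice_001", "org_acme", "addr_123"],
    "label": ["Invoice", "Organization", "Address"],
})
edges = pd.DataFrame({
    "source": ["invoice_001", "org_acme"],
    "target": ["org_acme", "addr_123"],
    "label": ["issued_by", "located_at"],
})

# Attach node labels to each edge endpoint; rename avoids clashing
# with the edge's own "label" column
enriched = (
    edges
    .merge(nodes.rename(columns={"id": "source", "label": "source_label"}), on="source")
    .merge(nodes.rename(columns={"id": "target", "label": "target_label"}), on="target")
)

print(enriched[["source_label", "label", "target_label"]])
```

The same two merges work unchanged on the real files after `pd.read_csv`.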

Using CSV with SQL

import sqlite3
import pandas as pd

# Load CSV
nodes = pd.read_csv("outputs/nodes.csv")
edges = pd.read_csv("outputs/edges.csv")

# Create database
conn = sqlite3.connect("graph.db")

# Import to SQL
nodes.to_sql("nodes", conn, if_exists="replace", index=False)
edges.to_sql("edges", conn, if_exists="replace", index=False)

# Query
result = pd.read_sql("""
    SELECT n.label, COUNT(*) as count
    FROM nodes n
    GROUP BY n.label
""", conn)

print(result)
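The same endpoint join works in SQL once both tables are imported; for example, a one-hop traversal resolving edge endpoints to node labels (a sketch against an in-memory database with the sample schema above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Minimal stand-ins for the imported nodes/edges tables (illustrative)
conn.execute("CREATE TABLE nodes (id TEXT, label TEXT)")
conn.execute("CREATE TABLE edges (source TEXT, target TEXT, label TEXT)")
conn.executemany("INSERT INTO nodes VALUES (?, ?)", [
    ("invoice_001", "Invoice"),
    ("org_acme", "Organization"),
])
conn.executemany("INSERT INTO edges VALUES (?, ?, ?)", [
    ("invoice_001", "org_acme", "issued_by"),
])

# One-hop traversal: join each edge to its source and target nodes
rows = conn.execute("""
    SELECT s.label, e.label, t.label
    FROM edges e
    JOIN nodes s ON s.id = e.source
    JOIN nodes t ON t.id = e.target
""").fetchall()

print(rows)  # [('Invoice', 'issued_by', 'Organization')]
```

Multi-hop queries chain further joins on the edges table, though beyond a few hops a graph database is the better fit.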

Cypher Export

What is Cypher Export?

Cypher export generates Cypher statements for direct import into Neo4j graph databases.

Configuration

from docling_graph import run_pipeline, PipelineConfig

config = PipelineConfig(
    source="document.pdf",
    template="templates.BillingDocument",
    export_format="cypher",  # Cypher export
    output_dir="outputs"
)

run_pipeline(config)

Output Files

outputs/
├── graph.cypher       # Cypher statements
├── graph_stats.json   # Graph statistics
└── visualization.html # Interactive visualization

graph.cypher Format

// Cypher script generated by docling-graph
// Import this into Neo4j

// --- Create Nodes ---
CREATE (:Invoice {invoice_number: "INV-001", total: 1000, node_id: "invoice_001"});
CREATE (:Organization {name: "Acme Corp", node_id: "org_acme"});
CREATE (:Address {street: "123 Main St", city: "Paris", node_id: "addr_123"});

// --- Create Relationships ---
MATCH (a {node_id: "invoice_001"}), (b {node_id: "org_acme"})
CREATE (a)-[:ISSUED_BY]->(b);

MATCH (a {node_id: "org_acme"}), (b {node_id: "addr_123"})
CREATE (a)-[:LOCATED_AT]->(b);

Manual Cypher Export

from docling_graph.core.exporters import CypherExporter
from docling_graph.core.converters import GraphConverter
from pathlib import Path

# Convert models to graph
converter = GraphConverter()
graph, metadata = converter.pydantic_list_to_graph(models)

# Export to Cypher
exporter = CypherExporter()
exporter.export(graph, Path("outputs/graph.cypher"))

print("Exported to outputs/graph.cypher")

Importing to Neo4j

Method 1: cypher-shell

# Import using cypher-shell
cat outputs/graph.cypher | cypher-shell -u neo4j -p password

# Or with file
cypher-shell -u neo4j -p password -f outputs/graph.cypher

Method 2: Neo4j Browser

  1. Open Neo4j Browser (http://localhost:7474)
  2. Copy contents of graph.cypher
  3. Paste into query editor
  4. Execute

Method 3: Python Driver

from neo4j import GraphDatabase

# Connect to Neo4j
driver = GraphDatabase.driver(
    "bolt://localhost:7687",
    auth=("neo4j", "password")
)

# Read Cypher file
with open("outputs/graph.cypher") as f:
    cypher_script = f.read()

# Execute statement by statement (session.run accepts a single
# statement per call, so split on the ';' terminators)
with driver.session() as session:
    for statement in cypher_script.split(";"):
        if statement.strip():
            session.run(statement)

driver.close()
print("Imported to Neo4j")

JSON Export

What is JSON Export?

JSON export is automatically generated alongside CSV or Cypher, providing structured data for programmatic access.

Output Files

outputs/
├── extracted_data.json  # Pydantic models
├── graph_data.json      # Graph structure
├── graph_stats.json     # Statistics
└── ...

extracted_data.json Format

{
  "models": [
    {
      "invoice_number": "INV-001",
      "total": 1000,
      "issued_by": {
        "name": "Acme Corp",
        "located_at": {
          "street": "123 Main St",
          "city": "Paris"
        }
      }
    }
  ]
}
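Because extracted_data.json preserves the nested Pydantic structure, nested fields are reached with plain dictionary access. A sketch using the sample document above, inlined for illustration:

```python
import json

# The sample extracted_data.json shown above, inlined for illustration
data = json.loads("""
{
  "models": [
    {
      "invoice_number": "INV-001",
      "total": 1000,
      "issued_by": {
        "name": "Acme Corp",
        "located_at": {"street": "123 Main St", "city": "Paris"}
      }
    }
  ]
}
""")

# Walk each extracted model and follow the nested references
for model in data["models"]:
    issuer = model["issued_by"]
    print(model["invoice_number"], issuer["name"], issuer["located_at"]["city"])
# INV-001 Acme Corp Paris
```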

graph_data.json Format

{
  "nodes": [
    {
      "id": "invoice_001",
      "label": "Invoice",
      "type": "entity",
      "properties": {
        "invoice_number": "INV-001",
        "total": 1000
      }
    },
    {
      "id": "org_acme",
      "label": "Organization",
      "type": "entity",
      "properties": {
        "name": "Acme Corp"
      }
    }
  ],
  "edges": [
    {
      "source": "invoice_001",
      "target": "org_acme",
      "label": "issued_by"
    }
  ]
}

Manual JSON Export

from docling_graph.core.exporters import JSONExporter
from docling_graph.core.converters import GraphConverter
from pathlib import Path

# Convert models to graph
converter = GraphConverter()
graph, metadata = converter.pydantic_list_to_graph(models)

# Export to JSON
exporter = JSONExporter()
exporter.export(graph, Path("outputs/graph.json"))

print("Exported to outputs/graph.json")

Using JSON in Python

import json

# Load graph data
with open("outputs/graph_data.json") as f:
    graph_data = json.load(f)

# Access nodes
for node in graph_data["nodes"]:
    print(f"{node['label']}: {node['id']}")

# Access edges
for edge in graph_data["edges"]:
    print(f"{edge['source']} --[{edge['label']}]--> {edge['target']}")

# Filter by type
invoices = [n for n in graph_data["nodes"] if n["label"] == "Invoice"]
print(f"Found {len(invoices)} invoices")
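The node/edge lists also convert easily into an adjacency structure for quick traversals, with no extra dependencies. A sketch using a stand-in for graph_data.json:

```python
from collections import defaultdict

# Minimal stand-in for outputs/graph_data.json (illustrative)
graph_data = {
    "nodes": [
        {"id": "invoice_001", "label": "Invoice"},
        {"id": "org_acme", "label": "Organization"},
        {"id": "addr_123", "label": "Address"},
    ],
    "edges": [
        {"source": "invoice_001", "target": "org_acme", "label": "issued_by"},
        {"source": "org_acme", "target": "addr_123", "label": "located_at"},
    ],
}

# Build an outgoing adjacency list keyed by node id
adjacency = defaultdict(list)
for edge in graph_data["edges"]:
    adjacency[edge["source"]].append((edge["label"], edge["target"]))

# Follow a two-hop path: invoice -> organization -> address
for label, target in adjacency["invoice_001"]:
    print(f"invoice_001 --[{label}]--> {target}")
    for label2, target2 in adjacency[target]:
        print(f"{target} --[{label2}]--> {target2}")
```

For heavier analysis (centrality, shortest paths), the same dict feeds directly into a graph library's node/edge constructors.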

Format Selection

Decision Matrix

| Use Case          | Recommended Format | Reason                 |
|-------------------|--------------------|------------------------|
| Excel analysis    | CSV                | Direct import to Excel |
| Neo4j database    | Cypher             | Direct import          |
| Python processing | JSON               | Easy to parse          |
| SQL database      | CSV                | Standard import        |
| Data science      | CSV                | Pandas compatible      |
| API integration   | JSON               | Standard format        |
| Graph queries     | Cypher             | Neo4j native           |

By Tool

| Tool     | Format | Import Method   |
|----------|--------|-----------------|
| Excel    | CSV    | File → Open     |
| Neo4j    | Cypher | cypher-shell    |
| Python   | JSON   | json.load()     |
| Pandas   | CSV    | pd.read_csv()   |
| SQL      | CSV    | COPY/LOAD DATA  |
| Power BI | CSV    | Get Data        |
| Tableau  | CSV    | Connect to File |

Complete Examples

📍 CSV for Analysis

from docling_graph import run_pipeline, PipelineConfig
import pandas as pd

# Extract and export to CSV
config = PipelineConfig(
    source="invoices.pdf",
    template="templates.BillingDocument",
    export_format="csv",
    output_dir="analysis"
)

run_pipeline(config)

# Analyze with Pandas
nodes = pd.read_csv("analysis/nodes.csv")
edges = pd.read_csv("analysis/edges.csv")

# Calculate statistics
print(f"Total invoices: {len(nodes[nodes['label'] == 'Invoice'])}")
print(f"Total organizations: {len(nodes[nodes['label'] == 'Organization'])}")
print(f"Total relationships: {len(edges)}")

# Export summary
summary = nodes.groupby('label').size()
summary.to_csv("analysis/summary.csv")

📍 Cypher for Neo4j

from docling_graph import run_pipeline, PipelineConfig
import subprocess

# Extract and export to Cypher
config = PipelineConfig(
    source="contracts.pdf",
    template="templates.Contract",
    export_format="cypher",
    output_dir="neo4j_import"
)

run_pipeline(config)

# Import to Neo4j
result = subprocess.run([
    "cypher-shell",
    "-u", "neo4j",
    "-p", "password",
    "-f", "neo4j_import/graph.cypher"
], capture_output=True, text=True)

if result.returncode == 0:
    print("✅ Successfully imported to Neo4j")
else:
    print(f"❌ Import failed: {result.stderr}")

📍 JSON for API

from docling_graph import run_pipeline, PipelineConfig
import json
import requests

# Extract and export
config = PipelineConfig(
    source="document.pdf",
    template="templates.BillingDocument",
    export_format="csv",  # JSON always generated
    output_dir="api_data"
)

run_pipeline(config)

# Load JSON
with open("api_data/extracted_data.json") as f:
    data = json.load(f)

# Send to API
response = requests.post(
    "https://api.example.com/invoices",
    json=data,
    headers={"Content-Type": "application/json"}
)

print(f"API response: {response.status_code}")

Best Practices

👍 Choose Format by Use Case

# ✅ Good - Match format to use case
if use_case == "neo4j":
    export_format = "cypher"
elif use_case == "analysis":
    export_format = "csv"
else:
    export_format = "csv"  # Default

👍 Organize Output Directories

# ✅ Good - Structured outputs
from datetime import datetime

timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
output_dir = f"exports/{export_format}/{timestamp}"

config = PipelineConfig(
    source="document.pdf",
    template="templates.BillingDocument",
    export_format=export_format,
    output_dir=output_dir
)

👍 Validate Exports

# ✅ Good - Check exports exist
import os

run_pipeline(config)

if export_format == "csv":
    assert os.path.exists(f"{output_dir}/nodes.csv")
    assert os.path.exists(f"{output_dir}/edges.csv")
elif export_format == "cypher":
    assert os.path.exists(f"{output_dir}/graph.cypher")

print("✅ Exports validated")

Troubleshooting

🐛 Empty CSV Files

Solution:

# Check if graph has nodes
import json

with open("outputs/graph_stats.json") as f:
    stats = json.load(f)

if stats["node_count"] == 0:
    print("No nodes in graph - check extraction")

🐛 Cypher Import Fails

Solution:

# Check Cypher syntax
head -20 outputs/graph.cypher

# Test connection
cypher-shell -u neo4j -p password "RETURN 1"

# Import with error logging
cat outputs/graph.cypher | cypher-shell -u neo4j -p password 2>&1 | tee import.log

🐛 JSON Parsing Error

Solution:

# Validate JSON
import json

try:
    with open("outputs/graph_data.json") as f:
        data = json.load(f)
    print("✅ Valid JSON")
except json.JSONDecodeError as e:
    print(f"❌ Invalid JSON: {e}")


Next Steps

Now that you understand export formats:

  1. Visualization → Visualize your graphs
  2. Neo4j Integration → Deep dive into Neo4j
  3. Graph Analysis → Analyze graph structure