Subgraph Extraction and Similarity Computation¶

This notebook demonstrates how to:

Extract subgraphs from the knowledge graph centered on an entity of any type (AiSystem, Risk, RiskGroup, AiTask, Action, etc.)
Compute similarity between two subgraphs using multiple methods
Analyze and compare entity neighborhoods

Setup¶

In [1]:

from ai_atlas_nexus import AIAtlasNexus
from ai_atlas_nexus.blocks.graph_explorer import (
    PyoxigraphExplorer,
    extract_subgraph,
    compute_similarity,
)
import pandas as pd
from collections import defaultdict
from pathlib import Path

# Initialize with sample data directory
SAMPLE_DATA_DIR = Path("sample_data")
nexus = AIAtlasNexus(base_dir=str(SAMPLE_DATA_DIR))
explorer = PyoxigraphExplorer(nexus._ontology)

print("✓ Knowledge graph loaded")
print(f"✓ Graph explorer initialized")
print(f"✓ Sample data from: {SAMPLE_DATA_DIR}")

[2026-06-22 10:37:23:368] - INFO - AIAtlasNexus - Created AIAtlasNexus instance. Base_dir: sample_data

✓ Knowledge graph loaded
✓ Graph explorer initialized
✓ Sample data from: sample_data

1. Explore Available Entities¶

First, count the entities in the ontology by type. The cell below prints only the overall total; the per-type counts remain available in type_counts.

In [2]:

# Count entities by type
type_counts = defaultdict(int)

for field in nexus._ontology.model_fields_set:
    items = getattr(nexus._ontology, field, None) or []
    if not isinstance(items, list):
        items = [items]
    for item in items:
        entity_type = type(item).__name__
        type_counts[entity_type] += 1


print(f"\nTotal entities: {sum(type_counts.values())}")

Total entities: 2582
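
The cell above reports only the grand total. A minimal sketch of printing the per-type breakdown as well is shown below. It uses stand-in classes instead of the real ontology objects: the Risk and Action classes, the count_by_type helper, and the entities list are hypothetical placeholders for illustration, not part of the library.

```python
from collections import Counter


# Hypothetical stand-ins for ontology entity classes (not the library's own).
class Risk:
    pass


class Action:
    pass


def count_by_type(items):
    """Return a Counter mapping each item's class name to its frequency."""
    return Counter(type(item).__name__ for item in items)


# Toy data: two risks and one action.
entities = [Risk(), Risk(), Action()]
type_counts = count_by_type(entities)

# most_common() yields (name, count) pairs in descending order of count.
for entity_type, count in type_counts.most_common():
    print(f"{entity_type:<10} {count:>5}")
print(f"Total entities: {sum(type_counts.values())}")
```

The same loop should work on the type_counts dictionary built in the cell above if it is sorted by value first, since a defaultdict has no most_common method.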

2. Extract a Subgraph¶

Let's extract a subgraph rooted at a specific entity. We'll look at an Action entity and its neighborhood.

In [3]:

# Get some Action entities
action_ids = []
for item in (nexus._ontology.actions or []):
    if hasattr(item, "id"):
        action_ids.append(item.id)

print(f"Found {len(action_ids)} Action entities")

# Pick the first one
entity_id = action_ids[0]
entity = explorer.get_by_id("actions", entity_id)

print(f"\nSelected Action: {entity_id}")
print(f"  Name: {entity.name if hasattr(entity, 'name') else 'N/A'}")
print(f"  Description: {str(entity.description)[:100] if hasattr(entity, 'description') else 'N/A'}...")

Found 254 Action entities

Selected Action: GV-1.1-001
  Name: GV-1.1-001
  Description: Align GAI development and use with applicable laws and regulations, including those related to data ...

2.1 Extract at Different Depths¶

Let's see how the subgraph grows as we increase the traversal depth.

In [4]:

# Extract subgraphs at different depths
depths_to_explore = [0, 1, 2]
subgraphs = {}

print(f"Extracting subgraph rooted at {entity_id}\n")

for depth in depths_to_explore:
    sg = extract_subgraph(explorer, entity_id, max_hops=depth)
    subgraphs[depth] = sg
    
    # Count nodes
    total_nodes = sum(len(nodes) for nodes in sg.nodes.values())
    
    print(f"max_hops={depth}:")
    print(f"  Total nodes: {total_nodes}")
    print(f"  Nodes by type: {dict((k, len(v)) for k, v in sorted(sg.nodes.items()))}")
    print(f"  Edges: {len(sg.edges)}")
    print()

Extracting subgraph rooted at GV-1.1-001

max_hops=0:
  Total nodes: 1
  Nodes by type: {'Action': 1}
  Edges: 0

max_hops=1:
  Total nodes: 5
  Nodes by type: {'Action': 1, 'Documentation': 1, 'Risk': 2, 'RiskTaxonomy': 1}
  Edges: 4

max_hops=2:
  Total nodes: 74
  Nodes by type: {'Action': 42, 'Documentation': 1, 'Risk': 30, 'RiskTaxonomy': 1}
  Edges: 97
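
The jump from 5 nodes at one hop to 74 at two hops is typical of hop-limited traversal: each newly reached node contributes its own neighbors. The sketch below illustrates the idea with a plain breadth-first search over (source, predicate, target) triples. It is not the library's implementation; the k_hop_subgraph function, the toy edge list, and the choice to follow edges in both directions are assumptions made for illustration.

```python
from collections import deque


def k_hop_subgraph(edges, root, max_hops):
    """Collect nodes and edges reachable from `root` within `max_hops` hops.

    `edges` is an iterable of (source, predicate, target) triples. Edges are
    followed in both directions (an assumption of this sketch). Only edges
    incident to an expanded node are kept, so an edge between two nodes that
    both sit exactly at the hop limit is not included.
    """
    # Undirected adjacency index that remembers the original triple.
    adjacency = {}
    for src, pred, tgt in edges:
        adjacency.setdefault(src, []).append((tgt, (src, pred, tgt)))
        adjacency.setdefault(tgt, []).append((src, (src, pred, tgt)))

    distance = {root: 0}
    kept_edges = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor, triple in adjacency.get(node, []):
            kept_edges.add(triple)
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return set(distance), kept_edges


# Toy graph: a1 and a2 share risk r1; a3 and r3 are unreachable from a1.
edges = [
    ("a1", "hasRelatedRisk", "r1"),
    ("a1", "hasRelatedRisk", "r2"),
    ("a2", "hasRelatedRisk", "r1"),
    ("a3", "hasRelatedRisk", "r3"),
]

for hops in (0, 1, 2):
    nodes, kept = k_hop_subgraph(edges, "a1", hops)
    print(f"max_hops={hops}: nodes={sorted(nodes)} edges={len(kept)}")
```

In the toy graph, the second hop reaches a2 only because the shared risk r1 is traversed backwards, which mirrors how the two-hop output above picks up many additional Action nodes through shared Risk nodes.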

3. Inspect Subgraph Structure¶

Let's look at the edges and relationships in a subgraph.

In [5]:

# Use the depth=1 subgraph
sg = subgraphs[1]

print(f"Edges in subgraph (max_hops=1):")
print(f"Format: (source_id, predicate, target_id)\n")

# Group edges by predicate
edges_by_pred = defaultdict(list)
for src, pred, tgt in sg.edges:
    edges_by_pred[pred].append((src, tgt))

for pred in sorted(edges_by_pred.keys()):
    edges = edges_by_pred[pred]
    print(f"  {pred}: {len(edges)} edge(s)")
    for src, tgt in edges[:3]:  # Show first 3
        print(f"    {src} -> {tgt}")
    if len(edges) > 3:
        print(f"    ... and {len(edges) - 3} more")

Edges in subgraph (max_hops=1):
Format: (source_id, predicate, target_id)

  hasDocumentation: 1 edge(s)
    GV-1.1-001 -> NIST.AI.600-1
  hasRelatedRisk: 2 edge(s)
    GV-1.1-001 -> nist-intellectual-property
    GV-1.1-001 -> nist-data-privacy
  isDefinedByTaxonomy: 1 edge(s)
    GV-1.1-001 -> nist-ai-rmf

4. Compare Two Subgraphs - Structural Similarity¶

Extract two subgraphs and compare them using structural similarity (Jaccard overlap).

In [6]:

# Select two different entities
sg1_id = action_ids[0]
sg2_id = action_ids[1]

print(f"Entity 1: {sg1_id}")
print(f"Entity 2: {sg2_id}")

# Extract subgraphs
sg1 = extract_subgraph(explorer, sg1_id, max_hops=1)
sg2 = extract_subgraph(explorer, sg2_id, max_hops=1)

print(f"\nSubgraph 1:")
print(f"  Nodes by type: {dict((k, len(v)) for k, v in sorted(sg1.nodes.items()))}")
print(f"\nSubgraph 2:")
print(f"  Nodes by type: {dict((k, len(v)) for k, v in sorted(sg2.nodes.items()))}")

Entity 1: GV-1.1-001
Entity 2: GV-1.2-001

Subgraph 1:
  Nodes by type: {'Action': 1, 'Documentation': 1, 'Risk': 2, 'RiskTaxonomy': 1}

Subgraph 2:
  Nodes by type: {'Action': 1, 'Documentation': 1, 'Risk': 3, 'RiskTaxonomy': 1}

4.1 Compute Structural Similarity¶

In [7]:

# Compute structural similarity (Jaccard)
result = compute_similarity(sg1, sg2, method="structural")

print(f"Structural Similarity: {result.score:.3f}\n")
print(f"Breakdown by entity type (Jaccard score):")

breakdown_df = pd.DataFrame(
    [(k, v) for k, v in sorted(result.breakdown.items(), key=lambda x: x[1], reverse=True)],
    columns=["Entity Type", "Jaccard Score"]
)
print(breakdown_df.to_string(index=False))

print(f"\nInterpretation:")
print(f"  - Score close to 1.0: Very similar neighborhoods")
print(f"  - Score close to 0.0: Disjoint neighborhoods")
print(f"  - Per-type breakdown shows which entity types contribute most to the difference")

Structural Similarity: 0.667

Breakdown by entity type (Jaccard score):
  Entity Type  Jaccard Score
Documentation       1.000000
 RiskTaxonomy       1.000000
         Risk       0.666667
       Action       0.000000

Interpretation:
  - Score close to 1.0: Very similar neighborhoods
  - Score close to 0.0: Disjoint neighborhoods
  - Per-type breakdown shows which entity types contribute most to the difference
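
The overall score printed above is consistent with an unweighted mean of the per-type Jaccard scores: (1.0 + 1.0 + 0.667 + 0.0) / 4 ≈ 0.667. Whether the library aggregates exactly this way is an assumption; the sketch below only shows that reading on toy node IDs. The jaccard and structural_similarity helpers and all IDs are hypothetical.

```python
def jaccard(a, b):
    """Jaccard index: size of the intersection divided by size of the union.

    Two empty sets are treated as identical (score 1.0).
    """
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def structural_similarity(nodes1, nodes2):
    """Average the per-type Jaccard index over every type seen in either graph.

    `nodes1` and `nodes2` map an entity type name to a set of node IDs.
    """
    all_types = set(nodes1) | set(nodes2)
    breakdown = {
        t: jaccard(nodes1.get(t, set()), nodes2.get(t, set())) for t in all_types
    }
    score = sum(breakdown.values()) / len(breakdown) if breakdown else 0.0
    return score, breakdown


# Toy node sets with the same shape as the two subgraphs above:
# different root actions, shared documentation and taxonomy, 2 of 3 risks shared.
nodes_a = {
    "Action": {"action-1"},
    "Documentation": {"doc-1"},
    "Risk": {"r1", "r2"},
    "RiskTaxonomy": {"tax-1"},
}
nodes_b = {
    "Action": {"action-2"},
    "Documentation": {"doc-1"},
    "Risk": {"r1", "r2", "r3"},
    "RiskTaxonomy": {"tax-1"},
}

score, breakdown = structural_similarity(nodes_a, nodes_b)
print(f"Structural similarity: {score:.3f}")
for entity_type, value in sorted(breakdown.items(), key=lambda kv: -kv[1]):
    print(f"  {entity_type:<15} {value:.3f}")
```

Because the two root nodes always differ, the root's own type contributes 0.0 under this reading, which pulls the average down even when every neighbor is shared.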

5. Compare Using Semantic Similarity¶

Compare the subgraphs using semantic similarity based on text embeddings.

In [8]:

# Compute semantic similarity
try:
    result_semantic = compute_similarity(sg1, sg2, method="semantic")
    print(f"Semantic Similarity: {result_semantic.score:.3f}")
    print(f"\nInterpretation:")
    print(f"  Based on cosine similarity of text embeddings (nli-mpnet-base-v2)")
    print(f"  Score close to 1.0: Similar textual meaning")
    print(f"  Score close to 0.0: Semantically different")
except ImportError as e:
    print(f"⚠ Semantic similarity requires txtai: {e}")
    result_semantic = None

Loading weights:   0%|          | 0/199 [00:00<?, ?it/s]

MPNetModel LOAD REPORT from: sentence-transformers/nli-mpnet-base-v2
Key                     | Status     |  | 
------------------------+------------+--+-
embeddings.position_ids | UNEXPECTED |  | 

Notes:
- UNEXPECTED:	can be ignored when loading from different task/architecture; not ok if you expect identical arch.

Semantic Similarity: 0.974

Interpretation:
  Based on cosine similarity of text embeddings (nli-mpnet-base-v2)
  Score close to 1.0: Similar textual meaning
  Score close to 0.0: Semantically different
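
For readers unfamiliar with embedding comparison, the sketch below shows cosine similarity between two vectors in plain Python. How the library turns a subgraph into a single embedding is not shown in this notebook; averaging per-node vectors (mean pooling) is an assumption of the sketch, and the two-dimensional vectors are toy values rather than real model output.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention of this sketch for zero vectors
    return dot / (norm_u * norm_v)


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Toy "embeddings" for the text of each node in two subgraphs.
subgraph_a_vectors = [[1.0, 0.0], [0.0, 1.0]]
subgraph_b_vectors = [[1.0, 1.0]]

pooled_a = mean_vector(subgraph_a_vectors)
pooled_b = mean_vector(subgraph_b_vectors)

same_direction = cosine_similarity(pooled_a, pooled_b)
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
print(f"pooled vectors: {same_direction:.3f}")
print(f"orthogonal vectors: {orthogonal:.3f}")
```

Cosine similarity depends only on direction, not magnitude, so the pooled vectors [0.5, 0.5] and [1.0, 1.0] score as identical in meaning.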

6. Compare Using Hybrid Similarity¶

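Combine structural and semantic similarity with a weighted blend. The alpha parameter sets the structural weight: alpha=1.0 is purely structural and alpha=0.0 is purely semantic.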

In [9]:

# Compute hybrid similarity with different alpha values
alphas = [0.0, 0.25, 0.5, 0.75, 1.0]
results_hybrid = []

print(f"Hybrid Similarity (varying structural weight):\n")
print(f"Alpha | Structural | Semantic | Hybrid")
print(f"------|------------|----------|--------")

for alpha in alphas:
    try:
        result = compute_similarity(sg1, sg2, method="hybrid", alpha=alpha)
        results_hybrid.append(result)
        print(f"{alpha:5.2f} | {result.structural_score:10.3f} | {result.semantic_score:8.3f} | {result.score:6.3f}")
    except ImportError:
        print(f"⚠ Semantic component requires txtai")
        break

print(f"\nAlpha interpretation:")
print(f"  alpha=1.0: Pure structural (Jaccard)")
print(f"  alpha=0.5: Equal weight to structure and semantics")
print(f"  alpha=0.0: Pure semantic (embeddings)")

Hybrid Similarity (varying structural weight):

Alpha | Structural | Semantic | Hybrid
------|------------|----------|--------
 0.00 |      0.667 |    0.974 |  0.974
 0.25 |      0.667 |    0.974 |  0.897
 0.50 |      0.667 |    0.974 |  0.820
 0.75 |      0.667 |    0.974 |  0.743
 1.00 |      0.667 |    0.974 |  0.667

Alpha interpretation:
  alpha=1.0: Pure structural (Jaccard)
  alpha=0.5: Equal weight to structure and semantics
  alpha=0.0: Pure semantic (embeddings)
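
The hybrid column in the table matches a linear blend, hybrid = alpha * structural + (1 - alpha) * semantic. The sketch below recomputes the column from the rounded scores shown above. The hybrid_score function is a hypothetical helper, not the library's API, and because the inputs are already rounded to three decimals the results agree with the table only to within rounding.

```python
def hybrid_score(structural, semantic, alpha):
    """Linear blend of two scores; `alpha` is the weight on the structural score."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * structural + (1.0 - alpha) * semantic


# Rounded scores taken from the table above.
structural = 0.667
semantic = 0.974

blended = {
    alpha: hybrid_score(structural, semantic, alpha)
    for alpha in (0.0, 0.25, 0.5, 0.75, 1.0)
}
for alpha, value in blended.items():
    print(f"alpha={alpha:.2f} -> hybrid={value:.4f}")
```

Since the blend is linear, the hybrid score always lies between the structural and semantic scores, so it cannot reveal anything that the two component scores do not already show; alpha only controls which component dominates.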

7. Use Case: Comparing AI Systems¶

Compare two real-world AI hiring systems with different compliance requirements: one deployed in New York (subject to NYC Local Law 144) and one in the EU (subject to GDPR). The use case data is loaded from a sample YAML file.

In [10]:

# Use sample data loaded in the Setup section
# Extract AI system IDs from the loaded ontology

ai_system_ids = []
for item in (nexus._ontology.entries or []):
    if type(item).__name__ == "AiSystem" and hasattr(item, "id"):
        ai_system_ids.append(item.id)

print(f"Found {len(ai_system_ids)} AI System entities in sample data:")
for aid in ai_system_ids:
    print(f"  - {aid}")

hiring_systems = [aid for aid in ai_system_ids if "hiring" in aid.lower()]
if hiring_systems:
    print(f"\n✓ Hiring use case systems loaded: {hiring_systems}")

Found 2 AI System entities in sample data:
  - hiring-usecase-ny
  - hiring-usecase-eu

✓ Hiring use case systems loaded: ['hiring-usecase-ny', 'hiring-usecase-eu']

7.1 Inspect the Hiring Use Case Systems¶

The hiring use case contains two AI systems:

hiring-usecase-ny: Algorithmic hiring system in New York (NYC Local Law 144)
hiring-usecase-eu: Algorithmic hiring system in the EU (GDPR & EU AI Act)

In [11]:

if len(ai_system_ids) >= 2:
    # Use the pre-loaded sample data
    sys1_id = ai_system_ids[0]  # hiring-usecase-ny
    sys2_id = ai_system_ids[1]  # hiring-usecase-eu
    
    sys1 = explorer.get_by_id(None, sys1_id)
    sys2 = explorer.get_by_id(None, sys2_id)
    
    print(f"System 1: {sys1_id}")
    if sys1:
        print(f"  Name: {getattr(sys1, 'name', 'N/A')}")
        print(f"  Description: {str(getattr(sys1, 'description', 'N/A'))[:100]}...")
        print(f"  Deployed in: {getattr(sys1, 'isUsedWithinLocality', 'N/A')}")
    
    print(f"\nSystem 2: {sys2_id}")
    if sys2:
        print(f"  Name: {getattr(sys2, 'name', 'N/A')}")
        print(f"  Description: {str(getattr(sys2, 'description', 'N/A'))[:100]}...")
        print(f"  Deployed in: {getattr(sys2, 'isUsedWithinLocality', 'N/A')}")

System 1: hiring-usecase-ny
  Name: Algorithmic Hiring (New York, US)
  Description: AI model for algorithmic hiring developed and verified in New York, US. This use case is governed by...
  Deployed in: ['hiring-usecase-locality-ny-ny-usa']

System 2: hiring-usecase-eu
  Name: Algorithmic Hiring (EU)
  Description: AI model for algorithmic hiring deployed in the European Union. This use case is governed by GDPR (A...
  Deployed in: ['hiring-usecase-locality-dublin-ie-eu']

7.2 Extract and Compare System Neighborhoods¶

In [12]:

if len(ai_system_ids) >= 2:
    # Extract the 1-hop neighborhood of each system
    print(f"Extracting neighborhoods for hiring AI systems...\n")
    
    # 1-hop: direct relationships (risks, controls, etc.)
    sys1_sg_1hop = extract_subgraph(explorer, sys1_id, max_hops=1)
    sys2_sg_1hop = extract_subgraph(explorer, sys2_id, max_hops=1)
    
    print(f"NY System (1-hop neighborhood):")
    print(f"  Total nodes: {sum(len(v) for v in sys1_sg_1hop.nodes.values())}")
    print(f"  Risks: {len(sys1_sg_1hop.nodes.get('Risk', set()))}")
    print(f"  Controls: {len(sys1_sg_1hop.nodes.get('Action', set()))}")
    print(f"  Localities: {len(sys1_sg_1hop.nodes.get('LocalityOfUse', set()))}")
    print(f"  Edges: {len(sys1_sg_1hop.edges)}")
    
    print(f"\nEU System (1-hop neighborhood):")
    print(f"  Total nodes: {sum(len(v) for v in sys2_sg_1hop.nodes.values())}")
    print(f"  Risks: {len(sys2_sg_1hop.nodes.get('Risk', set()))}")
    print(f"  Controls: {len(sys2_sg_1hop.nodes.get('Action', set()))}")
    print(f"  Localities: {len(sys2_sg_1hop.nodes.get('LocalityOfUse', set()))}")
    print(f"  Edges: {len(sys2_sg_1hop.edges)}")

Extracting neighborhoods for hiring AI systems...

NY System (1-hop neighborhood):
  Total nodes: 14
  Risks: 5
  Controls: 0
  Localities: 1
  Edges: 13

EU System (1-hop neighborhood):
  Total nodes: 14
  Risks: 5
  Controls: 0
  Localities: 1
  Edges: 13

7.3 Detailed Risk Profile Comparison¶

Compare the NY and EU hiring systems type by type: which associated risks and other entities they share, and which are unique to each jurisdiction.

In [13]:

if len(ai_system_ids) >= 2:
    # Detailed node-by-node comparison
    print(f"Hiring Systems Risk Profile Comparison\n")
    print(f"{'Metric':<35} {'NY System':>15} {'EU System':>15}")
    print(f"{'-'*65}")
    
    all_types = set(sys1_sg_1hop.nodes.keys()) | set(sys2_sg_1hop.nodes.keys())
    
    for entity_type in sorted(all_types):
        nodes1 = sys1_sg_1hop.nodes.get(entity_type, set())
        nodes2 = sys2_sg_1hop.nodes.get(entity_type, set())
        shared = nodes1 & nodes2
        
        if len(nodes1) > 0 or len(nodes2) > 0:
            print(f"\n{entity_type}s:")
            print(f"  {'Total':<33} {len(nodes1):>15} {len(nodes2):>15}")
            print(f"  {'Shared':<33} {len(shared):>15}")
            
            only_1 = len(nodes1 - nodes2)
            only_2 = len(nodes2 - nodes1)
            if only_1 > 0 or only_2 > 0:
                print(f"  {'Unique to NY':<33} {only_1:>15}")
                print(f"  {'Unique to EU':<33} {only_2:>15}")
    
    print(f"\n\nEdges (Relationships):")
    print(f"  {'NY System':<33} {len(sys1_sg_1hop.edges):>15}")
    print(f"  {'EU System':<33} {len(sys2_sg_1hop.edges):>15}")
    
    # Show the specific risks
    ny_risks = sys1_sg_1hop.nodes.get('Risk', set())
    eu_risks = sys2_sg_1hop.nodes.get('Risk', set())
    if ny_risks or eu_risks:
        print(f"\n\nRisks in each system:")
        print(f"  NY risks: {ny_risks}")
        print(f"  EU risks: {eu_risks}")

Hiring Systems Risk Profile Comparison

Metric                                    NY System       EU System
-----------------------------------------------------------------

AiSystems:
  Total                                           1               1
  Shared                                          0
  Unique to NY                                    1
  Unique to EU                                    1

AttributeConditionRules:
  Total                                           1               1
  Shared                                          0
  Unique to NY                                    1
  Unique to EU                                    1

Capabilitys:
  Total                                           1               1
  Shared                                          1

CapabilityGroups:
  Total                                           1               1
  Shared                                          1

LocalityOfUses:
  Total                                           1               1
  Shared                                          0
  Unique to NY                                    1
  Unique to EU                                    1

Purposes:
  Total                                           1               1
  Shared                                          1

Risks:
  Total                                           5               5
  Shared                                          5

RiskTaxonomys:
  Total                                           1               1
  Shared                                          1

Stakeholders:
  Total                                           2               2
  Shared                                          2


Edges (Relationships):
  NY System                                      13
  EU System                                      13


Risks in each system:
  NY risks: {'hiring-risk-discriminatory-actions', 'hiring-risk-unexplainable-output', 'hiring-risk-lack-of-transparency', 'hiring-risk-over-under-reliance', 'hiring-risk-unrepresentative-data'}
  EU risks: {'hiring-risk-discriminatory-actions', 'hiring-risk-unexplainable-output', 'hiring-risk-lack-of-transparency', 'hiring-risk-over-under-reliance', 'hiring-risk-unrepresentative-data'}

7.4 Similarity Analysis¶

Compare the two systems using all three similarity methods. The systems face the same risks but are governed by different regulations (NYC Local Law 144 vs GDPR).

In [14]:

if len(ai_system_ids) >= 2:
    print(f"Hiring Systems Similarity Comparison (All Methods)\n")
    print(f"{'Method':<20} {'Score':<10} {'Interpretation':<50}")
    print(f"{'-'*80}")
    
    # Structural (Jaccard overlap on node sets)
    result_struct = compute_similarity(sys1_sg_1hop, sys2_sg_1hop, method="structural")
    print(f"{'Structural':<20} {result_struct.score:<10.3f} Risk/control overlap")
    
    # Semantic (text embeddings)
    try:
        result_semantic = compute_similarity(sys1_sg_1hop, sys2_sg_1hop, method="semantic")
        print(f"{'Semantic':<20} {result_semantic.score:<10.3f} Textual description similarity")
    except ImportError:
        print(f"{'Semantic':<20} {'N/A':<10} Requires txtai library")
        result_semantic = None
    
    # Hybrid (balanced)
    try:
        result_hybrid = compute_similarity(sys1_sg_1hop, sys2_sg_1hop, method="hybrid", alpha=0.5)
        print(f"{'Hybrid (50/50)':<20} {result_hybrid.score:<10.3f} Balanced structural + semantic")
    except ImportError:
        pass
    
    print(f"\nFindings:")
    print(f"  - Both systems address the SAME risks (transparency, discrimination, reliance, etc.)")
    print(f"  - High structural similarity expected since they target the same use case")
    print(f"  - Differences are in deployment locality and applicable regulations:")
    print(f"    • NY: NYC Local Law 144 (annual bias audits)")
    print(f"    • EU: GDPR Article 22 (right to explanation)")
    print(f"  - Semantic similarity shows textual description alignment")
    print(f"  - Hybrid score combines both perspectives for overall comparison")

Hiring Systems Similarity Comparison (All Methods)

Method               Score      Interpretation                                    
--------------------------------------------------------------------------------
Structural           0.667      Risk/control overlap

Loading weights:   0%|          | 0/199 [00:00<?, ?it/s]

MPNetModel LOAD REPORT from: sentence-transformers/nli-mpnet-base-v2
Key                     | Status     |  | 
------------------------+------------+--+-
embeddings.position_ids | UNEXPECTED |  | 

Notes:
- UNEXPECTED:	can be ignored when loading from different task/architecture; not ok if you expect identical arch.

Semantic             0.974      Textual description similarity
Hybrid (50/50)       0.820      Balanced structural + semantic

Findings:
  - Both systems address the SAME risks (transparency, discrimination, reliance, etc.)
  - High structural similarity expected since they target the same use case
  - Differences are in deployment locality and applicable regulations:
    • NY: NYC Local Law 144 (annual bias audits)
    • EU: GDPR Article 22 (right to explanation)
  - Semantic similarity shows textual description alignment
  - Hybrid score combines both perspectives for overall comparison
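
The three scores above can be cross-checked by hand. The sketch below assumes the hybrid score is a linear blend, `alpha * structural + (1 - alpha) * semantic`. That formula is an inference from the printed numbers, not something confirmed from the library source, and with `alpha=0.5` the output cannot show which term `alpha` actually weights. The helper name `blend` is hypothetical.

```python
def blend(structural, semantic, alpha=0.5):
    """Linear blend of two scores.

    ASSUMPTION: alpha weights the structural term. The notebook output
    (alpha=0.5) is symmetric, so it cannot confirm or refute this.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return alpha * structural + (1.0 - alpha) * semantic


# Scores copied from the printed comparison table (already rounded to 3 places)
structural_score = 0.667
semantic_score = 0.974

hybrid = blend(structural_score, semantic_score, alpha=0.5)
print(f"Blended score: {hybrid:.4f}")
```

Under that assumption the blend lands within rounding of the reported hybrid score of 0.820. The small gap is expected because the inputs here are the rounded values, not the unrounded ones the library works with.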

8. Compare Multiple Pairs

Build a similarity matrix comparing multiple entities.

In [15]:

# Take a sample of actions
sample_ids = action_ids[:5]
sample_names = {}

for aid in sample_ids:
    entity = explorer.get_by_id("actions", aid)
    name = getattr(entity, "name", aid)
    sample_names[aid] = name[:30] if isinstance(name, str) else str(name)[:30]

print(f"Comparing {len(sample_ids)} entities")
print(f"\nEntity IDs and names:")
for aid, name in sample_names.items():
    print(f"  {aid}: {name}")

Comparing 5 entities

Entity IDs and names:
  GV-1.1-001: GV-1.1-001
  GV-1.2-001: GV-1.2-001
  GV-1.2-002: GV-1.2-002
  GV-1.3-001: GV-1.3-001
  GV-1.3-002: GV-1.3-002

8.1 Build Similarity Matrix

In [16]:

# Extract subgraphs for all samples
subgraphs_sample = {aid: extract_subgraph(explorer, aid, max_hops=1) for aid in sample_ids}

# Build similarity matrix
similarity_matrix = []

for i, id1 in enumerate(sample_ids):
    row = []
    for j, id2 in enumerate(sample_ids):
        if i == j:
            score = 1.0  # Diagonal is always 1.0
        else:
            result = compute_similarity(subgraphs_sample[id1], subgraphs_sample[id2], method="structural")
            score = result.score
        row.append(score)
    similarity_matrix.append(row)

# Display as DataFrame
sim_df = pd.DataFrame(
    similarity_matrix,
    index=[f"{i}" for i in range(len(sample_ids))],
    columns=[f"{i}" for i in range(len(sample_ids))]
)

print("\nStructural Similarity Matrix (Structural Jaccard):")
print(sim_df.round(3).to_string())

print(f"\nRow/Column key:")
for i, aid in enumerate(sample_ids):
    print(f"  {i}: {sample_names[aid]}")

Structural Similarity Matrix (Structural Jaccard):
       0      1      2      3      4
0  1.000  0.667  0.500  0.500  0.500
1  0.667  1.000  0.500  0.536  0.500
2  0.500  0.500  1.000  0.542  0.562
3  0.500  0.536  0.542  1.000  0.583
4  0.500  0.500  0.562  0.583  1.000

Row/Column key:
  0: GV-1.1-001
  1: GV-1.2-001
  2: GV-1.2-002
  3: GV-1.3-001
  4: GV-1.3-002
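
A matrix like this is usually read by ranking its off-diagonal entries. The sketch below does that in plain Python, using the values printed above; the helper name `rank_pairs` is hypothetical and not part of the library. Because the matrix is symmetric, only pairs with `i < j` are visited.

```python
from itertools import combinations

# Labels and scores copied from the printed structural similarity matrix
labels = ["GV-1.1-001", "GV-1.2-001", "GV-1.2-002", "GV-1.3-001", "GV-1.3-002"]
matrix = [
    [1.000, 0.667, 0.500, 0.500, 0.500],
    [0.667, 1.000, 0.500, 0.536, 0.500],
    [0.500, 0.500, 1.000, 0.542, 0.562],
    [0.500, 0.536, 0.542, 1.000, 0.583],
    [0.500, 0.500, 0.562, 0.583, 1.000],
]


def rank_pairs(labels, matrix):
    """Return (label_a, label_b, score) for every unordered pair, best first."""
    pairs = [
        (labels[i], labels[j], matrix[i][j])
        for i, j in combinations(range(len(labels)), 2)
    ]
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)


ranked = rank_pairs(labels, matrix)
for label_a, label_b, score in ranked[:3]:
    print(f"{label_a} vs {label_b}: {score:.3f}")
```

For these five actions the closest pair is GV-1.1-001 and GV-1.2-001 at 0.667, followed by GV-1.3-001 and GV-1.3-002 at 0.583.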

9. Use Case: Finding Similar Risks

Example: Find which risks are most similar to each other based on their neighborhoods.

In [17]:

# Get some Risk entities (from concepts collection)
risk_ids = []
for item in (nexus.get_all("Risks") or []):
    if type(item).__name__ == "Risk" and hasattr(item, "id"):
        risk_ids.append(item.id)

print(f"Found {len(risk_ids)} Risk entities")

if len(risk_ids) >= 3:
    # Compare risks
    sample_risk_ids = risk_ids[:3]
    
    print(f"\nComparing first 3 risks:")
    for i, rid in enumerate(sample_risk_ids):
        entity = explorer.get_by_id(None, rid)
        if entity:
            name = getattr(entity, "name", rid)
            print(f"  {i}: {name}")
    
    # Pairwise comparison: extract each subgraph once, then compare every unordered pair
    risk_subgraphs = [extract_subgraph(explorer, rid, max_hops=1) for rid in sample_risk_ids]
    print(f"\nPairwise structural similarities:")
    for i in range(len(sample_risk_ids)):
        for j in range(i + 1, len(sample_risk_ids)):
            result = compute_similarity(risk_subgraphs[i], risk_subgraphs[j], method="structural")
            print(f"  Risk {i} vs Risk {j}: {result.score:.3f}")
else:
    print("Not enough Risk entities for demonstration")

Found 561 Risk entities

Comparing first 3 risks:
  0: ASI01:2026 Agent Goal Hijack
  1: ASI02:2026 Tool Misuse & Exploitation
  2: ASI03:2026 Identity & Privilege Abuse

Pairwise structural similarities:
  Risk 0 vs Risk 1: 0.625
  Risk 0 vs Risk 2: 0.700
  Risk 1 vs Risk 2: 0.600

10. Advanced: Understanding the Subgraph Structure

Let's dig deeper into what makes two subgraphs similar or different.

In [18]:

# Compare nodes across two subgraphs
sg1 = subgraphs_sample[sample_ids[0]]
sg2 = subgraphs_sample[sample_ids[1]]

print(f"Detailed comparison of subgraph 0 vs subgraph 1:")
print(f"\nNode set comparison by type:\n")

all_types = set(sg1.nodes.keys()) | set(sg2.nodes.keys())
for entity_type in sorted(all_types):
    nodes1 = sg1.nodes.get(entity_type, set())
    nodes2 = sg2.nodes.get(entity_type, set())
    
    shared = nodes1 & nodes2
    only_sg1 = nodes1 - nodes2
    only_sg2 = nodes2 - nodes1
    
    print(f"{entity_type}:")
    print(f"  Total in SG1: {len(nodes1)}")
    print(f"  Total in SG2: {len(nodes2)}")
    print(f"  Shared: {len(shared)}")
    print(f"  Only in SG1: {len(only_sg1)}")
    print(f"  Only in SG2: {len(only_sg2)}")
    if shared:
        print(f"  Shared IDs: {list(shared)[:3]}")
    print()

Detailed comparison of subgraph 0 vs subgraph 1:

Node set comparison by type:

Action:
  Total in SG1: 1
  Total in SG2: 1
  Shared: 0
  Only in SG1: 1
  Only in SG2: 1

Documentation:
  Total in SG1: 1
  Total in SG2: 1
  Shared: 1
  Only in SG1: 0
  Only in SG2: 0
  Shared IDs: ['NIST.AI.600-1']

Risk:
  Total in SG1: 2
  Total in SG2: 3
  Shared: 2
  Only in SG1: 0
  Only in SG2: 1
  Shared IDs: ['nist-data-privacy', 'nist-intellectual-property']

RiskTaxonomy:
  Total in SG1: 1
  Total in SG2: 1
  Shared: 1
  Only in SG1: 0
  Only in SG2: 0
  Shared IDs: ['nist-ai-rmf']
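
The per-type counts above are enough to reproduce the 0.667 reported for pair 0 vs 1 in the similarity matrix, under one assumption: that the structural score is the mean of the per-type Jaccard indices. This is inferred from the printed numbers and is not confirmed from the `compute_similarity` source. The sketch rebuilds both node dictionaries by hand. The Action IDs are assumed to be the two centre entities, and `risk-only-in-sg2` is a hypothetical placeholder for the one Risk ID the output does not list.

```python
def jaccard(a, b):
    """Jaccard index |a & b| / |a | b|; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Node sets rebuilt from the printed comparison of subgraph 0 vs subgraph 1
sg1_nodes = {
    "Action": {"GV-1.1-001"},  # assumed: the centre entity of subgraph 0
    "Documentation": {"NIST.AI.600-1"},
    "Risk": {"nist-data-privacy", "nist-intellectual-property"},
    "RiskTaxonomy": {"nist-ai-rmf"},
}
sg2_nodes = {
    "Action": {"GV-1.2-001"},  # assumed: the centre entity of subgraph 1
    "Documentation": {"NIST.AI.600-1"},
    # "risk-only-in-sg2" is a hypothetical stand-in for the unlisted third Risk
    "Risk": {"nist-data-privacy", "nist-intellectual-property", "risk-only-in-sg2"},
    "RiskTaxonomy": {"nist-ai-rmf"},
}

# Reading 1: one Jaccard index per entity type, then the unweighted mean
all_types = sorted(set(sg1_nodes) | set(sg2_nodes))
per_type = {
    t: jaccard(sg1_nodes.get(t, set()), sg2_nodes.get(t, set())) for t in all_types
}
mean_per_type = sum(per_type.values()) / len(per_type)

# Reading 2: a single Jaccard index over all nodes pooled together
pooled = jaccard(set().union(*sg1_nodes.values()), set().union(*sg2_nodes.values()))

for entity_type, score in per_type.items():
    print(f"{entity_type:<14} {score:.3f}")
print(f"{'mean per-type':<14} {mean_per_type:.3f}")
print(f"{'pooled':<14} {pooled:.3f}")
```

With these inputs the mean of the per-type indices comes to about 0.667, matching the matrix entry, while the pooled index comes to about 0.571. That makes the per-type reading the more plausible one for this pair, though a single data point cannot rule out other formulas.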

11. Summary

Key takeaways:

In [19]:

print("""
=== Subgraph Extraction and Similarity - Summary ===

Functions available:
  1. extract_subgraph(explorer, entity_id, max_hops=2)
     - Works on any entity type (generic)
     - max_hops controls traversal depth
     - Returns SubGraph with nodes grouped by type and edges

  2. compute_similarity(sg1, sg2, method='structural', alpha=0.5)
     - method='structural': Jaccard overlap on node types (fast)
     - method='semantic': Cosine similarity on text embeddings (requires txtai)
     - method='hybrid': Weighted blend of both (configurable alpha)

Typical workflow:
  1. Initialize explorer: PyoxigraphExplorer(nexus._ontology)
  2. Extract subgraph: sg = extract_subgraph(explorer, entity_id)
  3. Compare: result = compute_similarity(sg1, sg2, method='structural')
  4. Analyze: result.score, result.breakdown, result.structural_score

Use cases:
  - Compare AI system risk profiles (Section 7)
  - Find similar risk entities
  - Detect duplicate or near-duplicate entities
  - Analyze entity neighborhoods
  - Build similarity matrices for clustering
""")

=== Subgraph Extraction and Similarity - Summary ===

Functions available:
  1. extract_subgraph(explorer, entity_id, max_hops=2)
     - Works on any entity type (generic)
     - max_hops controls traversal depth
     - Returns SubGraph with nodes grouped by type and edges

  2. compute_similarity(sg1, sg2, method='structural', alpha=0.5)
     - method='structural': Jaccard overlap on node types (fast)
     - method='semantic': Cosine similarity on text embeddings (requires txtai)
     - method='hybrid': Weighted blend of both (configurable alpha)

Typical workflow:
  1. Initialize explorer: PyoxigraphExplorer(nexus._ontology)
  2. Extract subgraph: sg = extract_subgraph(explorer, entity_id)
  3. Compare: result = compute_similarity(sg1, sg2, method='structural')
  4. Analyze: result.score, result.breakdown, result.structural_score

Use cases:
  - Compare AI system risk profiles (Section 7)
  - Find similar risk entities
  - Detect duplicate or near-duplicate entities
  - Analyze entity neighborhoods
  - Build similarity matrices for clustering
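
As a rough illustration of the clustering and near-duplicate use cases, the sketch below groups entities whose pairwise score reaches a threshold, merging groups transitively with a small union-find (single-linkage grouping). It is a plain-Python sketch rather than a library feature: `threshold_groups` is a hypothetical helper, the matrix is the structural one from Section 8.1, and the 0.55 threshold is an arbitrary value chosen for the demonstration.

```python
def threshold_groups(matrix, threshold):
    """Group row indices whose similarity is >= threshold, transitively."""
    n = len(matrix)
    parent = list(range(n))

    def find(x):
        # Follow parent links to the group root, compressing the path as we go
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the groups of every pair that clears the threshold
    for i in range(n):
        for j in range(i + 1, n):
            if matrix[i][j] >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Structural similarity matrix copied from Section 8.1 (rows/columns 0..4)
matrix = [
    [1.000, 0.667, 0.500, 0.500, 0.500],
    [0.667, 1.000, 0.500, 0.536, 0.500],
    [0.500, 0.500, 1.000, 0.542, 0.562],
    [0.500, 0.536, 0.542, 1.000, 0.583],
    [0.500, 0.500, 0.562, 0.583, 1.000],
]

groups = threshold_groups(matrix, threshold=0.55)
print(groups)
```

At this threshold the five sample actions fall into two groups, indices 0 and 1 in one and indices 2, 3, and 4 in the other. Single-linkage grouping can chain loosely related items together, so for real data the threshold and the linkage rule both deserve tuning.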