Subgraph Extraction and Similarity Computation
This notebook demonstrates how to:
- Extract subgraphs from the knowledge graph centered on an entity of any type (AiSystem, Risk, RiskGroup, AiTask, Action, etc.)
- Compute similarity between two subgraphs using multiple methods
- Analyze and compare entity neighborhoods
Setup
from ai_atlas_nexus import AIAtlasNexus
from ai_atlas_nexus.blocks.graph_explorer import (
    PyoxigraphExplorer,
    extract_subgraph,
    compute_similarity,
)
import pandas as pd
from collections import defaultdict
from pathlib import Path
# Initialize with sample data directory
SAMPLE_DATA_DIR = Path("sample_data")
nexus = AIAtlasNexus(base_dir=str(SAMPLE_DATA_DIR))
explorer = PyoxigraphExplorer(nexus._ontology)
print("✓ Knowledge graph loaded")
print(f"✓ Graph explorer initialized")
print(f"✓ Sample data from: {SAMPLE_DATA_DIR}")
[2026-06-22 10:37:23:368] - INFO - AIAtlasNexus - Created AIAtlasNexus instance. Base_dir: sample_data
✓ Knowledge graph loaded
✓ Graph explorer initialized
✓ Sample data from: sample_data
1. Explore Available Entities
First, let's see what entity types are available and how many of each.
# Count entities by type
type_counts = defaultdict(int)
for field in nexus._ontology.model_fields_set:
    items = getattr(nexus._ontology, field, None) or []
    if not isinstance(items, list):
        items = [items]
    for item in items:
        entity_type = type(item).__name__
        type_counts[entity_type] += 1
print(f"\nTotal entities: {sum(type_counts.values())}")
Total entities: 2582
2. Extract a Subgraph
Let's extract a subgraph rooted at a specific entity. We'll look at an Action entity and its neighborhood.
# Get some Action entities
action_ids = []
for item in (nexus._ontology.actions or []):
    if hasattr(item, "id"):
        action_ids.append(item.id)
print(f"Found {len(action_ids)} Action entities")
# Pick the first one
entity_id = action_ids[0]
entity = explorer.get_by_id("actions", entity_id)
print(f"\nSelected Action: {entity_id}")
print(f" Name: {entity.name if hasattr(entity, 'name') else 'N/A'}")
print(f" Description: {str(entity.description)[:100] if hasattr(entity, 'description') else 'N/A'}...")
Found 254 Action entities
Selected Action: GV-1.1-001
Name: GV-1.1-001
Description: Align GAI development and use with applicable laws and regulations, including those related to data ...
2.1 Extract at Different Depths
Let's see how the subgraph grows as we increase the traversal depth.
# Extract subgraphs at different depths
depths_to_explore = [0, 1, 2]
subgraphs = {}
print(f"Extracting subgraph rooted at {entity_id}\n")
for depth in depths_to_explore:
    sg = extract_subgraph(explorer, entity_id, max_hops=depth)
    subgraphs[depth] = sg
    # Count nodes
    total_nodes = sum(len(nodes) for nodes in sg.nodes.values())
    print(f"max_hops={depth}:")
    print(f" Total nodes: {total_nodes}")
    print(f" Nodes by type: {dict((k, len(v)) for k, v in sorted(sg.nodes.items()))}")
    print(f" Edges: {len(sg.edges)}")
    print()
Extracting subgraph rooted at GV-1.1-001
max_hops=0:
Total nodes: 1
Nodes by type: {'Action': 1}
Edges: 0
max_hops=1:
Total nodes: 5
Nodes by type: {'Action': 1, 'Documentation': 1, 'Risk': 2, 'RiskTaxonomy': 1}
Edges: 4
max_hops=2:
Total nodes: 74
Nodes by type: {'Action': 42, 'Documentation': 1, 'Risk': 30, 'RiskTaxonomy': 1}
Edges: 97
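The jump from 5 nodes at one hop to 74 at two hops is what a breadth-first expansion produces once it reaches well-connected Risk nodes. The following is a minimal sketch of hop-limited traversal on a toy edge list. It is not the implementation behind extract_subgraph: the helper name neighborhood and the toy triples are hypothetical, and it assumes edges are followed in both directions, which the appearance of other Action nodes at two hops suggests but the notebook does not confirm.

```python
from collections import deque


def neighborhood(edges, root, max_hops):
    """Return the set of node IDs within max_hops of root (hypothetical helper)."""
    # Build an undirected adjacency map from (source, predicate, target) triples.
    adjacency = {}
    for src, _pred, tgt in edges:
        adjacency.setdefault(src, set()).add(tgt)
        adjacency.setdefault(tgt, set()).add(src)

    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops + 1))
    return seen


# Toy triples, invented for illustration only.
toy_edges = [
    ("action-a", "hasRelatedRisk", "risk-1"),
    ("action-b", "hasRelatedRisk", "risk-1"),
    ("action-b", "hasRelatedRisk", "risk-2"),
]

for hops in (0, 1, 2):
    print(hops, sorted(neighborhood(toy_edges, "action-a", hops)))
```

In the toy graph, action-b only becomes reachable at two hops because it shares risk-1 with the root, which mirrors how sibling Actions enter the real subgraph through shared Risks.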
3. Inspect Subgraph Structure
Let's look at the edges and relationships in a subgraph.
# Use the depth=1 subgraph
sg = subgraphs[1]
print(f"Edges in subgraph (max_hops=1):")
print(f"Format: (source_id, predicate, target_id)\n")
# Group edges by predicate
edges_by_pred = defaultdict(list)
for src, pred, tgt in sg.edges:
    edges_by_pred[pred].append((src, tgt))
for pred in sorted(edges_by_pred.keys()):
    edges = edges_by_pred[pred]
    print(f" {pred}: {len(edges)} edge(s)")
    for src, tgt in edges[:3]:  # Show first 3
        print(f" {src} -> {tgt}")
    if len(edges) > 3:
        print(f" ... and {len(edges) - 3} more")
Edges in subgraph (max_hops=1):
Format: (source_id, predicate, target_id)
hasDocumentation: 1 edge(s)
GV-1.1-001 -> NIST.AI.600-1
hasRelatedRisk: 2 edge(s)
GV-1.1-001 -> nist-intellectual-property
GV-1.1-001 -> nist-data-privacy
isDefinedByTaxonomy: 1 edge(s)
GV-1.1-001 -> nist-ai-rmf
4. Compare Two Subgraphs - Structural Similarity
Extract two subgraphs and compare them using structural similarity (Jaccard overlap).
# Select two different entities
sg1_id = action_ids[0]
sg2_id = action_ids[1]
print(f"Entity 1: {sg1_id}")
print(f"Entity 2: {sg2_id}")
# Extract subgraphs
sg1 = extract_subgraph(explorer, sg1_id, max_hops=1)
sg2 = extract_subgraph(explorer, sg2_id, max_hops=1)
print(f"\nSubgraph 1:")
print(f" Nodes by type: {dict((k, len(v)) for k, v in sorted(sg1.nodes.items()))}")
print(f"\nSubgraph 2:")
print(f" Nodes by type: {dict((k, len(v)) for k, v in sorted(sg2.nodes.items()))}")
Entity 1: GV-1.1-001
Entity 2: GV-1.2-001
Subgraph 1:
Nodes by type: {'Action': 1, 'Documentation': 1, 'Risk': 2, 'RiskTaxonomy': 1}
Subgraph 2:
Nodes by type: {'Action': 1, 'Documentation': 1, 'Risk': 3, 'RiskTaxonomy': 1}
4.1 Compute Structural Similarity
# Compute structural similarity (Jaccard)
result = compute_similarity(sg1, sg2, method="structural")
print(f"Structural Similarity: {result.score:.3f}\n")
print(f"Breakdown by entity type (Jaccard score):")
breakdown_df = pd.DataFrame(
    [(k, v) for k, v in sorted(result.breakdown.items(), key=lambda x: x[1], reverse=True)],
    columns=["Entity Type", "Jaccard Score"]
)
print(breakdown_df.to_string(index=False))
print(f"\nInterpretation:")
print(f" - Score close to 1.0: Very similar neighborhoods")
print(f" - Score close to 0.0: Disjoint neighborhoods")
print(f" - Per-type breakdown shows which entity types contribute most to the difference")
Structural Similarity: 0.667
Breakdown by entity type (Jaccard score):
Entity Type Jaccard Score
Documentation 1.000000
RiskTaxonomy 1.000000
Risk 0.666667
Action 0.000000
Interpretation:
- Score close to 1.0: Very similar neighborhoods
- Score close to 0.0: Disjoint neighborhoods
- Per-type breakdown shows which entity types contribute most to the difference
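The overall score of 0.667 is consistent with averaging the four per-type Jaccard scores: (1 + 1 + 0.667 + 0) / 4. The sketch below reproduces that arithmetic on plain Python sets. Whether compute_similarity aggregates exactly this way is an assumption inferred from the printed numbers; the function names are hypothetical, and the third Risk ID in the second subgraph is a placeholder because the notebook output only shows the two shared risks.

```python
def jaccard(a, b):
    """Jaccard index of two sets; two empty sets are treated as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def structural_similarity(nodes1, nodes2):
    """Mean per-type Jaccard over every type present in either subgraph (assumed aggregation)."""
    types = set(nodes1) | set(nodes2)
    breakdown = {t: jaccard(nodes1.get(t, set()), nodes2.get(t, set())) for t in types}
    score = sum(breakdown.values()) / len(breakdown) if breakdown else 0.0
    return score, breakdown


sg1_nodes = {
    "Action": {"GV-1.1-001"},
    "Documentation": {"NIST.AI.600-1"},
    "Risk": {"nist-intellectual-property", "nist-data-privacy"},
    "RiskTaxonomy": {"nist-ai-rmf"},
}
sg2_nodes = {
    "Action": {"GV-1.2-001"},
    "Documentation": {"NIST.AI.600-1"},
    # Placeholder ID: the third risk of GV-1.2-001 is not shown in the notebook output.
    "Risk": {"nist-intellectual-property", "nist-data-privacy", "placeholder-third-risk"},
    "RiskTaxonomy": {"nist-ai-rmf"},
}

score, breakdown = structural_similarity(sg1_nodes, sg2_nodes)
print(f"Structural similarity: {score:.3f}")
for entity_type, value in sorted(breakdown.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{entity_type}: {value:.3f}")
```

Note that the root Action nodes always differ, so the Action type contributes 0 and caps the score below 1.0 for any pair of distinct roots under this averaging.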
5. Compare Using Semantic Similarity
Compare the subgraphs using semantic similarity based on text embeddings.
# Compute semantic similarity
try:
    result_semantic = compute_similarity(sg1, sg2, method="semantic")
    print(f"Semantic Similarity: {result_semantic.score:.3f}")
    print(f"\nInterpretation:")
    print(f" Based on cosine similarity of text embeddings (nli-mpnet-base-v2)")
    print(f" Score close to 1.0: Similar textual meaning")
    print(f" Score close to 0.0: Semantically different")
except ImportError as e:
    print(f"⚠ Semantic similarity requires txtai: {e}")
    result_semantic = None
Loading weights: 0%| | 0/199 [00:00<?, ?it/s]
MPNetModel LOAD REPORT from: sentence-transformers/nli-mpnet-base-v2
Key | Status | |
------------------------+------------+--+-
embeddings.position_ids | UNEXPECTED | |
Notes:
- UNEXPECTED: can be ignored when loading from different task/architecture; not ok if you expect identical arch.
Semantic Similarity: 0.974
Interpretation:
Based on cosine similarity of text embeddings (nli-mpnet-base-v2)
Score close to 1.0: Similar textual meaning
Score close to 0.0: Semantically different
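For readers unfamiliar with the metric, here is a minimal sketch of cosine similarity between two embedding vectors, written in plain Python. It only illustrates the final comparison step: how the library turns a subgraph into text and embeds it is not shown in this notebook, and the three-dimensional vectors below are invented stand-ins for real embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


# Invented toy vectors; real sentence embeddings have hundreds of dimensions.
vec_a = [1.0, 2.0, 0.0]
vec_b = [2.0, 4.0, 0.0]  # same direction as vec_a
vec_c = [0.0, 0.0, 3.0]  # orthogonal to vec_a

print(f"parallel:   {cosine_similarity(vec_a, vec_b):.3f}")
print(f"orthogonal: {cosine_similarity(vec_a, vec_c):.3f}")
```

Vectors pointing the same way score close to 1.0 regardless of length, which is why two Actions from the same NIST document, with similarly worded descriptions, can score as high as 0.974.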
6. Compare Using Hybrid Similarity
Combine structural and semantic similarity with a weighted blend.
# Compute hybrid similarity with different alpha values
alphas = [0.0, 0.25, 0.5, 0.75, 1.0]
results_hybrid = []
print(f"Hybrid Similarity (varying structural weight):\n")
print(f"Alpha | Structural | Semantic | Hybrid")
print(f"------|------------|----------|--------")
for alpha in alphas:
    try:
        result = compute_similarity(sg1, sg2, method="hybrid", alpha=alpha)
        results_hybrid.append(result)
        print(f"{alpha:5.2f} | {result.structural_score:10.3f} | {result.semantic_score:8.3f} | {result.score:6.3f}")
    except ImportError:
        print(f"⚠ Semantic component requires txtai")
        break
print(f"\nAlpha interpretation:")
print(f" alpha=1.0: Pure structural (Jaccard)")
print(f" alpha=0.5: Equal weight to structure and semantics")
print(f" alpha=0.0: Pure semantic (embeddings)")
Hybrid Similarity (varying structural weight):
Alpha | Structural | Semantic | Hybrid
------|------------|----------|--------
 0.00 |      0.667 |    0.974 |  0.974
 0.25 |      0.667 |    0.974 |  0.897
 0.50 |      0.667 |    0.974 |  0.820
 0.75 |      0.667 |    0.974 |  0.743
 1.00 |      0.667 |    0.974 |  0.667
Alpha interpretation:
alpha=1.0: Pure structural (Jaccard)
alpha=0.5: Equal weight to structure and semantics
alpha=0.0: Pure semantic (embeddings)
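The table is consistent with a linear blend, hybrid = alpha * structural + (1 - alpha) * semantic. The sketch below recomputes the blend from the two component scores. The formula is inferred from the printed values rather than taken from the library source, and hybrid_score is a hypothetical helper name.

```python
def hybrid_score(structural, semantic, alpha):
    """Linear blend of two scores; alpha is the weight on the structural score."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * structural + (1.0 - alpha) * semantic


# Component scores taken from the output above (2/3 is the unrounded 0.667).
structural, semantic = 2 / 3, 0.974

for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"alpha={alpha:.2f} -> {hybrid_score(structural, semantic, alpha):.3f}")
```

Because the semantic score here is already rounded to three decimals, the last digit of a recomputed value can differ from the notebook output by one unit.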
7. Use Case: Comparing AI Systems
Compare two real-world AI hiring systems with different compliance requirements: one deployed in New York (subject to NYC Local Law 144) and one in the EU (subject to GDPR). The use case data is loaded from a sample YAML file.
# Use sample data loaded in the Setup section
# Extract AI system IDs from the loaded ontology
ai_system_ids = []
for item in (nexus._ontology.entries or []):
    if type(item).__name__ == "AiSystem" and hasattr(item, "id"):
        ai_system_ids.append(item.id)
print(f"Found {len(ai_system_ids)} AI System entities in sample data:")
for aid in ai_system_ids:
    print(f" - {aid}")
hiring_systems = [aid for aid in ai_system_ids if "hiring" in aid.lower()]
if hiring_systems:
    print(f"\n✓ Hiring use case systems loaded: {hiring_systems}")
Found 2 AI System entities in sample data:
- hiring-usecase-ny
- hiring-usecase-eu
✓ Hiring use case systems loaded: ['hiring-usecase-ny', 'hiring-usecase-eu']
7.1 Inspect the Hiring Use Case Systems
The hiring use case contains two AI systems:
- hiring-usecase-ny: Algorithmic hiring system in New York (NYC Local Law 144)
- hiring-usecase-eu: Algorithmic hiring system in EU (GDPR & EU AI Act)
if len(ai_system_ids) >= 2:
    # Use the pre-loaded sample data
    sys1_id = ai_system_ids[0]  # hiring-usecase-ny
    sys2_id = ai_system_ids[1]  # hiring-usecase-eu
    sys1 = explorer.get_by_id(None, sys1_id)
    sys2 = explorer.get_by_id(None, sys2_id)
    print(f"System 1: {sys1_id}")
    if sys1:
        print(f" Name: {getattr(sys1, 'name', 'N/A')}")
        print(f" Description: {str(getattr(sys1, 'description', 'N/A'))[:100]}...")
        print(f" Deployed in: {getattr(sys1, 'isUsedWithinLocality', 'N/A')}")
    print(f"\nSystem 2: {sys2_id}")
    if sys2:
        print(f" Name: {getattr(sys2, 'name', 'N/A')}")
        print(f" Description: {str(getattr(sys2, 'description', 'N/A'))[:100]}...")
        print(f" Deployed in: {getattr(sys2, 'isUsedWithinLocality', 'N/A')}")
System 1: hiring-usecase-ny
Name: Algorithmic Hiring (New York, US)
Description: AI model for algorithmic hiring developed and verified in New York, US. This use case is governed by...
Deployed in: ['hiring-usecase-locality-ny-ny-usa']
System 2: hiring-usecase-eu
Name: Algorithmic Hiring (EU)
Description: AI model for algorithmic hiring deployed in the European Union. This use case is governed by GDPR (A...
Deployed in: ['hiring-usecase-locality-dublin-ie-eu']
7.2 Extract and Compare System Neighborhoods
if len(ai_system_ids) >= 2:
    # Extract subgraphs with different traversal depths
    print(f"Extracting neighborhoods for hiring AI systems...\n")
    # 1-hop: direct relationships (risks, controls, etc.)
    sys1_sg_1hop = extract_subgraph(explorer, sys1_id, max_hops=1)
    sys2_sg_1hop = extract_subgraph(explorer, sys2_id, max_hops=1)
    print(f"NY System (1-hop neighborhood):")
    print(f" Total nodes: {sum(len(v) for v in sys1_sg_1hop.nodes.values())}")
    print(f" Risks: {len(sys1_sg_1hop.nodes.get('Risk', set()))}")
    print(f" Controls: {len(sys1_sg_1hop.nodes.get('Action', set()))}")
    print(f" Localities: {len(sys1_sg_1hop.nodes.get('LocalityOfUse', set()))}")
    print(f" Edges: {len(sys1_sg_1hop.edges)}")
    print(f"\nEU System (1-hop neighborhood):")
    print(f" Total nodes: {sum(len(v) for v in sys2_sg_1hop.nodes.values())}")
    print(f" Risks: {len(sys2_sg_1hop.nodes.get('Risk', set()))}")
    print(f" Controls: {len(sys2_sg_1hop.nodes.get('Action', set()))}")
    print(f" Localities: {len(sys2_sg_1hop.nodes.get('LocalityOfUse', set()))}")
    print(f" Edges: {len(sys2_sg_1hop.edges)}")
Extracting neighborhoods for hiring AI systems...
NY System (1-hop neighborhood):
Total nodes: 14
Risks: 5
Controls: 0
Localities: 1
Edges: 13
EU System (1-hop neighborhood):
Total nodes: 14
Risks: 5
Controls: 0
Localities: 1
Edges: 13
7.3 Detailed Risk Profile Comparison
Compare how the NY and EU hiring systems differ in their associated risks and how they're governed by different regulations.
if len(ai_system_ids) >= 2:
    # Detailed node-by-node comparison
    print(f"Hiring Systems Risk Profile Comparison\n")
    print(f"{'Metric':<35} {'NY System':>15} {'EU System':>15}")
    print(f"{'-'*65}")
    all_types = set(sys1_sg_1hop.nodes.keys()) | set(sys2_sg_1hop.nodes.keys())
    for entity_type in sorted(all_types):
        nodes1 = sys1_sg_1hop.nodes.get(entity_type, set())
        nodes2 = sys2_sg_1hop.nodes.get(entity_type, set())
        shared = nodes1 & nodes2
        if len(nodes1) > 0 or len(nodes2) > 0:
            print(f"\n{entity_type}s:")
            print(f" {'Total':<33} {len(nodes1):>15} {len(nodes2):>15}")
            print(f" {'Shared':<33} {len(shared):>15}")
            only_1 = len(nodes1 - nodes2)
            only_2 = len(nodes2 - nodes1)
            if only_1 > 0 or only_2 > 0:
                print(f" {'Unique to NY':<33} {only_1:>15}")
                print(f" {'Unique to EU':<33} {only_2:>15}")
    print(f"\n\nEdges (Relationships):")
    print(f" {'NY System':<33} {len(sys1_sg_1hop.edges):>15}")
    print(f" {'EU System':<33} {len(sys2_sg_1hop.edges):>15}")
    # Show the specific risks
    ny_risks = sys1_sg_1hop.nodes.get('Risk', set())
    eu_risks = sys2_sg_1hop.nodes.get('Risk', set())
    if ny_risks or eu_risks:
        print(f"\n\nRisks in each system:")
        print(f" NY risks: {ny_risks}")
        print(f" EU risks: {eu_risks}")
Hiring Systems Risk Profile Comparison
Metric NY System EU System
-----------------------------------------------------------------
AiSystems:
Total 1 1
Shared 0
Unique to NY 1
Unique to EU 1
AttributeConditionRules:
Total 1 1
Shared 0
Unique to NY 1
Unique to EU 1
Capabilitys:
Total 1 1
Shared 1
CapabilityGroups:
Total 1 1
Shared 1
LocalityOfUses:
Total 1 1
Shared 0
Unique to NY 1
Unique to EU 1
Purposes:
Total 1 1
Shared 1
Risks:
Total 5 5
Shared 5
RiskTaxonomys:
Total 1 1
Shared 1
Stakeholders:
Total 2 2
Shared 2
Edges (Relationships):
NY System 13
EU System 13
Risks in each system:
NY risks: {'hiring-risk-discriminatory-actions', 'hiring-risk-unexplainable-output', 'hiring-risk-lack-of-transparency', 'hiring-risk-over-under-reliance', 'hiring-risk-unrepresentative-data'}
EU risks: {'hiring-risk-discriminatory-actions', 'hiring-risk-unexplainable-output', 'hiring-risk-lack-of-transparency', 'hiring-risk-over-under-reliance', 'hiring-risk-unrepresentative-data'}
7.4 Similarity Analysis
Compare the two systems across all similarity dimensions. The systems face the same risks but are governed by different regulations (NYC Local Law 144 vs GDPR).
if len(ai_system_ids) >= 2:
    print(f"Hiring Systems Similarity Comparison (All Methods)\n")
    print(f"{'Method':<20} {'Score':<10} {'Interpretation':<50}")
    print(f"{'-'*80}")
    # Structural (Jaccard overlap on node sets)
    result_struct = compute_similarity(sys1_sg_1hop, sys2_sg_1hop, method="structural")
    print(f"{'Structural':<20} {result_struct.score:<10.3f} Risk/control overlap")
    # Semantic (text embeddings)
    try:
        result_semantic = compute_similarity(sys1_sg_1hop, sys2_sg_1hop, method="semantic")
        print(f"{'Semantic':<20} {result_semantic.score:<10.3f} Textual description similarity")
    except ImportError:
        print(f"{'Semantic':<20} {'N/A':<10} Requires txtai library")
        result_semantic = None
    # Hybrid (balanced)
    try:
        result_hybrid = compute_similarity(sys1_sg_1hop, sys2_sg_1hop, method="hybrid", alpha=0.5)
        print(f"{'Hybrid (50/50)':<20} {result_hybrid.score:<10.3f} Balanced structural + semantic")
    except ImportError:
        pass
    print(f"\nFindings:")
    print(f" - Both systems address the SAME risks (transparency, discrimination, reliance, etc.)")
    print(f" - High structural similarity expected since they target the same use case")
    print(f" - Differences are in deployment locality and applicable regulations:")
    print(f" • NY: NYC Local Law 144 (annual bias audits)")
    print(f" • EU: GDPR Article 22 (right to explanation)")
    print(f" - Semantic similarity shows textual description alignment")
    print(f" - Hybrid score combines both perspectives for overall comparison")
Hiring Systems Similarity Comparison (All Methods)
Method               Score      Interpretation
--------------------------------------------------------------------------------
Structural           0.667      Risk/control overlap
Semantic             0.974      Textual description similarity
Hybrid (50/50)       0.820      Balanced structural + semantic
Findings:
- Both systems address the SAME risks (transparency, discrimination, reliance, etc.)
- High structural similarity expected since they target the same use case
- Differences are in deployment locality and applicable regulations:
• NY: NYC Local Law 144 (annual bias audits)
• EU: GDPR Article 22 (right to explanation)
- Semantic similarity shows textual description alignment
- Hybrid score combines both perspectives for overall comparison
8. Compare Multiple Pairs
Build a similarity matrix comparing multiple entities.
# Take a sample of actions
sample_ids = action_ids[:5]
sample_names = {}
for aid in sample_ids:
    entity = explorer.get_by_id("actions", aid)
    name = getattr(entity, "name", aid)
    sample_names[aid] = name[:30] if isinstance(name, str) else str(name)[:30]
print(f"Comparing {len(sample_ids)} entities")
print(f"\nEntity IDs and names:")
for aid, name in sample_names.items():
    print(f" {aid}: {name}")
Comparing 5 entities
Entity IDs and names:
GV-1.1-001: GV-1.1-001
GV-1.2-001: GV-1.2-001
GV-1.2-002: GV-1.2-002
GV-1.3-001: GV-1.3-001
GV-1.3-002: GV-1.3-002
8.1 Build Similarity Matrix
# Extract subgraphs for all samples
subgraphs_sample = {aid: extract_subgraph(explorer, aid, max_hops=1) for aid in sample_ids}
# Build similarity matrix
similarity_matrix = []
for i, id1 in enumerate(sample_ids):
    row = []
    for j, id2 in enumerate(sample_ids):
        if i == j:
            score = 1.0  # Diagonal is always 1.0
        else:
            result = compute_similarity(subgraphs_sample[id1], subgraphs_sample[id2], method="structural")
            score = result.score
        row.append(score)
    similarity_matrix.append(row)
# Display as DataFrame
sim_df = pd.DataFrame(
    similarity_matrix,
    index=[f"{i}" for i in range(len(sample_ids))],
    columns=[f"{i}" for i in range(len(sample_ids))]
)
print("\nStructural Similarity Matrix (Structural Jaccard):")
print(sim_df.round(3).to_string())
print(f"\nRow/Column key:")
for i, aid in enumerate(sample_ids):
    print(f" {i}: {sample_names[aid]}")
Structural Similarity Matrix (Structural Jaccard):
       0      1      2      3      4
0  1.000  0.667  0.500  0.500  0.500
1  0.667  1.000  0.500  0.536  0.500
2  0.500  0.500  1.000  0.542  0.562
3  0.500  0.536  0.542  1.000  0.583
4  0.500  0.500  0.562  0.583  1.000
Row/Column key:
0: GV-1.1-001
1: GV-1.2-001
2: GV-1.2-002
3: GV-1.3-001
4: GV-1.3-002
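Since the matrix is symmetric, only the upper triangle carries information. A minimal sketch of turning it into a ranked list of pairs follows, using the values printed above; ranked_pairs is a hypothetical helper, not part of the library.

```python
from itertools import combinations

labels = ["GV-1.1-001", "GV-1.2-001", "GV-1.2-002", "GV-1.3-001", "GV-1.3-002"]
# Values copied from the structural similarity matrix above.
matrix = [
    [1.000, 0.667, 0.500, 0.500, 0.500],
    [0.667, 1.000, 0.500, 0.536, 0.500],
    [0.500, 0.500, 1.000, 0.542, 0.562],
    [0.500, 0.536, 0.542, 1.000, 0.583],
    [0.500, 0.500, 0.562, 0.583, 1.000],
]


def ranked_pairs(labels, matrix):
    """Return (score, label_a, label_b) for each unordered pair, highest score first."""
    pairs = [
        (matrix[i][j], labels[i], labels[j])
        for i, j in combinations(range(len(labels)), 2)  # upper triangle only
    ]
    return sorted(pairs, reverse=True)


for score, a, b in ranked_pairs(labels, matrix)[:3]:
    print(f"{a} vs {b}: {score:.3f}")
```

The same idea applies when building the matrix: computing each pair once and mirroring the value halves the number of compute_similarity calls.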
9. Use Case: Finding Similar Risks
Example: Find which risks are most similar to each other based on their neighborhoods.
# Get some Risk entities (from concepts collection)
risk_ids = []
for item in (nexus.get_all("Risks") or []):
    if type(item).__name__ == "Risk" and hasattr(item, "id"):
        risk_ids.append(item.id)
print(f"Found {len(risk_ids)} Risk entities")
if len(risk_ids) >= 3:
    # Compare risks
    sample_risk_ids = risk_ids[:3]
    print(f"\nComparing first 3 risks:")
    for i, rid in enumerate(sample_risk_ids):
        entity = explorer.get_by_id(None, rid)
        if entity:
            name = getattr(entity, "name", rid)
            print(f" {i}: {name}")
    # Pairwise comparison
    print(f"\nPairwise structural similarities:")
    for i in range(len(sample_risk_ids)):
        for j in range(i + 1, len(sample_risk_ids)):
            sg1 = extract_subgraph(explorer, sample_risk_ids[i], max_hops=1)
            sg2 = extract_subgraph(explorer, sample_risk_ids[j], max_hops=1)
            result = compute_similarity(sg1, sg2, method="structural")
            print(f" Risk {i} vs Risk {j}: {result.score:.3f}")
else:
    print("Not enough Risk entities for demonstration")
Found 561 Risk entities
Comparing first 3 risks:
0: ASI01:2026 Agent Goal Hijack
1: ASI02:2026 Tool Misuse & Exploitation
2: ASI03:2026 Identity & Privilege Abuse
Pairwise structural similarities:
Risk 0 vs Risk 1: 0.625
Risk 0 vs Risk 2: 0.700
Risk 1 vs Risk 2: 0.600
10. Advanced: Understanding the Subgraph Structure
Let's dig deeper into what makes two subgraphs similar or different.
# Compare nodes across two subgraphs
sg1 = subgraphs_sample[sample_ids[0]]
sg2 = subgraphs_sample[sample_ids[1]]
print(f"Detailed comparison of subgraph 0 vs subgraph 1:")
print(f"\nNode set comparison by type:\n")
all_types = set(sg1.nodes.keys()) | set(sg2.nodes.keys())
for entity_type in sorted(all_types):
    nodes1 = sg1.nodes.get(entity_type, set())
    nodes2 = sg2.nodes.get(entity_type, set())
    shared = nodes1 & nodes2
    only_sg1 = nodes1 - nodes2
    only_sg2 = nodes2 - nodes1
    print(f"{entity_type}:")
    print(f" Total in SG1: {len(nodes1)}")
    print(f" Total in SG2: {len(nodes2)}")
    print(f" Shared: {len(shared)}")
    print(f" Only in SG1: {len(only_sg1)}")
    print(f" Only in SG2: {len(only_sg2)}")
    if shared:
        print(f" Shared IDs: {list(shared)[:3]}")
    print()
Detailed comparison of subgraph 0 vs subgraph 1:
Node set comparison by type:
Action:
Total in SG1: 1
Total in SG2: 1
Shared: 0
Only in SG1: 1
Only in SG2: 1
Documentation:
Total in SG1: 1
Total in SG2: 1
Shared: 1
Only in SG1: 0
Only in SG2: 0
Shared IDs: ['NIST.AI.600-1']
Risk:
Total in SG1: 2
Total in SG2: 3
Shared: 2
Only in SG1: 0
Only in SG2: 1
Shared IDs: ['nist-data-privacy', 'nist-intellectual-property']
RiskTaxonomy:
Total in SG1: 1
Total in SG2: 1
Shared: 1
Only in SG1: 0
Only in SG2: 0
Shared IDs: ['nist-ai-rmf']
11. Summary
Key takeaways:
print("""
=== Subgraph Extraction and Similarity - Summary ===
Functions available:
1. extract_subgraph(explorer, entity_id, max_hops=2)
- Works on any entity type (generic)
- max_hops controls traversal depth
- Returns SubGraph with nodes grouped by type and edges
2. compute_similarity(sg1, sg2, method='structural', alpha=0.5)
- method='structural': Jaccard overlap on node types (fast)
- method='semantic': Cosine similarity on text embeddings (requires txtai)
- method='hybrid': Weighted blend of both (configurable alpha)
Typical workflow:
1. Initialize explorer: PyoxigraphExplorer(nexus._ontology)
2. Extract subgraph: sg = extract_subgraph(explorer, entity_id)
3. Compare: result = compute_similarity(sg1, sg2, method='structural')
4. Analyze: result.score, result.breakdown, result.structural_score
Use cases:
- Compare AI system risk profiles (Section 7)
- Find similar risk entities
- Detect duplicate or near-duplicate entities
- Analyze entity neighborhoods
- Build similarity matrices for clustering
""")
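For the near-duplicate and clustering use cases, one simple option is to link every pair whose similarity reaches a threshold and report the connected groups. The sketch below does this with a small union-find over the matrix from Section 8.1. It is an illustration rather than a feature of the library: group_by_threshold is a hypothetical helper and the 0.55 threshold is an arbitrary choice for this example.

```python
def group_by_threshold(labels, matrix, threshold):
    """Group labels whose pairwise similarity is >= threshold (single-linkage style)."""
    parent = list(range(len(labels)))

    def find(i):
        # Follow parent links to the group representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            if matrix[i][j] >= threshold:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i, label in enumerate(labels):
        groups.setdefault(find(i), []).append(label)
    return sorted(groups.values())


labels = ["GV-1.1-001", "GV-1.2-001", "GV-1.2-002", "GV-1.3-001", "GV-1.3-002"]
# Values copied from the structural similarity matrix in Section 8.1.
matrix = [
    [1.000, 0.667, 0.500, 0.500, 0.500],
    [0.667, 1.000, 0.500, 0.536, 0.500],
    [0.500, 0.500, 1.000, 0.542, 0.562],
    [0.500, 0.536, 0.542, 1.000, 0.583],
    [0.500, 0.500, 0.562, 0.583, 1.000],
]

for group in group_by_threshold(labels, matrix, threshold=0.55):
    print(group)
```

Groups are transitive under this scheme: GV-1.2-002 and GV-1.3-001 land together because each is linked to GV-1.3-002, even though their direct score (0.542) is below the threshold. A stricter duplicate check would require every pair in a group to pass.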