Loading Data from HuggingFace
Some datasets hosted on HuggingFace can be useful to load into AI Atlas Nexus.
Custom loader: AutoBenchmarkCard
This notebook gives an example of a custom loader. It shows how to use the AutoBenchmarkCardLoader class to load benchmark metadata from the IBM Auto-BenchmarkCard HuggingFace dataset, which is a collection of BenchmarkCard metadata records. The loader then converts those records into BenchmarkMetadataCard instances in the AI Atlas Nexus knowledge graph.
Overview
The AutoBenchmarkCardLoader is designed to:
- Load benchmark records from the IBM Auto-BenchmarkCard dataset
- Transform HuggingFace dataset records into LinkML-compatible BenchmarkMetadataCard instances
- Export the transformed data to YAML format for use with AI Atlas Nexus
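Most of the work in this pipeline happens in the transform step. The sketch below shows what a per-record transform might look like when it produces plain dicts instead of the library's model classes. The function `transform_record` and the raw field names (`name`, `description`, `languages`, `risks`) are hypothetical. The output fields (the `auto_benchmark_card_` ID prefix, `describesAiEval`, `type`) are modelled on the card printed later in this notebook.

```python
from datetime import date
from typing import Any


def transform_record(raw: dict[str, Any]) -> dict[str, Any]:
    """Map one raw dataset row to a BenchmarkMetadataCard-shaped dict.

    Hypothetical sketch: the raw keys "name", "description", "languages"
    and "risks" are assumptions, not the real dataset schema.
    """
    name = raw["name"]
    return {
        # ID prefix mirrors the transformed card shown in this notebook
        "id": f"auto_benchmark_card_{name}",
        "name": name,
        "description": raw.get("description"),
        "dateCreated": date.today(),
        "describesAiEval": [name],
        # Normalise optional list fields so downstream code can iterate safely
        "hasLanguages": list(raw.get("languages") or []),
        "hasRelatedRisk": list(raw.get("risks") or []),
        "type": "BenchmarkMetadataCard",
    }


raw_rows = [
    {"name": "multilingual-wikihow-qa-16k", "languages": ["en", "ru"], "risks": ["atlas-evasion-attack"]},
    {"name": "toy-benchmark", "description": "A made-up example row."},
]
cards = [transform_record(row) for row in raw_rows]
print(cards[0]["id"])  # auto_benchmark_card_multilingual-wikihow-qa-16k
print(cards[1]["hasRelatedRisk"])  # []
```

Returning empty lists for missing optional fields keeps later filtering code simple, because callers can iterate over any list field without checking for `None` first.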
Installation
Before using the AutoBenchmarkCardLoader, ensure you have the required dependencies installed:
# Install required packages
# !pip install datasets # For loading HuggingFace datasets
# !pip install ai-atlas-nexus # The main library
print("Dependencies installed. Ready to proceed.")
Dependencies installed. Ready to proceed.
Option 1 (Basic Usage): Load and Save Benchmarks
The simplest way to use the AutoBenchmarkCardLoader is to create an instance and call the load_and_save_benchmarks() method:
from ai_atlas_nexus.blocks.hf_data_loader.auto_benchmark_card import AutoBenchmarkCardLoader
# Create a loader instance for the training split
loader = AutoBenchmarkCardLoader(split="train")
# Load all benchmark records and save to YAML
output_path = "auto_benchmark_cards.yaml"
loader.load_and_save_benchmarks(output_path)
print(f"Benchmarks saved to {output_path}")
[2026-05-14 15:21:47:562] - INFO - AIAtlasNexus - Created AIAtlasNexus instance. Base_dir: None
[2026-05-14 15:21:49:667] - INFO - AIAtlasNexus - Loaded dataset ibm-research/Auto-BenchmarkCard with 105 records
[2026-05-14 15:21:49:668] - INFO - AIAtlasNexus - Loading Auto-BenchmarkCard dataset and saving to auto_benchmark_cards.yaml
[2026-05-14 15:21:49:907] - INFO - AIAtlasNexus - Successfully transformed 105 records
[2026-05-14 15:21:50:69] - INFO - AIAtlasNexus - Saved 105 records to auto_benchmark_cards.yaml
[2026-05-14 15:21:50:70] - INFO - AIAtlasNexus - Load and save complete: 105 records saved to auto_benchmark_cards.yaml
Benchmarks saved to auto_benchmark_cards.yaml
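Option 2: Load and Transform Without Saving
Alternatively, call load_and_transform() to load every record and transform it in memory without writing a file. This is useful when you want to inspect records before you save them: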
# Create a loader instance
loader = AutoBenchmarkCardLoader(split="train")
# Load and transform all records
transformed_records = loader.load_and_transform()
print(f"Transformed {len(transformed_records)} records")
import json

# Inspect the first transformed record
if transformed_records:
    print("\nFirst benchmark metadata card:")
    print(json.dumps(transformed_records[0], indent=2, default=str))
[2026-05-14 15:21:50:756] - INFO - AIAtlasNexus - Created AIAtlasNexus instance. Base_dir: None
[2026-05-14 15:21:51:701] - INFO - AIAtlasNexus - Loaded dataset ibm-research/Auto-BenchmarkCard with 105 records
[2026-05-14 15:21:51:931] - INFO - AIAtlasNexus - Successfully transformed 105 records
Transformed 105 records

First benchmark metadata card:
"id='auto_benchmark_card_multilingual-wikihow-qa-16k' name='multilingual-wikihow-qa-16k' description='Contains Parquet of a list of instructions and WikiHow articles on different languages.' url=None dateCreated=datetime.date(2026, 5, 14) dateModified=None exact_mappings=None close_mappings=None related_mappings=None narrow_mappings=None broad_mappings=None isCategorizedAs=None describesAiEval=['multilingual-wikihow-qa-16k'] hasDataType=['text'] hasDomains=['question-answering'] hasLanguages=['en', 'ru', 'pt', 'it', 'es', 'fr', 'de', 'nl'] hasSimilarBenchmarks=['Not specified'] hasResources=['https://huggingface.co/datasets/0x22almostEvil/multilingual-wikihow-qa-16k'] hasGoal='Not specified' hasAudience=['Not specified'] hasTasks=['question answering'] hasLimitations=['Not specified'] hasOutOfScopeUses=['Not specified'] hasDataSource=['WikiHow'] hasDataSize='16,822 training examples' hasDataFormat=['Parquet'] hasAnnotation=['Not specified'] hasMethods=['Not specified'] hasMetrics=['rouge'] hasCalculation=['This is the classical NLP Rouge metric based on the RougeScorer library (https://github.com/google-research/google-research/tree/master/rouge). It computes metrics several metrics (rouge1, rouge2, roughL, and rougeLsum) based lexical (word) overlap between the prediction and the ground truth references.'] hasInterpretation=['Not specified'] hasBaselineResults=['Not specified'] hasValidation=['split_random_mix: train[90%], validation[5%], test[5%]'] hasRelatedRisk=['atlas-evasion-attack', 'atlas-incorrect-risk-testing', 'atlas-over-or-under-reliance', 'atlas-membership-inference-attack', 'atlas-confidential-data-in-prompt'] hasDemographicAnalysis=None hasConsiderationPrivacyAndAnonymity=['Not specified'] hasLicense='CC-BY-NC-3.0' hasConsiderationConsentProcedures=['Not specified'] hasConsiderationComplianceWithRegulations=['Not specified'] hasDocumentation=None overview=None type='BenchmarkMetadataCard'"
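The card prints as one quoted string. This happens because json.dumps cannot serialize the record object, so default=str falls back to the object's str() form. The field=value layout suggests a pydantic model. One way to get indented JSON is to convert the record to a dict first. In the sketch below, `to_plain_dict` is a hypothetical helper. It assumes model objects expose pydantic v2's `model_dump()` and otherwise falls back to `vars()`. `FakeCard` is a stand-in, not the real BenchmarkMetadataCard.

```python
import json
from typing import Any


def to_plain_dict(record: Any) -> dict[str, Any]:
    """Convert a transformed record to a plain dict for pretty-printing."""
    if isinstance(record, dict):
        return record  # already a dict, return unchanged
    if hasattr(record, "model_dump"):  # pydantic v2 models (assumed)
        return record.model_dump(exclude_none=True)
    # Fallback for simple attribute-style objects: drop unset (None) fields
    return {k: v for k, v in vars(record).items() if v is not None}


class FakeCard:
    """Stand-in for a model object; not the real BenchmarkMetadataCard."""

    def __init__(self) -> None:
        self.id = "auto_benchmark_card_example"
        self.name = "example"
        self.url = None
        self.hasLanguages = ["en"]


print(json.dumps(to_plain_dict(FakeCard()), indent=2, default=str))
```

With the loader, you would call `json.dumps(to_plain_dict(transformed_records[0]), indent=2, default=str)`. The `default=str` argument is still needed because fields such as `dateCreated` hold `datetime.date` values.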
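Working with Different Dataset Splits
A HuggingFace dataset can have several splits (e.g., train, validation, test). To load a specific split, pass its name to AutoBenchmarkCardLoader. The Auto-BenchmarkCard dataset only provides a train split: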
# Load each requested split; only "train" exists for this dataset,
# so add "validation" or "test" only for datasets that provide them
for split in ["train"]:
    try:
        loader = AutoBenchmarkCardLoader(split=split)
        print(f"Split '{split}': {len(loader.dataset)} records")
    except Exception as e:
        print(f"Split '{split}' not available: {e}")
[2026-05-14 15:21:52:286] - INFO - AIAtlasNexus - Created AIAtlasNexus instance. Base_dir: None
[2026-05-14 15:21:53:334] - INFO - AIAtlasNexus - Loaded dataset ibm-research/Auto-BenchmarkCard with 105 records
Split 'train': 105 records
Saving to YAML Format
You can write the transformed records to a YAML file that is compatible with the AI Atlas Nexus LinkML models. The cards are listed under a top-level benchmarkmetadatacards key:
# Load, transform, and save in one operation
loader = AutoBenchmarkCardLoader(split="train")
output_file = "auto_benchmark_cards.yaml"
# This convenience method does load_and_transform() + save_to_yaml()
loader.load_and_save(output_file)
# Inspect the generated YAML
import yaml

with open(output_file, "r") as f:
    data = yaml.safe_load(f)
print(f"YAML file contains {len(data.get('benchmarkmetadatacards', []))} benchmarks")
[2026-05-14 15:21:53:815] - INFO - AIAtlasNexus - Created AIAtlasNexus instance. Base_dir: None
[2026-05-14 15:21:54:729] - INFO - AIAtlasNexus - Loaded dataset ibm-research/Auto-BenchmarkCard with 105 records
[2026-05-14 15:21:54:958] - INFO - AIAtlasNexus - Successfully transformed 105 records
[2026-05-14 15:21:55:103] - INFO - AIAtlasNexus - Saved 105 records to auto_benchmark_cards.yaml
[2026-05-14 15:21:55:104] - INFO - AIAtlasNexus - Load and save complete: 105 records saved to auto_benchmark_cards.yaml
YAML file contains 105 benchmarks
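Because yaml.safe_load returns plain dicts and lists, you can sanity-check the saved file without the model classes. The sketch below uses a hypothetical `check_cards_document` function. It verifies that the document is a mapping with a `benchmarkmetadatacards` list and that each card has an `id` and a `name`. Treating those two fields as required is an assumption based on the card printed earlier, not the LinkML schema's actual rules.

```python
from typing import Any


def check_cards_document(data: Any, key: str = "benchmarkmetadatacards") -> list[str]:
    """Return a list of problems found in a loaded YAML document (empty if none)."""
    if not isinstance(data, dict):
        return [f"top level is {type(data).__name__}, expected a mapping"]
    cards = data.get(key)
    if not isinstance(cards, list):
        return [f"missing or non-list '{key}' entry"]
    problems: list[str] = []
    for i, card in enumerate(cards):
        # Assumed minimum fields for each card
        for field in ("id", "name"):
            if not isinstance(card, dict) or field not in card:
                problems.append(f"card {i} has no '{field}'")
    return problems


sample = {"benchmarkmetadatacards": [{"id": "auto_benchmark_card_a", "name": "a"}, {"name": "b"}]}
print(check_cards_document(sample))  # ["card 1 has no 'id'"]
```

For the file written above, `check_cards_document(data)` should return an empty list if every card has both fields.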
Filtering and Processing Results
After loading, you can filter and process the transformed records:
# Load and transform
loader = AutoBenchmarkCardLoader(split="train")
records = loader.load_and_transform()
# Find benchmarks with specific characteristics.
# The transformed records are model objects (see the printed card above), not dicts,
# so a `"field" in record` check does not test whether a field is set; use getattr().
benchmarks_with_risks = [r for r in records if getattr(r, "hasRelatedRisk", None)]
print(f"Benchmarks linked to risks: {len(benchmarks_with_risks)}/{len(records)}")
# Find benchmarks by domain
medical_benchmarks = [r for r in records if "medicine" in (getattr(r, "hasDomains", None) or [])]
print(f"Medical benchmarks: {len(medical_benchmarks)}")
# Get unique data types
all_data_types = set()
for record in records:
    data_types = getattr(record, "hasDataType", None) or []
    if not isinstance(data_types, list):
        data_types = [data_types]
    all_data_types.update(data_types)
print(f"\nUnique data types: {all_data_types}")
[2026-05-14 15:21:55:819] - INFO - AIAtlasNexus - Created AIAtlasNexus instance. Base_dir: None
[2026-05-14 15:21:56:735] - INFO - AIAtlasNexus - Loaded dataset ibm-research/Auto-BenchmarkCard with 105 records
[2026-05-14 15:21:57:82] - INFO - AIAtlasNexus - Successfully transformed 105 records
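You can go beyond simple filters and check which risks are covered most often across all benchmarks. The following is a minimal sketch. It assumes records may be either plain dicts or attribute-style objects, such as the transformed model instances. The helpers `field` and `count_by_field` are hypothetical, and the sample records are made up.

```python
from collections import Counter
from types import SimpleNamespace
from typing import Any, Iterable


def field(record: Any, name: str) -> list[Any]:
    """Return a field as a list, whether the record is a dict or an object."""
    value = record.get(name) if isinstance(record, dict) else getattr(record, name, None)
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def count_by_field(records: Iterable[Any], name: str) -> Counter:
    """Count how many records carry each value of a (possibly list-valued) field."""
    counts: Counter = Counter()
    for record in records:
        counts.update(set(field(record, name)))  # count each value once per record
    return counts


sample_records = [
    SimpleNamespace(hasRelatedRisk=["atlas-evasion-attack", "atlas-over-or-under-reliance"]),
    SimpleNamespace(hasRelatedRisk=["atlas-evasion-attack"]),
    {"hasRelatedRisk": None},
]
print(count_by_field(sample_records, "hasRelatedRisk").most_common(1))  # [('atlas-evasion-attack', 2)]
```

With the loader's output, `count_by_field(records, "hasRelatedRisk").most_common(5)` would list the five risks linked to the most benchmarks.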
Integrating with AI Atlas Nexus
Once the benchmark metadata cards are saved to YAML, you can load them into AI Atlas Nexus for risk analysis and visualization. To do this, set base_dir to the folder that contains the YAML file:
from ai_atlas_nexus import AIAtlasNexus
# Initialize the nexus
nexus = AIAtlasNexus(base_dir="<YOUR PATH TO OUTPUT FOLDER>")
# Query benchmarks by name
results = nexus.query("benchmarkmetadatacard", name="multilingual-wikihow-qa-16k") # Example: multilingual-wikihow-qa-16k
print(f"Found {len(results)} matching benchmarks")
# Query benchmarks with specific risks
benchmark_risks = nexus.query(
"benchmarkmetadatacard",
hasRelatedRisk="atlas-impact-on-the-environment" # Example: atlas-impact-on-the-environment
)
print(f"Benchmarks related to impact on the environment: {len(benchmark_risks)}")
[2026-05-14 15:25:50:686] - INFO - AIAtlasNexus - Created AIAtlasNexus instance. Base_dir: /Users/ingevejs/Documents/workspace/ingelise/risk-atlas-nexus/docs/examples/notebooks
Found 1 matching benchmarks
Benchmarks related to impact on the environment: 35
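This notebook does not describe the exact matching rules of nexus.query. The hypothetical `query_cards` function below is based on an assumption. It treats a scalar field as an equality match and a list-valued field such as `hasRelatedRisk` as a membership match. That is one plausible way the risk query above could select benchmarks. It works on plain dicts, for example the entries under `benchmarkmetadatacards` after yaml.safe_load.

```python
from typing import Any


def query_cards(cards: list[dict[str, Any]], **criteria: Any) -> list[dict[str, Any]]:
    """Hypothetical in-memory analogue of a field-based query.

    A scalar field matches on equality; a list field matches if it contains the value.
    """

    def matches(card: dict[str, Any], key: str, wanted: Any) -> bool:
        value = card.get(key)
        return wanted in value if isinstance(value, list) else value == wanted

    # A card is returned only if it satisfies every criterion
    return [c for c in cards if all(matches(c, k, v) for k, v in criteria.items())]


cards = [
    {"name": "a", "hasRelatedRisk": ["atlas-impact-on-the-environment"]},
    {"name": "b", "hasRelatedRisk": ["atlas-evasion-attack"]},
]
print(len(query_cards(cards, hasRelatedRisk="atlas-impact-on-the-environment")))  # 1
print(len(query_cards(cards, name="b")))  # 1
```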
Summary
The AutoBenchmarkCardLoader provides an example of using a custom loader to:
- Load benchmark data from the IBM Auto-BenchmarkCard HuggingFace dataset
- Transform HuggingFace records into LinkML-compatible BenchmarkMetadataCard instances
You can follow the same pattern to write your own custom loaders for other HuggingFace data sources.
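A new loader for another HuggingFace dataset mainly needs two pieces: a way to fetch raw rows and a per-row transform. The skeleton below is a hypothetical base class. `CustomLoader` and `ToyLoader` are illustrative names, not part of ai_atlas_nexus. The base class connects the two pieces through `load_and_transform()` and `save()` methods that loosely mirror the AutoBenchmarkCardLoader calls used above. To stay dependency-free it writes JSON. A real loader would more likely fetch rows with `datasets.load_dataset` and write YAML with PyYAML.

```python
import json
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any, Iterable


class CustomLoader(ABC):
    """Hypothetical skeleton for a loader that turns source rows into cards."""

    @abstractmethod
    def load(self) -> Iterable[dict[str, Any]]:
        """Fetch raw rows, e.g. from datasets.load_dataset(...) in a real loader."""

    @abstractmethod
    def transform(self, row: dict[str, Any]) -> dict[str, Any]:
        """Map one raw row to a card dict."""

    def load_and_transform(self) -> list[dict[str, Any]]:
        return [self.transform(row) for row in self.load()]

    def save(self, path: str, key: str) -> int:
        """Write cards under a top-level key (JSON here to avoid extra dependencies)."""
        cards = self.load_and_transform()
        Path(path).write_text(json.dumps({key: cards}, indent=2, default=str))
        return len(cards)


class ToyLoader(CustomLoader):
    """Illustrative subclass with hard-coded rows instead of a real dataset."""

    def load(self) -> Iterable[dict[str, Any]]:
        return [{"title": "toy-a"}, {"title": "toy-b"}]

    def transform(self, row: dict[str, Any]) -> dict[str, Any]:
        return {"id": f"toy_{row['title']}", "name": row["title"], "type": "BenchmarkMetadataCard"}


print(ToyLoader().load_and_transform()[0]["id"])  # toy_toy-a
```

Keeping the fetch and transform steps as separate abstract methods lets you test the transform on hand-written rows, as `ToyLoader` does, without downloading anything.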