Loading Data from HuggingFace¶

HuggingFace hosts datasets that can be useful to load into AI Atlas Nexus.

Custom loader: AutoBenchmarkCard¶

To illustrate a custom loader, this notebook shows how to use the AutoBenchmarkCardLoader class to load benchmark metadata from the IBM Auto-BenchmarkCard HuggingFace dataset, which consists of benchmark card metadata, and convert it to BenchmarkMetadataCard instances in the AI Atlas Nexus knowledge graph.

Overview¶

The AutoBenchmarkCardLoader is designed to:

Load benchmark records from the IBM Auto-BenchmarkCard dataset
Transform HuggingFace dataset records into LinkML-compatible BenchmarkMetadataCard instances
Export the transformed data to YAML format for use with AI Atlas Nexus

Installation¶

Before using the AutoBenchmarkCardLoader, ensure you have the required dependencies installed:

In [1]:

# Install required packages
# !pip install datasets  # For loading HuggingFace datasets
# !pip install ai-atlas-nexus  # The main library

print("Dependencies installed. Ready to proceed.")

Dependencies installed. Ready to proceed.

Basic Usage: Load and Save Benchmarks¶

The simplest way to use the AutoBenchmarkCardLoader is to create an instance and call the load_and_save_benchmarks() method:

In [2]:

from ai_atlas_nexus.blocks.hf_data_loader.auto_benchmark_card import AutoBenchmarkCardLoader

# Create a loader instance for the training split
loader = AutoBenchmarkCardLoader(split="train")

# Load all benchmark records and save to YAML
output_path = "auto_benchmark_cards.yaml"
loader.load_and_save_benchmarks(output_path)

print(f"Benchmarks saved to {output_path}")

[2026-05-14 15:21:47:562] - INFO - AIAtlasNexus - Created AIAtlasNexus instance. Base_dir: None
[2026-05-14 15:21:49:667] - INFO - AIAtlasNexus - Loaded dataset ibm-research/Auto-BenchmarkCard with 105 records
[2026-05-14 15:21:49:668] - INFO - AIAtlasNexus - Loading Auto-BenchmarkCard dataset and saving to auto_benchmark_cards.yaml
[2026-05-14 15:21:49:907] - INFO - AIAtlasNexus - Successfully transformed 105 records
[2026-05-14 15:21:50:69] - INFO - AIAtlasNexus - Saved 105 records to auto_benchmark_cards.yaml
[2026-05-14 15:21:50:70] - INFO - AIAtlasNexus - Load and save complete: 105 records saved to auto_benchmark_cards.yaml

Benchmarks saved to auto_benchmark_cards.yaml

Transform Records Individually¶

You can also load and transform records step-by-step:

In [3]:

# Create a loader instance
loader = AutoBenchmarkCardLoader(split="train")

# Load and transform all records
transformed_records = loader.load_and_transform()

print(f"Transformed {len(transformed_records)} records")

# Inspect the first transformed record
if transformed_records:
    import json
    print("\nFirst benchmark metadata card:")
    print(json.dumps(transformed_records[0], indent=2, default=str))

[2026-05-14 15:21:50:756] - INFO - AIAtlasNexus - Created AIAtlasNexus instance. Base_dir: None
[2026-05-14 15:21:51:701] - INFO - AIAtlasNexus - Loaded dataset ibm-research/Auto-BenchmarkCard with 105 records
[2026-05-14 15:21:51:931] - INFO - AIAtlasNexus - Successfully transformed 105 records

Transformed 105 records

First benchmark metadata card:
"id='auto_benchmark_card_multilingual-wikihow-qa-16k' name='multilingual-wikihow-qa-16k' description='Contains Parquet of a list of instructions and WikiHow articles on different languages.' url=None dateCreated=datetime.date(2026, 5, 14) dateModified=None exact_mappings=None close_mappings=None related_mappings=None narrow_mappings=None broad_mappings=None isCategorizedAs=None describesAiEval=['multilingual-wikihow-qa-16k'] hasDataType=['text'] hasDomains=['question-answering'] hasLanguages=['en', 'ru', 'pt', 'it', 'es', 'fr', 'de', 'nl'] hasSimilarBenchmarks=['Not specified'] hasResources=['https://huggingface.co/datasets/0x22almostEvil/multilingual-wikihow-qa-16k'] hasGoal='Not specified' hasAudience=['Not specified'] hasTasks=['question answering'] hasLimitations=['Not specified'] hasOutOfScopeUses=['Not specified'] hasDataSource=['WikiHow'] hasDataSize='16,822 training examples' hasDataFormat=['Parquet'] hasAnnotation=['Not specified'] hasMethods=['Not specified'] hasMetrics=['rouge'] hasCalculation=['This is the classical NLP Rouge metric based on the RougeScorer library (https://github.com/google-research/google-research/tree/master/rouge). It computes metrics several metrics (rouge1, rouge2, roughL, and rougeLsum) based lexical (word) overlap between the prediction and the ground truth references.'] hasInterpretation=['Not specified'] hasBaselineResults=['Not specified'] hasValidation=['split_random_mix: train[90%], validation[5%], test[5%]'] hasRelatedRisk=['atlas-evasion-attack', 'atlas-incorrect-risk-testing', 'atlas-over-or-under-reliance', 'atlas-membership-inference-attack', 'atlas-confidential-data-in-prompt'] hasDemographicAnalysis=None hasConsiderationPrivacyAndAnonymity=['Not specified'] hasLicense='CC-BY-NC-3.0' hasConsiderationConsentProcedures=['Not specified'] hasConsiderationComplianceWithRegulations=['Not specified'] hasDocumentation=None overview=None type='BenchmarkMetadataCard'"

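The record above is printed as one quoted string because `json.dumps` falls back to `default=str` for the whole record object. A minimal sketch of a more readable dump follows. It assumes the records are Pydantic-style models exposing `model_dump()`, which the printed representation suggests but this notebook does not confirm; `to_plain_dict` and the stand-in card are hypothetical names used only for illustration.

```python
import json
from types import SimpleNamespace


def to_plain_dict(card):
    """Return a dict view of a card, dropping fields that are None."""
    if hasattr(card, "model_dump"):      # assumption: Pydantic v2 style model
        data = card.model_dump()
    elif isinstance(card, dict):         # already a plain mapping
        data = dict(card)
    else:                                # generic attribute container
        data = dict(vars(card))
    return {key: value for key, value in data.items() if value is not None}


# Stand-in record with a few fields copied from the output above
example_card = SimpleNamespace(
    id="auto_benchmark_card_multilingual-wikihow-qa-16k",
    name="multilingual-wikihow-qa-16k",
    url=None,
    hasMetrics=["rouge"],
)

pretty = json.dumps(to_plain_dict(example_card), indent=2, default=str)
print(pretty)
```

Dropping `None` fields keeps the dump short, since many optional fields of a card are unset.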
Working with Different Dataset Splits¶

The dataset may have different splits (e.g., train, validation, test). You can load a specific split:

In [4]:

# Load a specific split (for this dataset only "train" exists)
for split in ["train"]:  # add "validation" or "test" for datasets that provide them
    try:
        loader = AutoBenchmarkCardLoader(split=split)
        print(f"Split '{split}': {len(loader.dataset)} records")
    except Exception as e:
        print(f"Split '{split}' not available: {e}")

[2026-05-14 15:21:52:286] - INFO - AIAtlasNexus - Created AIAtlasNexus instance. Base_dir: None
[2026-05-14 15:21:53:334] - INFO - AIAtlasNexus - Loaded dataset ibm-research/Auto-BenchmarkCard with 105 records

Split 'train': 105 records

Saving to YAML Format¶

The transformed records are saved to YAML format, which is compatible with AI Atlas Nexus LinkML models:

In [5]:

# Load, transform, and save in one operation
loader = AutoBenchmarkCardLoader(split="train")
output_file = "auto_benchmark_cards.yaml"

# This convenience method does load_and_transform() + save_to_yaml()
loader.load_and_save(output_file)

# Inspect the generated YAML
import yaml

with open(output_file, "r", encoding="utf-8") as f:
    data = yaml.safe_load(f)

print(f"YAML file contains {len(data.get('benchmarkmetadatacards', []))} benchmarks")

[2026-05-14 15:21:53:815] - INFO - AIAtlasNexus - Created AIAtlasNexus instance. Base_dir: None
[2026-05-14 15:21:54:729] - INFO - AIAtlasNexus - Loaded dataset ibm-research/Auto-BenchmarkCard with 105 records
[2026-05-14 15:21:54:958] - INFO - AIAtlasNexus - Successfully transformed 105 records
[2026-05-14 15:21:55:103] - INFO - AIAtlasNexus - Saved 105 records to auto_benchmark_cards.yaml
[2026-05-14 15:21:55:104] - INFO - AIAtlasNexus - Load and save complete: 105 records saved to auto_benchmark_cards.yaml

YAML file contains 105 benchmarks
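The cell above reads the saved cards from a top-level `benchmarkmetadatacards` key. A small sketch of a sanity check on that parsed structure is shown here. It operates on the plain dictionary returned by `yaml.safe_load` and assumes each entry is a mapping with an `id` field; the helper name `summarize_container` and the sample data are hypothetical.

```python
from collections import Counter


def summarize_container(data, key="benchmarkmetadatacards"):
    """Count entries under `key` and report any ids that occur more than once."""
    cards = (data or {}).get(key) or []
    id_counts = Counter(card.get("id") for card in cards)
    duplicates = sorted(
        card_id for card_id, n in id_counts.items() if card_id is not None and n > 1
    )
    return {"count": len(cards), "duplicate_ids": duplicates}


# Sample data shaped like the parsed YAML (entries are illustrative only)
sample = {
    "benchmarkmetadatacards": [
        {"id": "card-a", "name": "a"},
        {"id": "card-b", "name": "b"},
        {"id": "card-a", "name": "a again"},
    ]
}

summary = summarize_container(sample)
print(summary)
```

With a real file, `summarize_container(data)` could be called on the `data` object loaded in the cell above.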

Filtering and Processing Results¶

After loading, you can filter and process the transformed records:

In [6]:

# Load and transform
loader = AutoBenchmarkCardLoader(split="train")
records = loader.load_and_transform()

# Records are BenchmarkMetadataCard objects, so read fields as attributes
# rather than with dict-style membership tests
benchmarks_with_risks = [r for r in records if getattr(r, "hasRelatedRisk", None)]
print(f"Benchmarks linked to risks: {len(benchmarks_with_risks)}/{len(records)}")

# Find benchmarks by domain
medical_benchmarks = [r for r in records if "medicine" in (getattr(r, "hasDomains", None) or [])]
print(f"Medical benchmarks: {len(medical_benchmarks)}")

# Get unique data types
all_data_types = set()
for record in records:
    data_types = getattr(record, "hasDataType", None) or []
    if not isinstance(data_types, list):
        data_types = [data_types]
    all_data_types.update(data_types)

print(f"\nUnique data types: {all_data_types}")

[2026-05-14 15:21:55:819] - INFO - AIAtlasNexus - Created AIAtlasNexus instance. Base_dir: None
[2026-05-14 15:21:56:735] - INFO - AIAtlasNexus - Loaded dataset ibm-research/Auto-BenchmarkCard with 105 records
[2026-05-14 15:21:57:82] - INFO - AIAtlasNexus - Successfully transformed 105 records
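Many fields in the first record printed earlier hold the placeholder value 'Not specified', so it can help to skip placeholders when tallying field values. The sketch below assumes records expose their fields as attributes, as that printed record suggests; `count_field_values` and the stand-in records are hypothetical.

```python
from collections import Counter
from types import SimpleNamespace

PLACEHOLDER = "Not specified"


def count_field_values(records, field):
    """Tally the values of a list-valued field, skipping missing and placeholder values."""
    counts = Counter()
    for record in records:
        values = getattr(record, field, None) or []
        if isinstance(values, str):      # tolerate a single string value
            values = [values]
        counts.update(value for value in values if value != PLACEHOLDER)
    return counts


# Stand-in records; real ones would come from loader.load_and_transform()
sample_records = [
    SimpleNamespace(hasDomains=["question-answering"], hasTasks=["question answering"]),
    SimpleNamespace(hasDomains=["Not specified"], hasTasks=None),
    SimpleNamespace(hasDomains=["question-answering", "medicine"]),
]

domain_counts = count_field_values(sample_records, "hasDomains")
print(domain_counts.most_common())
```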

Integrating with AI Atlas Nexus¶

Once you have saved the benchmark metadata cards to YAML, you can load them alongside AI Atlas Nexus for risk analysis and visualization:

In [ ]:

from ai_atlas_nexus import AIAtlasNexus

# Initialize the nexus
nexus = AIAtlasNexus(base_dir="<YOUR PATH TO OUTPUT FOLDER>")

# Query benchmarks by name
results = nexus.query("benchmarkmetadatacard", name="multilingual-wikihow-qa-16k")
print(f"Found {len(results)} matching benchmarks")

# Query benchmarks linked to a specific risk
benchmark_risks = nexus.query(
    "benchmarkmetadatacard",
    hasRelatedRisk="atlas-impact-on-the-environment"
)
print(f"Benchmarks related to environmental impact: {len(benchmark_risks)}")

[2026-05-14 15:25:50:686] - INFO - AIAtlasNexus - Created AIAtlasNexus instance. Base_dir: /Users/ingevejs/Documents/workspace/ingelise/risk-atlas-nexus/docs/examples/notebooks

Found 1 matching benchmarks
Benchmarks related to environmental impact: 35
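As a complement to querying by risk, an inverted index from risk id to benchmark names can be built directly from transformed records. This sketch does not use the AI Atlas Nexus query API. It assumes each record has `name` and `hasRelatedRisk` attributes, as in the record printed earlier; `build_risk_index` and the sample records are hypothetical.

```python
from collections import defaultdict
from types import SimpleNamespace


def build_risk_index(records):
    """Map each risk id to the names of the benchmarks that reference it."""
    index = defaultdict(list)
    for record in records:
        for risk_id in getattr(record, "hasRelatedRisk", None) or []:
            index[risk_id].append(record.name)
    return dict(index)


# Stand-in records; the risk ids are taken from the record printed earlier
sample_records = [
    SimpleNamespace(
        name="bench-a",
        hasRelatedRisk=["atlas-evasion-attack", "atlas-over-or-under-reliance"],
    ),
    SimpleNamespace(name="bench-b", hasRelatedRisk=["atlas-evasion-attack"]),
    SimpleNamespace(name="bench-c", hasRelatedRisk=None),
]

risk_index = build_risk_index(sample_records)
for risk_id, names in sorted(risk_index.items()):
    print(f"{risk_id}: {names}")
```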

Summary¶

The AutoBenchmarkCardLoader provides an example of using a custom loader to:

Load benchmark data from the IBM Auto-BenchmarkCard HuggingFace dataset
Transform HuggingFace records into LinkML-compatible BenchmarkMetadataCard instances

You can develop your own custom loaders for other HuggingFace data sources by following this pattern.
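One possible shape for such a loader is sketched below. It mirrors the load, transform, and save steps described in this notebook but is not the AI Atlas Nexus implementation: the class name `SimpleCardLoader`, the field mapping, the in-memory source records, and the JSON output (used instead of YAML to stay within the standard library) are all assumptions for illustration.

```python
import json
import tempfile
from pathlib import Path


class SimpleCardLoader:
    """Hypothetical loader: holds raw records, maps them to card dicts, writes them out."""

    def __init__(self, raw_records, id_prefix="example_card_"):
        self.raw_records = list(raw_records)
        self.id_prefix = id_prefix

    def transform_record(self, raw):
        """Map one raw source record to a card-shaped dictionary."""
        name = raw["name"]
        return {
            "id": f"{self.id_prefix}{name}",
            "name": name,
            "description": raw.get("description"),
            "hasLanguages": raw.get("languages") or [],
            "type": "BenchmarkMetadataCard",
        }

    def load_and_transform(self):
        """Transform every raw record that has a name; unnamed records are skipped."""
        return [self.transform_record(raw) for raw in self.raw_records if raw.get("name")]

    def load_and_save(self, output_path):
        """Transform all records, write them under one container key, return the count."""
        cards = self.load_and_transform()
        payload = {"benchmarkmetadatacards": cards}
        Path(output_path).write_text(json.dumps(payload, indent=2), encoding="utf-8")
        return len(cards)


# Illustrative raw records standing in for rows of a HuggingFace dataset
raw = [
    {"name": "demo-benchmark", "description": "Toy record", "languages": ["en"]},
    {"description": "No name, so it is skipped"},
]

out_file = Path(tempfile.mkdtemp()) / "cards.json"
saved = SimpleCardLoader(raw).load_and_save(out_file)
print(f"Saved {saved} card(s) to {out_file.name}")
```

Keeping the per-record mapping in its own method makes it straightforward to adapt the sketch to a different source schema.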