Class: BenchmarkMetadataCard

Benchmark metadata cards offer a standardized way to document LLM benchmarks clearly and transparently. Inspired by Model Cards and Datasheets, benchmark metadata cards help researchers and practitioners understand exactly what benchmarks test, how they relate to real-world risks, and how to interpret their results responsibly. This class implements the design set out in 'BenchmarkCards: Large Language Model and Risk Reporting' (https://doi.org/10.48550/arXiv.2410.12974).

URI: nexus:benchmarkmetadatacard

 classDiagram
    class BenchmarkMetadataCard
    click BenchmarkMetadataCard href "../BenchmarkMetadataCard"
      Entity <|-- BenchmarkMetadataCard
        click Entity href "../Entity"

      BenchmarkMetadataCard : dateCreated
      BenchmarkMetadataCard : dateModified
      BenchmarkMetadataCard : describesAiEval
        BenchmarkMetadataCard --> "*" AiEval : describesAiEval
        click AiEval href "../AiEval"
      BenchmarkMetadataCard : description
      BenchmarkMetadataCard : hasAnnotation
      BenchmarkMetadataCard : hasAudience
      BenchmarkMetadataCard : hasBaselineResults
      BenchmarkMetadataCard : hasCalculation
      BenchmarkMetadataCard : hasConsiderationComplianceWithRegulations
      BenchmarkMetadataCard : hasConsiderationConsentProcedures
      BenchmarkMetadataCard : hasConsiderationPrivacyAndAnonymity
      BenchmarkMetadataCard : hasDataFormat
      BenchmarkMetadataCard : hasDataSize
      BenchmarkMetadataCard : hasDataSource
      BenchmarkMetadataCard : hasDataType
      BenchmarkMetadataCard : hasDemographicAnalysis
      BenchmarkMetadataCard : hasDocumentation
        BenchmarkMetadataCard --> "*" Documentation : hasDocumentation
        click Documentation href "../Documentation"
      BenchmarkMetadataCard : hasDomains
      BenchmarkMetadataCard : hasGoal
      BenchmarkMetadataCard : hasInterpretation
      BenchmarkMetadataCard : hasLanguages
      BenchmarkMetadataCard : hasLicense
        BenchmarkMetadataCard --> "0..1" License : hasLicense
        click License href "../License"
      BenchmarkMetadataCard : hasLimitations
      BenchmarkMetadataCard : hasMethods
      BenchmarkMetadataCard : hasMetrics
      BenchmarkMetadataCard : hasOutOfScopeUses
      BenchmarkMetadataCard : hasRelatedRisk
        BenchmarkMetadataCard --> "*" Risk : hasRelatedRisk
        click Risk href "../Risk"
      BenchmarkMetadataCard : hasResources
      BenchmarkMetadataCard : hasSimilarBenchmarks
      BenchmarkMetadataCard : hasTasks
      BenchmarkMetadataCard : hasValidation
      BenchmarkMetadataCard : id
      BenchmarkMetadataCard : name
      BenchmarkMetadataCard : overview
      BenchmarkMetadataCard : url

Inheritance

  • Entity
    • BenchmarkMetadataCard

Slots

Name | Cardinality and Range | Description | Inheritance
describesAiEval | * AiEval | A relationship where a BenchmarkMetadataCard describes an AI evaluation (benchmark). | direct
hasDataType | * String | The type of data used in the benchmark (e.g., text, images, or multi-modal). | direct
hasDomains | * String | The specific domains or areas where the benchmark is applied (e.g., natural language processing, computer vision). | direct
hasLanguages | * String | The languages included in the dataset used by the benchmark (e.g., English, multilingual). | direct
hasSimilarBenchmarks | * String | Benchmarks that are closely related in terms of goals or data type. | direct
hasResources | * String | Links to relevant resources, such as repositories or papers related to the benchmark. | direct
hasGoal | 0..1 String | The specific goal or primary use case the benchmark is designed for. | direct
hasAudience | 0..1 String | The intended audience, such as researchers, developers, policymakers, etc. | direct
hasTasks | * String | The tasks or evaluations the benchmark is intended to assess. | direct
hasLimitations | * String | Limitations in evaluating or addressing risks, such as gaps in demographic coverage or specific domains. | direct
hasOutOfScopeUses | * String | Use cases where the benchmark is not designed to be applied and could give misleading results. | direct
hasDataSource | * String | The origin or source of the data used in the benchmark (e.g., curated datasets, user submissions). | direct
hasDataSize | 0..1 String | The size of the dataset, including the number of data points or examples. | direct
hasDataFormat | 0..1 String | The structure and modality of the data (e.g., sentence pairs, question-answer format, tabular data). | direct
hasAnnotation | 0..1 String | The process used to annotate or label the dataset, including who or what performed the annotations (e.g., human annotators, automated processes). | direct
hasMethods | * String | The evaluation techniques applied within the benchmark. | direct
hasMetrics | * String | The specific performance metrics used to assess models (e.g., accuracy, F1 score, precision, recall). | direct
hasCalculation | * String | The way metrics are computed based on model outputs and the benchmark data. | direct
hasInterpretation | * String | How users should interpret the scores or results from the metrics. | direct
hasBaselineResults | 0..1 String | The results of well-known or widely used models to give context to new performance scores. | direct
hasValidation | * String | Measures taken to ensure that the benchmark provides valid and reliable evaluations. | direct
hasRelatedRisk | * Risk or RiskConcept or Term | A relationship where an entity relates to a risk. | direct
hasDemographicAnalysis | 0..1 String | How the benchmark evaluates performance across different demographic groups (e.g., gender, race). | direct
hasConsiderationPrivacyAndAnonymity | 0..1 String | How any personal or sensitive data is handled and whether any anonymization techniques are applied. | direct
hasLicense | 0..1 License | Indicates licenses associated with a resource. | direct
hasConsiderationConsentProcedures | 0..1 String | Information on how consent was obtained (if applicable), especially for datasets involving personal data. | direct
hasConsiderationComplianceWithRegulations | 0..1 String | Compliance with relevant legal or ethical regulations (if applicable). | direct
hasDocumentation | * Documentation | Indicates documentation associated with an entity. | direct
name | 0..1 String | The official name of the benchmark. | direct
overview | 0..1 String | A brief description of the benchmark's main goals and scope. | direct
id | 1 String | A unique identifier to this instance of the model element. | Entity
description | 0..1 String | The description of an entity. | Entity
url | 0..1 Uri | An optional URL associated with this instance. | Entity
dateCreated | 0..1 Date | The date on which the entity was created. | Entity
dateModified | 0..1 Date | The date on which the entity was most recently modified. | Entity
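
The cardinalities in the Slots table can be checked mechanically against a card instance. The following is a minimal sketch, assuming a card is held as a plain Python dict keyed by slot name; `check_card` and the three constants are hypothetical names chosen for illustration and are not part of any published API. It only checks cardinality, not ranges such as Date or Uri.

```python
# Slot names are copied from the Slots table; everything else here
# (constant names, function name, message wording) is illustrative only.
MULTIVALUED = {
    "describesAiEval", "hasDataType", "hasDomains", "hasLanguages",
    "hasSimilarBenchmarks", "hasResources", "hasTasks", "hasLimitations",
    "hasOutOfScopeUses", "hasDataSource", "hasMethods", "hasMetrics",
    "hasCalculation", "hasInterpretation", "hasValidation",
    "hasRelatedRisk", "hasDocumentation",
}
SINGLE_VALUED = {
    "hasGoal", "hasAudience", "hasDataSize", "hasDataFormat", "hasAnnotation",
    "hasBaselineResults", "hasDemographicAnalysis",
    "hasConsiderationPrivacyAndAnonymity", "hasLicense",
    "hasConsiderationConsentProcedures",
    "hasConsiderationComplianceWithRegulations",
    "name", "overview", "id", "description", "url",
    "dateCreated", "dateModified",
}
REQUIRED = {"id"}  # the only slot with cardinality 1


def check_card(card: dict) -> list[str]:
    """Return a list of cardinality problems; an empty list means none found."""
    problems = []
    for slot in sorted(REQUIRED - set(card)):
        problems.append(f"missing required slot: {slot}")
    for slot, value in card.items():
        if slot in MULTIVALUED:
            if not isinstance(value, list):
                problems.append(f"{slot} must be a list")
        elif slot in SINGLE_VALUED:
            if isinstance(value, list):
                problems.append(f"{slot} must be a single value")
        else:
            problems.append(f"unknown slot: {slot}")
    return problems


# Example values below are made up for the sketch.
card = {"id": "card-1", "name": "Example benchmark", "hasTasks": ["question answering"]}
print(check_card(card))
```

A real deployment would more likely validate against the LinkML schema itself; the hand-written sets above are only meant to make the table's cardinality column concrete.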

Usages

used by | used in | type | used
Container | benchmarkmetadatacards | range | BenchmarkMetadataCard
AiEval | hasBenchmarkMetadata | range | BenchmarkMetadataCard
BenchmarkMetadataCard | describesAiEval | domain | BenchmarkMetadataCard
Question | hasBenchmarkMetadata | range | BenchmarkMetadataCard
Questionnaire | hasBenchmarkMetadata | range | BenchmarkMetadataCard
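
The LinkML source further down this page declares describesAiEval with `inverse: hasBenchmarkMetadata` and `inlined: false`, so a card refers to evaluations by identifier. As a rough illustration of what that inverse means in practice, the sketch below derives the hasBenchmarkMetadata direction from a list of cards. It assumes cards are plain dicts; `index_cards_by_eval` and the sample ids are hypothetical.

```python
from collections import defaultdict


def index_cards_by_eval(cards):
    """Map each AiEval id to the ids of the cards that describe it."""
    by_eval = defaultdict(list)
    for card in cards:
        # describesAiEval is multivalued and optional, so default to no links.
        for eval_id in card.get("describesAiEval", []):
            by_eval[eval_id].append(card["id"])
    return dict(by_eval)


# Made-up identifiers, for illustration only.
cards = [
    {"id": "card-a", "describesAiEval": ["eval-1", "eval-2"]},
    {"id": "card-b", "describesAiEval": ["eval-1"]},
    {"id": "card-c"},
]
print(index_cards_by_eval(cards))
```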

Identifier and Mapping Information

Schema Source

  • from schema: https://ibm.github.io/risk-atlas-nexus/ontology/ai-risk-ontology

Mappings

Mapping Type | Mapped Value
self | nexus:benchmarkmetadatacard
native | nexus:BenchmarkMetadataCard

LinkML Source

Direct

name: BenchmarkMetadataCard
description: 'Benchmark metadata cards offer a standardized way to document LLM benchmarks
  clearly and transparently. Inspired by Model Cards and Datasheets, Benchmark metadata
  cards help researchers and practitioners understand exactly what benchmarks test,
  how they relate to real-world risks, and how to interpret their results responsibly.  This
  is an implementation of the design set out in ''BenchmarkCards: Large Language Model
  and Risk Reporting'' (https://doi.org/10.48550/arXiv.2410.12974)'
from_schema: https://ibm.github.io/risk-atlas-nexus/ontology/ai-risk-ontology
is_a: Entity
slots:
- describesAiEval
- hasDataType
- hasDomains
- hasLanguages
- hasSimilarBenchmarks
- hasResources
- hasGoal
- hasAudience
- hasTasks
- hasLimitations
- hasOutOfScopeUses
- hasDataSource
- hasDataSize
- hasDataFormat
- hasAnnotation
- hasMethods
- hasMetrics
- hasCalculation
- hasInterpretation
- hasBaselineResults
- hasValidation
- hasRelatedRisk
- hasDemographicAnalysis
- hasConsiderationPrivacyAndAnonymity
- hasLicense
- hasConsiderationConsentProcedures
- hasConsiderationComplianceWithRegulations
- hasDocumentation
attributes:
  name:
    name: name
    description: The official name of the benchmark.
    from_schema: https://ibm.github.io/risk-atlas-nexus/ontology/ai_eval
    domain_of:
    - Entity
    - BenchmarkMetadataCard
  overview:
    name: overview
    description: A brief description of the benchmark's main goals and scope.
    from_schema: https://ibm.github.io/risk-atlas-nexus/ontology/ai_eval
    rank: 1000
    domain_of:
    - BenchmarkMetadataCard
class_uri: nexus:benchmarkmetadatacard
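
The direct definition above names only the slots and attributes declared on the class itself, while the induced view that follows also carries the slots inherited from Entity. The sketch below shows one way to arrive at the same list of names. The merge order (own attributes, then listed slots, then inherited slots) simply mirrors the ordering seen on this page and is an assumption, not a description of how LinkML computes induced slots; `induced_slot_names` is a hypothetical helper.

```python
# Names are copied from this page. ENTITY_SLOTS are the rows marked "Entity"
# in the Slots table; the merge order is an assumption for illustration.
OWN_ATTRIBUTES = ["name", "overview"]
DIRECT_SLOTS = [
    "describesAiEval", "hasDataType", "hasDomains", "hasLanguages",
    "hasSimilarBenchmarks", "hasResources", "hasGoal", "hasAudience",
    "hasTasks", "hasLimitations", "hasOutOfScopeUses", "hasDataSource",
    "hasDataSize", "hasDataFormat", "hasAnnotation", "hasMethods",
    "hasMetrics", "hasCalculation", "hasInterpretation", "hasBaselineResults",
    "hasValidation", "hasRelatedRisk", "hasDemographicAnalysis",
    "hasConsiderationPrivacyAndAnonymity", "hasLicense",
    "hasConsiderationConsentProcedures",
    "hasConsiderationComplianceWithRegulations", "hasDocumentation",
]
ENTITY_SLOTS = ["id", "description", "url", "dateCreated", "dateModified"]


def induced_slot_names(own_attributes, direct_slots, inherited_slots):
    """Merge the three sources in order, dropping repeated names."""
    merged = []
    for slot_name in [*own_attributes, *direct_slots, *inherited_slots]:
        if slot_name not in merged:
            merged.append(slot_name)
    return merged


induced = induced_slot_names(OWN_ATTRIBUTES, DIRECT_SLOTS, ENTITY_SLOTS)
print(len(induced))  # 35, one per row of the Slots table
```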

Induced

name: BenchmarkMetadataCard
description: 'Benchmark metadata cards offer a standardized way to document LLM benchmarks
  clearly and transparently. Inspired by Model Cards and Datasheets, Benchmark metadata
  cards help researchers and practitioners understand exactly what benchmarks test,
  how they relate to real-world risks, and how to interpret their results responsibly.  This
  is an implementation of the design set out in ''BenchmarkCards: Large Language Model
  and Risk Reporting'' (https://doi.org/10.48550/arXiv.2410.12974)'
from_schema: https://ibm.github.io/risk-atlas-nexus/ontology/ai-risk-ontology
is_a: Entity
attributes:
  name:
    name: name
    description: The official name of the benchmark.
    from_schema: https://ibm.github.io/risk-atlas-nexus/ontology/ai_eval
    alias: name
    owner: BenchmarkMetadataCard
    domain_of:
    - Entity
    - BenchmarkMetadataCard
    range: string
  overview:
    name: overview
    description: A brief description of the benchmark's main goals and scope.
    from_schema: https://ibm.github.io/risk-atlas-nexus/ontology/ai_eval
    rank: 1000
    alias: overview
    owner: BenchmarkMetadataCard
    domain_of:
    - BenchmarkMetadataCard
    range: string
  describesAiEval:
    name: describesAiEval
    description: A relationship where a BenchmarkMetadataCard describes an AI evaluation
      (benchmark).
    from_schema: https://ibm.github.io/risk-atlas-nexus/ontology/ai-risk-ontology
    rank: 1000
    domain: BenchmarkMetadataCard
    alias: describesAiEval
    owner: BenchmarkMetadataCard
    domain_of:
    - BenchmarkMetadataCard
    inverse: hasBenchmarkMetadata
    range: AiEval
    multivalued: true
    inlined: false
  hasDataType:
    name: hasDataType
    description: The type of data used in the benchmark (e.g., text, images, or multi-modal)
    from_schema: https://ibm.github.io/risk-atlas-nexus/ontology/ai-risk-ontology
    rank: 1000
    alias: hasDataType
    owner: BenchmarkMetadataCard
    domain_of:
    - BenchmarkMetadataCard
    range: string
    multivalued: true
  hasDomains:
    name: hasDomains
    description: The specific domains or areas where the benchmark is applied (e.g.,
      natural language processing, computer vision).
    from_schema: https://ibm.github.io/risk-atlas-nexus/ontology/ai-risk-ontology
    rank: 1000
    alias: hasDomains
    owner: BenchmarkMetadataCard
    domain_of:
    - BenchmarkMetadataCard
    range: string
    multivalued: true
  hasLanguages:
    name: hasLanguages
    description: The languages included in the dataset used by the benchmark (e.g.,
      English, multilingual).
    from_schema: https://ibm.github.io/risk-atlas-nexus/ontology/ai-risk-ontology
    rank: 1000
    alias: hasLanguages
    owner: BenchmarkMetadataCard
    domain_of:
    - BenchmarkMetadataCard
    range: string
    multivalued: true
  hasSimilarBenchmarks:
    name: hasSimilarBenchmarks
    description: Benchmarks that are closely related in terms of goals or data type.
    from_schema: https://ibm.github.io/risk-atlas-nexus/ontology/ai-risk-ontology
    rank: 1000
    alias: hasSimilarBenchmarks
    owner: BenchmarkMetadataCard
    domain_of:
    - BenchmarkMetadataCard
    range: string
    multivalued: true
  hasResources:
    name: hasResources
    description: Links to relevant resources, such as repositories or papers related
      to the benchmark.
    from_schema: https://ibm.github.io/risk-atlas-nexus/ontology/ai-risk-ontology
    rank: 1000
    alias: hasResources
    owner: BenchmarkMetadataCard
    domain_of:
    - BenchmarkMetadataCard
    range: string
    multivalued: true
  hasGoal:
    name: hasGoal
    description: The specific goal or primary use case the benchmark is designed for.
    from_schema: https://ibm.github.io/risk-atlas-nexus/ontology/ai-risk-ontology
    rank: 1000
    alias: hasGoal
    owner: BenchmarkMetadataCard
    domain_of:
    - BenchmarkMetadataCard
    range: string
  hasAudience:
    name: hasAudience
    description: The intended audience, such as researchers, developers, policymakers,
      etc.
    from_schema: https://ibm.github.io/risk-atlas-nexus/ontology/ai-risk-ontology
    rank: 1000
    alias: hasAudience
    owner: BenchmarkMetadataCard
    domain_of:
    - BenchmarkMetadataCard
    range: string
  hasTasks:
    name: hasTasks
    description: The tasks or evaluations the benchmark is intended to assess.
    from_schema: https://ibm.github.io/risk-atlas-nexus/ontology/ai-risk-ontology
    rank: 1000
    alias: hasTasks
    owner: BenchmarkMetadataCard
    domain_of:
    - AiEval
    - BenchmarkMetadataCard
    range: string
    multivalued: true
    inlined: false
  hasLimitations:
    name: hasLimitations
    description: Limitations in evaluating or addressing risks, such as gaps in demographic
      coverage or specific domains.
    from_schema: https://ibm.github.io/risk-atlas-nexus/ontology/ai-risk-ontology
    rank: 1000
    alias: hasLimitations
    owner: BenchmarkMetadataCard
    domain_of:
    - BenchmarkMetadataCard
    range: string
    multivalued: true
  hasOutOfScopeUses:
    name: hasOutOfScopeUses
    description: Use cases where the benchmark is not designed to be applied and could
      give misleading results.
    from_schema: https://ibm.github.io/risk-atlas-nexus/ontology/ai-risk-ontology
    rank: 1000
    alias: hasOutOfScopeUses
    owner: BenchmarkMetadataCard
    domain_of:
    - BenchmarkMetadataCard
    range: string
    multivalued: true
  hasDataSource:
    name: hasDataSource
    description: The origin or source of the data used in the benchmark (e.g., curated
      datasets, user submissions).
    from_schema: https://ibm.github.io/risk-atlas-nexus/ontology/ai-risk-ontology
    rank: 1000
    alias: hasDataSource
    owner: BenchmarkMetadataCard
    domain_of:
    - BenchmarkMetadataCard
    range: string
    multivalued: true
  hasDataSize:
    name: hasDataSize
    description: The size of the dataset, including the number of data points or examples.
    from_schema: https://ibm.github.io/risk-atlas-nexus/ontology/ai-risk-ontology
    rank: 1000
    alias: hasDataSize
    owner: BenchmarkMetadataCard
    domain_of:
    - BenchmarkMetadataCard
    range: string
  hasDataFormat:
    name: hasDataFormat
    description: The structure and modality of the data (e.g., sentence pairs, question-answer
      format, tabular data).
    from_schema: https://ibm.github.io/risk-atlas-nexus/ontology/ai-risk-ontology
    rank: 1000
    alias: hasDataFormat
    owner: BenchmarkMetadataCard
    domain_of:
    - BenchmarkMetadataCard
    range: string
  hasAnnotation:
    name: hasAnnotation
    description: The process used to annotate or label the dataset, including who
      or what performed the annotations (e.g., human annotators, automated processes).
    from_schema: https://ibm.github.io/risk-atlas-nexus/ontology/ai-risk-ontology
    rank: 1000
    alias: hasAnnotation
    owner: BenchmarkMetadataCard
    domain_of:
    - BenchmarkMetadataCard
    range: string
  hasMethods:
    name: hasMethods
    description: The evaluation techniques applied within the benchmark.
    from_schema: https://ibm.github.io/risk-atlas-nexus/ontology/ai-risk-ontology
    rank: 1000
    alias: hasMethods
    owner: BenchmarkMetadataCard
    domain_of:
    - BenchmarkMetadataCard
    range: string
    multivalued: true
  hasMetrics:
    name: hasMetrics
    description: The specific performance metrics used to assess models (e.g., accuracy,
      F1 score, precision, recall).
    from_schema: https://ibm.github.io/risk-atlas-nexus/ontology/ai-risk-ontology
    rank: 1000
    alias: hasMetrics
    owner: BenchmarkMetadataCard
    domain_of:
    - BenchmarkMetadataCard
    range: string
    multivalued: true
  hasCalculation:
    name: hasCalculation
    description: The way metrics are computed based on model outputs and the benchmark
      data.
    from_schema: https://ibm.github.io/risk-atlas-nexus/ontology/ai-risk-ontology
    rank: 1000
    alias: hasCalculation
    owner: BenchmarkMetadataCard
    domain_of:
    - BenchmarkMetadataCard
    range: string
    multivalued: true
  hasInterpretation:
    name: hasInterpretation
    description: How users should interpret the scores or results from the metrics.
    from_schema: https://ibm.github.io/risk-atlas-nexus/ontology/ai-risk-ontology
    rank: 1000
    alias: hasInterpretation
    owner: BenchmarkMetadataCard
    domain_of:
    - BenchmarkMetadataCard
    range: string
    multivalued: true
  hasBaselineResults:
    name: hasBaselineResults
    description: The results of well-known or widely used models to give context to
      new performance scores.
    from_schema: https://ibm.github.io/risk-atlas-nexus/ontology/ai-risk-ontology
    rank: 1000
    alias: hasBaselineResults
    owner: BenchmarkMetadataCard
    domain_of:
    - BenchmarkMetadataCard
    range: string
  hasValidation:
    name: hasValidation
    description: Measures taken to ensure that the benchmark provides valid and reliable
      evaluations.
    from_schema: https://ibm.github.io/risk-atlas-nexus/ontology/ai-risk-ontology
    rank: 1000
    alias: hasValidation
    owner: BenchmarkMetadataCard
    domain_of:
    - BenchmarkMetadataCard
    range: string
    multivalued: true
  hasRelatedRisk:
    name: hasRelatedRisk
    description: A relationship where an entity relates to a risk
    from_schema: https://ibm.github.io/risk-atlas-nexus/ontology/ai-risk-ontology
    rank: 1000
    domain: Any
    alias: hasRelatedRisk
    owner: BenchmarkMetadataCard
    domain_of:
    - Term
    - Action
    - AiEval
    - BenchmarkMetadataCard
    - LLMIntrinsic
    range: Risk
    multivalued: true
    inlined: false
    any_of:
    - range: RiskConcept
    - range: Term
  hasDemographicAnalysis:
    name: hasDemographicAnalysis
    description: How the benchmark evaluates performance across different demographic
      groups (e.g., gender, race).
    from_schema: https://ibm.github.io/risk-atlas-nexus/ontology/ai-risk-ontology
    rank: 1000
    alias: hasDemographicAnalysis
    owner: BenchmarkMetadataCard
    domain_of:
    - BenchmarkMetadataCard
    range: string
  hasConsiderationPrivacyAndAnonymity:
    name: hasConsiderationPrivacyAndAnonymity
    description: How any personal or sensitive data is handled and whether any anonymization
      techniques are applied.
    from_schema: https://ibm.github.io/risk-atlas-nexus/ontology/ai-risk-ontology
    rank: 1000
    alias: hasConsiderationPrivacyAndAnonymity
    owner: BenchmarkMetadataCard
    domain_of:
    - BenchmarkMetadataCard
    range: string
  hasLicense:
    name: hasLicense
    description: Indicates licenses associated with a resource
    from_schema: https://ibm.github.io/risk-atlas-nexus/ontology/ai-risk-ontology
    rank: 1000
    slot_uri: airo:hasLicense
    alias: hasLicense
    owner: BenchmarkMetadataCard
    domain_of:
    - Dataset
    - Documentation
    - Vocabulary
    - RiskTaxonomy
    - BaseAi
    - AiEval
    - BenchmarkMetadataCard
    range: License
  hasConsiderationConsentProcedures:
    name: hasConsiderationConsentProcedures
    description: Information on how consent was obtained (if applicable), especially
      for datasets involving personal data.
    from_schema: https://ibm.github.io/risk-atlas-nexus/ontology/ai-risk-ontology
    rank: 1000
    alias: hasConsiderationConsentProcedures
    owner: BenchmarkMetadataCard
    domain_of:
    - BenchmarkMetadataCard
    range: string
  hasConsiderationComplianceWithRegulations:
    name: hasConsiderationComplianceWithRegulations
    description: Compliance with relevant legal or ethical regulations (if applicable).
    from_schema: https://ibm.github.io/risk-atlas-nexus/ontology/ai-risk-ontology
    rank: 1000
    alias: hasConsiderationComplianceWithRegulations
    owner: BenchmarkMetadataCard
    domain_of:
    - BenchmarkMetadataCard
    range: string
  hasDocumentation:
    name: hasDocumentation
    description: Indicates documentation associated with an entity.
    from_schema: https://ibm.github.io/risk-atlas-nexus/ontology/ai-risk-ontology
    rank: 1000
    slot_uri: airo:hasDocumentation
    alias: hasDocumentation
    owner: BenchmarkMetadataCard
    domain_of:
    - Dataset
    - Vocabulary
    - Term
    - RiskTaxonomy
    - Action
    - BaseAi
    - LargeLanguageModelFamily
    - AiEval
    - BenchmarkMetadataCard
    - LLMIntrinsic
    range: Documentation
    multivalued: true
    inlined: false
  id:
    name: id
    description: A unique identifier to this instance of the model element. Example
      identifiers include UUID, URI, URN, etc.
    from_schema: https://ibm.github.io/risk-atlas-nexus/ontology/ai-risk-ontology
    rank: 1000
    slot_uri: schema:identifier
    identifier: true
    alias: id
    owner: BenchmarkMetadataCard
    domain_of:
    - Entity
    range: string
    required: true
  description:
    name: description
    description: The description of an entity
    from_schema: https://ibm.github.io/risk-atlas-nexus/ontology/ai-risk-ontology
    rank: 1000
    slot_uri: schema:description
    alias: description
    owner: BenchmarkMetadataCard
    domain_of:
    - Entity
    range: string
  url:
    name: url
    description: An optional URL associated with this instance.
    from_schema: https://ibm.github.io/risk-atlas-nexus/ontology/ai-risk-ontology
    rank: 1000
    slot_uri: schema:url
    alias: url
    owner: BenchmarkMetadataCard
    domain_of:
    - Entity
    range: uri
  dateCreated:
    name: dateCreated
    description: The date on which the entity was created.
    from_schema: https://ibm.github.io/risk-atlas-nexus/ontology/ai-risk-ontology
    rank: 1000
    slot_uri: schema:dateCreated
    alias: dateCreated
    owner: BenchmarkMetadataCard
    domain_of:
    - Entity
    range: date
    required: false
  dateModified:
    name: dateModified
    description: The date on which the entity was most recently modified.
    from_schema: https://ibm.github.io/risk-atlas-nexus/ontology/ai-risk-ontology
    rank: 1000
    slot_uri: schema:dateModified
    alias: dateModified
    owner: BenchmarkMetadataCard
    domain_of:
    - Entity
    range: date
    required: false
class_uri: nexus:benchmarkmetadatacard
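
The "Cardinality" values shown in the Slots table can be read off the `required` and `multivalued` flags of each induced attribute. A minimal sketch of that mapping follows, assuming each attribute has been loaded into a dict (for example from the YAML above). The function name is hypothetical, and the `1..*` case does not occur on this page; it is included on the assumption that a required, multivalued slot would be rendered that way.

```python
def cardinality(attribute: dict) -> str:
    """Derive a cardinality label from an induced attribute's flags."""
    # Both flags are treated as false when absent from the attribute.
    required = attribute.get("required", False)
    multivalued = attribute.get("multivalued", False)
    if multivalued:
        return "1..*" if required else "*"
    return "1" if required else "0..1"


# Flags copied from three induced attributes on this page.
print(cardinality({"range": "string", "identifier": True, "required": True}))  # id
print(cardinality({"range": "string", "multivalued": True}))                   # hasDataType
print(cardinality({"range": "date", "required": False}))                       # dateCreated
```

With the flags as given, the three calls yield `1`, `*`, and `0..1`, which agree with the id, hasDataType, and dateCreated rows of the Slots table.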