Evaluation Result

pydantic model ibm_watsonx_gov.entities.evaluation_result.AgentMetricResult

Bases: BaseMetricResult

This is the data model for metric results in the agentic app. It stores evaluation results for conversations, messages and nodes.

JSON schema:

{
   "title": "AgentMetricResult",
   "description": "This is the data model for metric results in the agentic app.\nIt stores evaluation results for conversations, messages and nodes.",
   "type": "object",
   "properties": {
      "name": {
         "description": "The name of the metric.",
         "examples": [
            "answer_relevance",
            "context_relevance"
         ],
         "title": "Metric Name",
         "type": "string"
      },
      "display_name": {
         "description": "The display name of the metric.",
         "examples": [
            "Answer Relevance",
            "Context Relevance"
         ],
         "title": "Metric display name",
         "type": "string"
      },
      "value_type": {
         "default": "numeric",
         "description": "The type of the metric value. Indicates whether the metric value is numeric or categorical.",
         "examples": [
            "numeric",
            "categorical"
         ],
         "title": "Metric value type",
         "type": "string"
      },
      "method": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "The method used to compute this metric result.",
         "examples": [
            "token_recall"
         ],
         "title": "Method"
      },
      "provider": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "The provider used to compute this metric result.",
         "title": "Provider"
      },
      "value": {
         "anyOf": [
            {
               "type": "number"
            },
            {
               "type": "string"
            },
            {
               "type": "boolean"
            },
            {
               "additionalProperties": {
                  "type": "integer"
               },
               "type": "object"
            },
            {
               "type": "null"
            }
         ],
         "description": "The metric value.",
         "title": "Value"
      },
      "label": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "The string equivalent of the metric value. This is used for metrics with categorical value type.",
         "title": "Label"
      },
      "errors": {
         "anyOf": [
            {
               "items": {
                  "$ref": "#/$defs/Error"
               },
               "type": "array"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "The list of error messages",
         "title": "Errors"
      },
      "additional_info": {
         "anyOf": [
            {
               "type": "object"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "The additional information about the metric result.",
         "title": "Additional Info"
      },
      "explanation": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "The explanation about the metric result.",
         "title": "Explanation"
      },
      "group": {
         "anyOf": [
            {
               "$ref": "#/$defs/MetricGroup"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "The metric group",
         "title": "Group"
      },
      "thresholds": {
         "default": [],
         "description": "The metric thresholds",
         "items": {
            "$ref": "#/$defs/MetricThreshold"
         },
         "title": "Thresholds",
         "type": "array"
      },
      "id": {
         "description": "The unique identifier for the metric result record. UUID.",
         "title": "Id",
         "type": "string"
      },
      "ts": {
         "description": "The timestamp when the metric was recorded.",
         "format": "date-time",
         "title": "Ts",
         "type": "string"
      },
      "applies_to": {
         "description": "The type of component the metric result applies to.",
         "examples": [
            "conversation",
            "message",
            "node"
         ],
         "title": "Applies To",
         "type": "string"
      },
      "message_id": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "description": "The ID of the message being evaluated.",
         "title": "Message Id"
      },
      "message_ts": {
         "anyOf": [
            {
               "format": "date-time",
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "The timestamp of the message being evaluated.",
         "title": "Message Ts"
      },
      "conversation_id": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "The ID of the conversation containing the message.",
         "title": "Conversation Id"
      },
      "node_name": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "The name of the node being evaluated.",
         "title": "Node Name"
      },
      "execution_count": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "The execution count of the node in a message.",
         "title": "Execution count"
      },
      "execution_order": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "The execution order number in the sequence of nodes executed in a message.",
         "title": "Execution order"
      },
      "is_violated": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Indicates whether the metric threshold is violated or not. For numeric metric, its set to 1 if the metric value violates the defined threshold lower or upper limit and 0 otherwise. For categorical metric, its set to 1 if the metric value belongs to unfavourable category and 0 otherwise.",
         "title": "Is Violated"
      }
   },
   "$defs": {
      "Error": {
         "properties": {
            "code": {
               "description": "The error code",
               "title": "Code",
               "type": "string"
            },
            "message_en": {
               "description": "The error message in English.",
               "title": "Message En",
               "type": "string"
            },
            "parameters": {
               "default": [],
               "description": "The list of parameters to construct the message in a different locale.",
               "items": {},
               "title": "Parameters",
               "type": "array"
            }
         },
         "required": [
            "code",
            "message_en"
         ],
         "title": "Error",
         "type": "object"
      },
      "MetricGroup": {
         "enum": [
            "retrieval_quality",
            "answer_quality",
            "content_safety",
            "performance",
            "usage",
            "message_completion",
            "tool_call_quality",
            "readability",
            "custom"
         ],
         "title": "MetricGroup",
         "type": "string"
      },
      "MetricThreshold": {
         "description": "The class that defines the threshold for a metric.",
         "properties": {
            "type": {
               "description": "Threshold type. One of 'lower_limit', 'upper_limit'",
               "enum": [
                  "lower_limit",
                  "upper_limit"
               ],
               "title": "Type",
               "type": "string"
            },
            "value": {
               "default": 0,
               "description": "The value of metric threshold",
               "title": "Threshold value",
               "type": "number"
            }
         },
         "required": [
            "type"
         ],
         "title": "MetricThreshold",
         "type": "object"
      }
   },
   "required": [
      "name",
      "display_name",
      "value",
      "applies_to",
      "message_id"
   ]
}
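One detail of the schema above is easy to miss: `message_id` appears in the `required` list, yet its type allows `null`. The key must therefore be present, but its value may be `None`. The following is a minimal standard-library sketch of that presence check, not the library's own validation; `missing_required` and both payloads are hypothetical names introduced for illustration.

```python
# Required keys copied from the "required" list in the schema above.
REQUIRED_KEYS = ("name", "display_name", "value", "applies_to", "message_id")


def missing_required(payload: dict) -> list[str]:
    """Return the required keys that are absent from the payload.

    A key whose value is None still counts as present, which matches
    JSON Schema semantics: "required" is about presence, not non-null.
    """
    return [key for key in REQUIRED_KEYS if key not in payload]


# Hypothetical payload with an explicit null message_id.
conversation_level = {
    "name": "answer_relevance",
    "display_name": "Answer Relevance",
    "value": 0.82,
    "applies_to": "conversation",
    "message_id": None,  # explicit null satisfies "required"
}

# Hypothetical payload that omits several required keys.
incomplete = {"name": "answer_relevance", "value": 0.82}

print(missing_required(conversation_level))
print(missing_required(incomplete))
```

The first payload reports no missing keys even though `message_id` is `None`; the second reports every required key it omits.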

Config:

arbitrary_types_allowed: bool = True
use_enum_values: bool = True

Fields:

applies_to (Annotated[str, FieldInfo(annotation=NoneType, required=True, description='The type of component the metric result applies to.', examples=['conversation', 'message', 'node'])])
conversation_id (Annotated[str | None, FieldInfo(annotation=NoneType, required=False, default=None, description='The ID of the conversation containing the message.')])
execution_count (Annotated[int | None, FieldInfo(annotation=NoneType, required=False, default=None, title='Execution count', description='The execution count of the node in a message.')])
execution_order (Annotated[int | None, FieldInfo(annotation=NoneType, required=False, default=None, title='Execution order', description='The execution order number in the sequence of nodes executed in a message.')])
id (Annotated[str, FieldInfo(annotation=NoneType, required=False, default_factory=…, description='The unique identifier for the metric result record. UUID.')])
is_violated (Annotated[int | None, FieldInfo(annotation=NoneType, required=False, default=None, title='Is Violated', description='Indicates whether the metric threshold is violated or not. For numeric metric, its set to 1 if the metric value violates the defined threshold lower or upper limit and 0 otherwise. For categorical metric, its set to 1 if the metric value belongs to unfavourable category and 0 otherwise.')])
message_id (Annotated[str | None, FieldInfo(annotation=NoneType, required=True, description='The ID of the message being evaluated.')])
message_ts (Annotated[datetime.datetime | None, FieldInfo(annotation=NoneType, required=False, default=None, description='The timestamp of the message being evaluated.')])
node_name (Annotated[str | None, FieldInfo(annotation=NoneType, required=False, default=None, description='The name of the node being evaluated.')])
ts (Annotated[datetime.datetime, FieldInfo(annotation=NoneType, required=False, default_factory=now, description='The timestamp when the metric was recorded.')])

Validators:

validate_is_violated » all fields

field applies_to: Annotated[str, FieldInfo(annotation=NoneType, required=True, description='The type of component the metric result applies to.', examples=['conversation', 'message', 'node'])] [Required]

The type of component the metric result applies to.

Validated by:

validate_is_violated

field conversation_id: Annotated[str | None, FieldInfo(annotation=NoneType, required=False, default=None, description='The ID of the conversation containing the message.')] = None

The ID of the conversation containing the message.

Validated by:

validate_is_violated

field execution_count: Annotated[int | None, FieldInfo(annotation=NoneType, required=False, default=None, title='Execution count', description='The execution count of the node in a message.')] = None

The execution count of the node in a message.

Validated by:

validate_is_violated

field execution_order: Annotated[int | None, FieldInfo(annotation=NoneType, required=False, default=None, title='Execution order', description='The execution order number in the sequence of nodes executed in a message.')] = None

The execution order number in the sequence of nodes executed in a message.

Validated by:

validate_is_violated

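field id: Annotated[str, FieldInfo(annotation=NoneType, required=False, default_factory=…, description='The unique identifier for the metric result record. UUID.')] [Optional]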

The unique identifier for the metric result record. UUID.

Validated by:

validate_is_violated

field is_violated: Annotated[int | None, FieldInfo(annotation=NoneType, required=False, default=None, title='Is Violated', description='Indicates whether the metric threshold is violated or not. For numeric metric, its set to 1 if the metric value violates the defined threshold lower or upper limit and 0 otherwise. For categorical metric, its set to 1 if the metric value belongs to unfavourable category and 0 otherwise.')] = None

Indicates whether the metric threshold is violated. For a numeric metric, it is set to 1 if the metric value violates the defined lower or upper threshold limit and 0 otherwise. For a categorical metric, it is set to 1 if the metric value belongs to an unfavourable category and 0 otherwise.

Validated by:

validate_is_violated

field message_id: Annotated[str | None, FieldInfo(annotation=NoneType, required=True, description='The ID of the message being evaluated.')] [Required]

The ID of the message being evaluated.

Validated by:

validate_is_violated

field message_ts: Annotated[datetime | None, FieldInfo(annotation=NoneType, required=False, default=None, description='The timestamp of the message being evaluated.')] = None

The timestamp of the message being evaluated.

Validated by:

validate_is_violated

field node_name: Annotated[str | None, FieldInfo(annotation=NoneType, required=False, default=None, description='The name of the node being evaluated.')] = None

The name of the node being evaluated.

Validated by:

validate_is_violated

field ts: Annotated[datetime, FieldInfo(annotation=NoneType, required=False, default_factory=now, description='The timestamp when the metric was recorded.')] [Optional]

The timestamp when the metric was recorded.

Validated by:

validate_is_violated
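The `id` and `ts` fields are both marked [Optional] because a default factory fills them in when the caller supplies nothing. The sketch below shows that pattern with a standard-library dataclass. It is illustrative only: `RecordStamp` is a hypothetical class, and both `uuid.uuid4` for `id` and naive local time for `ts` are assumptions, since this page does not show the actual factories.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class RecordStamp:
    """Hypothetical stand-in for the two auto-populated fields."""

    # Assumption: a random UUID4 rendered as a string.
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    # Assumption: naive local time; the library's `now` may be timezone-aware.
    ts: datetime = field(default_factory=datetime.now)


first, second = RecordStamp(), RecordStamp()

# The factory runs once per instance, so identifiers are not shared.
print(first.id != second.id)
print(first.ts.isoformat())
```

A factory, unlike a plain default value, is evaluated each time an instance is created, so every record gets a fresh identifier and timestamp.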

check_violated_record() → Any

Helper to check if a metric value violates any of the defined thresholds.

Returns:

1 if the value violates any threshold, 0 if it violates none, and None if the value_type is unsupported or no thresholds are defined.

Return type:

int|None
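The 1 / 0 / None contract described above can be sketched for the numeric case as follows. This is not the library's implementation: `check_violation` is a hypothetical function, thresholds are passed as plain dicts shaped like `MetricThreshold`, and the comparisons are assumed to be strict, so a value equal to a limit does not count as a violation. Categorical metrics are left out because this page does not say how unfavourable categories are configured, so the sketch returns None for them.

```python
from typing import Optional


def check_violation(value, value_type: str, thresholds: list[dict]) -> Optional[int]:
    """Sketch of the documented 1 / 0 / None contract for numeric metrics.

    Each threshold is a dict shaped like MetricThreshold:
    {"type": "lower_limit" | "upper_limit", "value": <number>}.
    """
    # None when there is nothing to compare against or the type is
    # outside the scope of this sketch (assumption: a null value too).
    if not thresholds or value_type != "numeric" or value is None:
        return None
    for threshold in thresholds:
        limit = threshold.get("value", 0)  # the schema defaults value to 0
        # Assumption: strict comparisons, so a value equal to the limit passes.
        if threshold["type"] == "lower_limit" and value < limit:
            return 1
        if threshold["type"] == "upper_limit" and value > limit:
            return 1
    return 0


lower = [{"type": "lower_limit", "value": 0.7}]
print(check_violation(0.4, "numeric", lower))  # below the lower limit
print(check_violation(0.9, "numeric", lower))  # within bounds
print(check_violation(0.9, "numeric", []))     # no thresholds defined
```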

validator validate_is_violated » all fields

pydantic model ibm_watsonx_gov.entities.evaluation_result.AggregateAgentMetricResult

Bases: BaseMetricResult

JSON schema:

{
   "title": "AggregateAgentMetricResult",
   "type": "object",
   "properties": {
      "name": {
         "description": "The name of the metric.",
         "examples": [
            "answer_relevance",
            "context_relevance"
         ],
         "title": "Metric Name",
         "type": "string"
      },
      "display_name": {
         "description": "The display name of the metric.",
         "examples": [
            "Answer Relevance",
            "Context Relevance"
         ],
         "title": "Metric display name",
         "type": "string"
      },
      "value_type": {
         "default": "numeric",
         "description": "The type of the metric value. Indicates whether the metric value is numeric or categorical.",
         "examples": [
            "numeric",
            "categorical"
         ],
         "title": "Metric value type",
         "type": "string"
      },
      "method": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "The method used to compute this metric result.",
         "examples": [
            "token_recall"
         ],
         "title": "Method"
      },
      "provider": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "The provider used to compute this metric result.",
         "title": "Provider"
      },
      "value": {
         "anyOf": [
            {
               "type": "number"
            },
            {
               "additionalProperties": {
                  "type": "integer"
               },
               "type": "object"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "The value of the metric. Defaults to mean for numeric metric types. For categorical metric types, this has the frequency distribution of non-null categories.",
         "title": "Value"
      },
      "label": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "The string equivalent of the metric value. This is used for metrics with categorical value type.",
         "title": "Label"
      },
      "errors": {
         "anyOf": [
            {
               "items": {
                  "$ref": "#/$defs/Error"
               },
               "type": "array"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "The list of error messages",
         "title": "Errors"
      },
      "additional_info": {
         "anyOf": [
            {
               "type": "object"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "The additional information about the metric result.",
         "title": "Additional Info"
      },
      "explanation": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "The explanation about the metric result.",
         "title": "Explanation"
      },
      "group": {
         "anyOf": [
            {
               "$ref": "#/$defs/MetricGroup"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "The metric group",
         "title": "Group"
      },
      "thresholds": {
         "default": [],
         "description": "The metric thresholds",
         "items": {
            "$ref": "#/$defs/MetricThreshold"
         },
         "title": "Thresholds",
         "type": "array"
      },
      "min": {
         "anyOf": [
            {
               "type": "number"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "The minimum value of the metric. Applicable for numeric metric types.",
         "title": "Min"
      },
      "max": {
         "anyOf": [
            {
               "type": "number"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "The maximum value of the metric. Applicable for numeric metric types.",
         "title": "Max"
      },
      "mean": {
         "anyOf": [
            {
               "type": "number"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "The mean value of the metric. Applicable for numeric metric types.",
         "title": "Mean"
      },
      "percentiles": {
         "anyOf": [
            {
               "additionalProperties": {
                  "type": "number"
               },
               "type": "object"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Dictionary of percentile values (25th, 50th, 75th, 90th, 95th, 99th) of the metric. Applicable for numeric metric types.",
         "title": "Percentiles"
      },
      "unique": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "The distinct count of the string values found. Applicable for categorical metric types.",
         "title": "Unique"
      },
      "count": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "The count for metric results used for aggregation.",
         "title": "Count"
      },
      "node_name": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "The name of the node being evaluated.",
         "title": "Node Name"
      },
      "applies_to": {
         "description": "The type of component the metric result applies to.",
         "examples": [
            "conversation",
            "message",
            "node"
         ],
         "title": "Applies To",
         "type": "string"
      },
      "individual_results": {
         "default": [],
         "description": "The list individual metric results.",
         "items": {
            "$ref": "#/$defs/AgentMetricResult"
         },
         "title": "Individual Results",
         "type": "array"
      },
      "violations_count": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "The count of records that violated the defined thresholds.",
         "title": "Violations Count"
      }
   },
   "$defs": {
      "AgentMetricResult": {
         "description": "This is the data model for metric results in the agentic app.\nIt stores evaluation results for conversations, messages and nodes.",
         "properties": {
            "name": {
               "description": "The name of the metric.",
               "examples": [
                  "answer_relevance",
                  "context_relevance"
               ],
               "title": "Metric Name",
               "type": "string"
            },
            "display_name": {
               "description": "The display name of the metric.",
               "examples": [
                  "Answer Relevance",
                  "Context Relevance"
               ],
               "title": "Metric display name",
               "type": "string"
            },
            "value_type": {
               "default": "numeric",
               "description": "The type of the metric value. Indicates whether the metric value is numeric or categorical.",
               "examples": [
                  "numeric",
                  "categorical"
               ],
               "title": "Metric value type",
               "type": "string"
            },
            "method": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The method used to compute this metric result.",
               "examples": [
                  "token_recall"
               ],
               "title": "Method"
            },
            "provider": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The provider used to compute this metric result.",
               "title": "Provider"
            },
            "value": {
               "anyOf": [
                  {
                     "type": "number"
                  },
                  {
                     "type": "string"
                  },
                  {
                     "type": "boolean"
                  },
                  {
                     "additionalProperties": {
                        "type": "integer"
                     },
                     "type": "object"
                  },
                  {
                     "type": "null"
                  }
               ],
               "description": "The metric value.",
               "title": "Value"
            },
            "label": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The string equivalent of the metric value. This is used for metrics with categorical value type.",
               "title": "Label"
            },
            "errors": {
               "anyOf": [
                  {
                     "items": {
                        "$ref": "#/$defs/Error"
                     },
                     "type": "array"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The list of error messages",
               "title": "Errors"
            },
            "additional_info": {
               "anyOf": [
                  {
                     "type": "object"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The additional information about the metric result.",
               "title": "Additional Info"
            },
            "explanation": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The explanation about the metric result.",
               "title": "Explanation"
            },
            "group": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/MetricGroup"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The metric group",
               "title": "Group"
            },
            "thresholds": {
               "default": [],
               "description": "The metric thresholds",
               "items": {
                  "$ref": "#/$defs/MetricThreshold"
               },
               "title": "Thresholds",
               "type": "array"
            },
            "id": {
               "description": "The unique identifier for the metric result record. UUID.",
               "title": "Id",
               "type": "string"
            },
            "ts": {
               "description": "The timestamp when the metric was recorded.",
               "format": "date-time",
               "title": "Ts",
               "type": "string"
            },
            "applies_to": {
               "description": "The type of component the metric result applies to.",
               "examples": [
                  "conversation",
                  "message",
                  "node"
               ],
               "title": "Applies To",
               "type": "string"
            },
            "message_id": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "description": "The ID of the message being evaluated.",
               "title": "Message Id"
            },
            "message_ts": {
               "anyOf": [
                  {
                     "format": "date-time",
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The timestamp of the message being evaluated.",
               "title": "Message Ts"
            },
            "conversation_id": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The ID of the conversation containing the message.",
               "title": "Conversation Id"
            },
            "node_name": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The name of the node being evaluated.",
               "title": "Node Name"
            },
            "execution_count": {
               "anyOf": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The execution count of the node in a message.",
               "title": "Execution count"
            },
            "execution_order": {
               "anyOf": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The execution order number in the sequence of nodes executed in a message.",
               "title": "Execution order"
            },
            "is_violated": {
               "anyOf": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Indicates whether the metric threshold is violated or not. For numeric metric, its set to 1 if the metric value violates the defined threshold lower or upper limit and 0 otherwise. For categorical metric, its set to 1 if the metric value belongs to unfavourable category and 0 otherwise.",
               "title": "Is Violated"
            }
         },
         "required": [
            "name",
            "display_name",
            "value",
            "applies_to",
            "message_id"
         ],
         "title": "AgentMetricResult",
         "type": "object"
      },
      "Error": {
         "properties": {
            "code": {
               "description": "The error code",
               "title": "Code",
               "type": "string"
            },
            "message_en": {
               "description": "The error message in English.",
               "title": "Message En",
               "type": "string"
            },
            "parameters": {
               "default": [],
               "description": "The list of parameters to construct the message in a different locale.",
               "items": {},
               "title": "Parameters",
               "type": "array"
            }
         },
         "required": [
            "code",
            "message_en"
         ],
         "title": "Error",
         "type": "object"
      },
      "MetricGroup": {
         "enum": [
            "retrieval_quality",
            "answer_quality",
            "content_safety",
            "performance",
            "usage",
            "message_completion",
            "tool_call_quality",
            "readability",
            "custom"
         ],
         "title": "MetricGroup",
         "type": "string"
      },
      "MetricThreshold": {
         "description": "The class that defines the threshold for a metric.",
         "properties": {
            "type": {
               "description": "Threshold type. One of 'lower_limit', 'upper_limit'",
               "enum": [
                  "lower_limit",
                  "upper_limit"
               ],
               "title": "Type",
               "type": "string"
            },
            "value": {
               "default": 0,
               "description": "The value of metric threshold",
               "title": "Threshold value",
               "type": "number"
            }
         },
         "required": [
            "type"
         ],
         "title": "MetricThreshold",
         "type": "object"
      }
   },
   "required": [
      "name",
      "display_name",
      "applies_to"
   ]
}
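
The `is_violated` rule described in the schema can be sketched for numeric metrics as follows. This is a minimal illustration, not the library's implementation: the helper name `is_violated` is hypothetical, and treating a value exactly equal to the limit as non-violating is an assumption, since the schema does not state how the boundary is handled.

```python
from typing import Optional


def is_violated(value: Optional[float], threshold_type: str, threshold_value: float = 0) -> Optional[int]:
    """Return 1 if the value violates the threshold, 0 if not, None if there is no value.

    Hypothetical helper; strict comparison at the boundary is an assumption.
    """
    if value is None:
        return None
    if threshold_type == "lower_limit":
        # A lower limit is violated when the value falls below it.
        return int(value < threshold_value)
    if threshold_type == "upper_limit":
        # An upper limit is violated when the value rises above it.
        return int(value > threshold_value)
    raise ValueError(f"unknown threshold type: {threshold_type!r}")


low_relevance = is_violated(0.4, "lower_limit", 0.7)
within_latency = is_violated(1.2, "upper_limit", 2.0)
```

The two threshold types mirror the `MetricThreshold.type` enum, and the default of 0 mirrors the documented default of `MetricThreshold.value`.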

Config:

arbitrary_types_allowed: bool = True
use_enum_values: bool = True

Fields:

applies_to (Annotated[str, FieldInfo(annotation=NoneType, required=True, description='The type of component the metric result applies to.', examples=['conversation', 'message', 'node'])])
count (Annotated[int | None, FieldInfo(annotation=NoneType, required=False, default=None, description='The count for metric results used for aggregation.')])
individual_results (Annotated[list[ibm_watsonx_gov.entities.evaluation_result.AgentMetricResult], FieldInfo(annotation=NoneType, required=False, default=[], description='The list of individual metric results.')])
max (Annotated[float | None, FieldInfo(annotation=NoneType, required=False, default=None, description='The maximum value of the metric. Applicable for numeric metric types.')])
mean (Annotated[float | None, FieldInfo(annotation=NoneType, required=False, default=None, description='The mean value of the metric. Applicable for numeric metric types.')])
min (Annotated[float | None, FieldInfo(annotation=NoneType, required=False, default=None, description='The minimum value of the metric. Applicable for numeric metric types.')])
node_name (Annotated[str | None, FieldInfo(annotation=NoneType, required=False, default=None, description='The name of the node being evaluated.')])
percentiles (Annotated[dict[str, float] | None, FieldInfo(annotation=NoneType, required=False, default=None, description='Dictionary of percentile values (25th, 50th, 75th, 90th, 95th, 99th) of the metric. Applicable for numeric metric types.')])
unique (Annotated[int | None, FieldInfo(annotation=NoneType, required=False, default=None, description='The distinct count of the string values found. Applicable for categorical metric types.')])
value (Annotated[float | dict[str, int] | None, FieldInfo(annotation=NoneType, required=False, default=None, description='The value of the metric. Defaults to mean for numeric metric types. For categorical metric types, this has the frequency distribution of non-null categories.')])
violations_count (Annotated[int | None, FieldInfo(annotation=NoneType, required=False, default=None, description='The count of records that violated the defined thresholds.')])

Validators:

validate_violations_count » all fields

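field applies_to: Annotated[str, FieldInfo(annotation=NoneType, required=True, description='The type of component the metric result applies to.', examples=['conversation', 'message', 'node'])] [Required]¶

The type of component the metric result applies to.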

Validated by:

validate_violations_count

field count: Annotated[int | None, FieldInfo(annotation=NoneType, required=False, default=None, description='The count for metric results used for aggregation.')] = None¶

The count for metric results used for aggregation.

Validated by:

validate_violations_count

field individual_results: Annotated[list[AgentMetricResult], FieldInfo(annotation=NoneType, required=False, default=[], description='The list of individual metric results.')] = []¶

The list of individual metric results.

Validated by:

validate_violations_count

field max: Annotated[float | None, FieldInfo(annotation=NoneType, required=False, default=None, description='The maximum value of the metric. Applicable for numeric metric types.')] = None¶

The maximum value of the metric. Applicable for numeric metric types.

Validated by:

validate_violations_count

field mean: Annotated[float | None, FieldInfo(annotation=NoneType, required=False, default=None, description='The mean value of the metric. Applicable for numeric metric types.')] = None¶

The mean value of the metric. Applicable for numeric metric types.

Validated by:

validate_violations_count

field min: Annotated[float | None, FieldInfo(annotation=NoneType, required=False, default=None, description='The minimum value of the metric. Applicable for numeric metric types.')] = None¶

The minimum value of the metric. Applicable for numeric metric types.

Validated by:

validate_violations_count

field node_name: Annotated[str | None, FieldInfo(annotation=NoneType, required=False, default=None, description='The name of the node being evaluated.')] = None¶

The name of the node being evaluated.

Validated by:

validate_violations_count

field percentiles: Annotated[dict[str, float] | None, FieldInfo(annotation=NoneType, required=False, default=None, description='Dictionary of percentile values (25th, 50th, 75th, 90th, 95th, 99th) of the metric. Applicable for numeric metric types.')] = None¶

Dictionary of percentile values (25th, 50th, 75th, 90th, 95th, 99th) of the metric. Applicable for numeric metric types.

Validated by:

validate_violations_count

field unique: Annotated[int | None, FieldInfo(annotation=NoneType, required=False, default=None, description='The distinct count of the string values found. Applicable for categorical metric types.')] = None¶

The distinct count of the string values found. Applicable for categorical metric types.

Validated by:

validate_violations_count

field value: Annotated[float | dict[str, int] | None, FieldInfo(annotation=NoneType, required=False, default=None, description='The value of the metric. Defaults to mean for numeric metric types. For categorical metric types, this has the frequency distribution of non-null categories.')] = None¶

The value of the metric. Defaults to mean for numeric metric types. For categorical metric types, this has the frequency distribution of non-null categories.

Validated by:

validate_violations_count

field violations_count: Annotated[int | None, FieldInfo(annotation=NoneType, required=False, default=None, description='The count of records that violated the defined thresholds.')] = None¶

The count of records that violated the defined thresholds.

Validated by:

validate_violations_count

validator validate_violations_count » all fields¶
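
To make the aggregation fields above concrete, the sketch below derives `count`, `min`, `max`, `mean`, `value`, `percentiles` and `violations_count` from a list of numeric values. It is an illustration under assumptions rather than the library's code: the function name `aggregate_numeric`, the nearest-rank percentile method, and the string keys of the `percentiles` dictionary are all hypothetical choices.

```python
import math
from statistics import fmean


def aggregate_numeric(values, violations):
    """Summarise numeric metric values into aggregate-style fields.

    values: list of floats; violations: list of 0/1 flags, one per value.
    Returns None when there is nothing to aggregate.
    """
    if not values:
        return None
    ordered = sorted(values)
    n = len(ordered)

    def nearest_rank(p):
        # Nearest-rank percentile (an assumed method): the smallest value
        # with at least p percent of the data at or below it.
        return ordered[max(math.ceil(p / 100 * n) - 1, 0)]

    mean = fmean(ordered)
    return {
        "count": n,
        "min": ordered[0],
        "max": ordered[-1],
        "mean": mean,
        "value": mean,  # the documented default for numeric metric types
        "percentiles": {str(p): nearest_rank(p) for p in (25, 50, 75, 90, 95, 99)},
        "violations_count": sum(violations),
    }


summary = aggregate_numeric([0.2, 0.4, 0.6, 0.8], [1, 1, 0, 0])
```

Other percentile definitions (for example, linear interpolation) give different values on small samples, so the numbers produced here should not be expected to match the library's output exactly.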

pydantic model ibm_watsonx_gov.entities.evaluation_result.AggregateMetricResult¶

Bases: BaseMetricResult

Show JSON schema

{
   "title": "AggregateMetricResult",
   "type": "object",
   "properties": {
      "name": {
         "description": "The name of the metric.",
         "examples": [
            "answer_relevance",
            "context_relevance"
         ],
         "title": "Metric Name",
         "type": "string"
      },
      "display_name": {
         "description": "The display name of the metric.",
         "examples": [
            "Answer Relevance",
            "Context Relevance"
         ],
         "title": "Metric display name",
         "type": "string"
      },
      "value_type": {
         "default": "numeric",
         "description": "The type of the metric value. Indicates whether the metric value is numeric or categorical.",
         "examples": [
            "numeric",
            "categorical"
         ],
         "title": "Metric value type",
         "type": "string"
      },
      "method": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "The method used to compute this metric result.",
         "examples": [
            "token_recall"
         ],
         "title": "Method"
      },
      "provider": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "The provider used to compute this metric result.",
         "title": "Provider"
      },
      "value": {
         "anyOf": [
            {
               "type": "number"
            },
            {
               "type": "string"
            },
            {
               "type": "boolean"
            },
            {
               "additionalProperties": {
                  "type": "integer"
               },
               "type": "object"
            },
            {
               "type": "null"
            }
         ],
         "description": "The metric value.",
         "title": "Value"
      },
      "label": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "The string equivalent of the metric value. This is used for metrics with categorical value type.",
         "title": "Label"
      },
      "errors": {
         "anyOf": [
            {
               "items": {
                  "$ref": "#/$defs/Error"
               },
               "type": "array"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "The list of error messages",
         "title": "Errors"
      },
      "additional_info": {
         "anyOf": [
            {
               "type": "object"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "The additional information about the metric result.",
         "title": "Additional Info"
      },
      "explanation": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "The explanation about the metric result.",
         "title": "Explanation"
      },
      "group": {
         "anyOf": [
            {
               "$ref": "#/$defs/MetricGroup"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "The metric group",
         "title": "Group"
      },
      "thresholds": {
         "default": [],
         "description": "The metric thresholds",
         "items": {
            "$ref": "#/$defs/MetricThreshold"
         },
         "title": "Thresholds",
         "type": "array"
      },
      "min": {
         "anyOf": [
            {
               "type": "number"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Min"
      },
      "max": {
         "anyOf": [
            {
               "type": "number"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Max"
      },
      "mean": {
         "anyOf": [
            {
               "type": "number"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Mean"
      },
      "total_records": {
         "title": "Total Records",
         "type": "integer"
      },
      "labels_count": {
         "anyOf": [
            {
               "type": "object"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Labels Count"
      },
      "record_level_metrics": {
         "default": [],
         "items": {
            "$ref": "#/$defs/RecordMetricResult"
         },
         "title": "Record Level Metrics",
         "type": "array"
      }
   },
   "$defs": {
      "Error": {
         "properties": {
            "code": {
               "description": "The error code",
               "title": "Code",
               "type": "string"
            },
            "message_en": {
               "description": "The error message in English.",
               "title": "Message En",
               "type": "string"
            },
            "parameters": {
               "default": [],
               "description": "The list of parameters to construct the message in a different locale.",
               "items": {},
               "title": "Parameters",
               "type": "array"
            }
         },
         "required": [
            "code",
            "message_en"
         ],
         "title": "Error",
         "type": "object"
      },
      "MetricGroup": {
         "enum": [
            "retrieval_quality",
            "answer_quality",
            "content_safety",
            "performance",
            "usage",
            "message_completion",
            "tool_call_quality",
            "readability",
            "custom"
         ],
         "title": "MetricGroup",
         "type": "string"
      },
      "MetricThreshold": {
         "description": "The class that defines the threshold for a metric.",
         "properties": {
            "type": {
               "description": "Threshold type. One of 'lower_limit', 'upper_limit'",
               "enum": [
                  "lower_limit",
                  "upper_limit"
               ],
               "title": "Type",
               "type": "string"
            },
            "value": {
               "default": 0,
               "description": "The value of metric threshold",
               "title": "Threshold value",
               "type": "number"
            }
         },
         "required": [
            "type"
         ],
         "title": "MetricThreshold",
         "type": "object"
      },
      "RecordMetricResult": {
         "properties": {
            "name": {
               "description": "The name of the metric.",
               "examples": [
                  "answer_relevance",
                  "context_relevance"
               ],
               "title": "Metric Name",
               "type": "string"
            },
            "display_name": {
               "description": "The display name of the metric.",
               "examples": [
                  "Answer Relevance",
                  "Context Relevance"
               ],
               "title": "Metric display name",
               "type": "string"
            },
            "value_type": {
               "default": "numeric",
               "description": "The type of the metric value. Indicates whether the metric value is numeric or categorical.",
               "examples": [
                  "numeric",
                  "categorical"
               ],
               "title": "Metric value type",
               "type": "string"
            },
            "method": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The method used to compute this metric result.",
               "examples": [
                  "token_recall"
               ],
               "title": "Method"
            },
            "provider": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The provider used to compute this metric result.",
               "title": "Provider"
            },
            "value": {
               "anyOf": [
                  {
                     "type": "number"
                  },
                  {
                     "type": "string"
                  },
                  {
                     "type": "boolean"
                  },
                  {
                     "additionalProperties": {
                        "type": "integer"
                     },
                     "type": "object"
                  },
                  {
                     "type": "null"
                  }
               ],
               "description": "The metric value.",
               "title": "Value"
            },
            "label": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The string equivalent of the metric value. This is used for metrics with categorical value type.",
               "title": "Label"
            },
            "errors": {
               "anyOf": [
                  {
                     "items": {
                        "$ref": "#/$defs/Error"
                     },
                     "type": "array"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The list of error messages",
               "title": "Errors"
            },
            "additional_info": {
               "anyOf": [
                  {
                     "type": "object"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The additional information about the metric result.",
               "title": "Additional Info"
            },
            "explanation": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The explanation about the metric result.",
               "title": "Explanation"
            },
            "group": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/MetricGroup"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The metric group",
               "title": "Group"
            },
            "thresholds": {
               "default": [],
               "description": "The metric thresholds",
               "items": {
                  "$ref": "#/$defs/MetricThreshold"
               },
               "title": "Thresholds",
               "type": "array"
            },
            "record_id": {
               "description": "The record identifier.",
               "examples": [
                  "record1"
               ],
               "title": "Record Id",
               "type": "string"
            },
            "record_timestamp": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The record timestamp.",
               "examples": [
                  "2025-01-01T00:00:00.000000Z"
               ],
               "title": "Record Timestamp"
            }
         },
         "required": [
            "name",
            "display_name",
            "value",
            "record_id"
         ],
         "title": "RecordMetricResult",
         "type": "object"
      }
   },
   "required": [
      "name",
      "display_name",
      "value",
      "total_records"
   ]
}

Config:

arbitrary_types_allowed: bool = True
use_enum_values: bool = True

Fields:

labels_count (dict | None)
max (float | None)
mean (float | None)
min (float | None)
record_level_metrics (list[ibm_watsonx_gov.entities.evaluation_result.RecordMetricResult])
total_records (int)

field labels_count: dict | None = None¶

field max: float | None = None¶

field mean: float | None = None¶

field min: float | None = None¶

field record_level_metrics: list[RecordMetricResult] = []¶

field total_records: int [Required]¶

static create(results: list[RecordMetricResult]) → Self | None¶
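
The signature of `create` indicates that it takes record-level results and may return nothing. The sketch below shows one way such a summary could be assembled. It is not the library's implementation: `Record` is a hypothetical stand-in for `RecordMetricResult`, `summarise` is a hypothetical name, and returning `None` for an empty list is an assumption suggested only by the `Self | None` return type.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import fmean
from typing import Optional, Union


@dataclass
class Record:
    """Hypothetical stand-in for RecordMetricResult (only the fields used here)."""
    name: str
    record_id: str
    value: Union[float, str, None]


def summarise(records: list) -> Optional[dict]:
    """Build aggregate-style fields from record-level results; None when empty."""
    if not records:
        return None
    # Booleans are ints in Python, so exclude them from the numeric values.
    numeric = [r.value for r in records
               if isinstance(r.value, (int, float)) and not isinstance(r.value, bool)]
    labels = [r.value for r in records if isinstance(r.value, str)]
    return {
        "name": records[0].name,
        "total_records": len(records),
        "min": min(numeric) if numeric else None,
        "max": max(numeric) if numeric else None,
        "mean": fmean(numeric) if numeric else None,
        "labels_count": dict(Counter(labels)) if labels else None,
    }


numeric_summary = summarise([
    Record("answer_relevance", "record1", 0.5),
    Record("answer_relevance", "record2", 1.0),
])
```

Numeric records fill `min`, `max` and `mean`, while string-valued records fill `labels_count`, which matches the split between numeric and categorical value types described in the schema.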

pydantic model ibm_watsonx_gov.entities.evaluation_result.MessageData¶

Bases: BaseModel

The model class to capture the message input and output data for an agent.

Show JSON schema

{
   "title": "MessageData",
   "description": "The model class to capture the message input and output data for an agent.",
   "type": "object",
   "properties": {
      "message_id": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "description": "The ID of the message.",
         "title": "Message ID"
      },
      "message_ts": {
         "anyOf": [
            {
               "format": "date-time",
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "description": "The timestamp of the message in ISO format. The end timestamp of the message processing is considered as the message timestamp.",
         "title": "Message timestamp"
      },
      "conversation_id": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "description": "The ID of the conversation containing the message.",
         "title": "Conversation ID"
      },
      "start_time": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "description": "The message execution start time in ISO format.",
         "title": "Start time"
      },
      "end_time": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "description": "The message execution end time in ISO format.",
         "title": "End time"
      },
      "input": {
         "anyOf": [
            {
               "type": "object"
            },
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "description": "The message input data.",
         "title": "Input"
      },
      "output": {
         "anyOf": [
            {
               "type": "object"
            },
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "description": "The message output data.",
         "title": "Input"
      },
      "num_loops": {
         "default": 0,
         "description": "The number of loops that occurred in the agent while generating the output.",
         "title": "Number of Loops",
         "type": "integer"
      }
   },
   "required": [
      "message_id",
      "message_ts",
      "conversation_id",
      "start_time",
      "end_time",
      "input",
      "output"
   ]
}

Fields:

conversation_id (str | None)
end_time (str | None)
input (dict | str | None)
message_id (str | None)
message_ts (datetime.datetime | None)
num_loops (int)
output (dict | str | None)
start_time (str | None)

field conversation_id: Annotated[str | None, FieldInfo(annotation=NoneType, required=True, title='Conversation ID', description='The ID of the conversation containing the message.')] [Required]¶: The ID of the conversation containing the message.

field end_time: Annotated[str | None, FieldInfo(annotation=NoneType, required=True, title='End time', description='The message execution end time in ISO format.')] [Required]¶: The message execution end time in ISO format.

field input: Annotated[dict | str | None, FieldInfo(annotation=NoneType, required=True, title='Input', description='The message input data.')] [Required]¶: The message input data.

field message_id: Annotated[str | None, FieldInfo(annotation=NoneType, required=True, title='Message ID', description='The ID of the message.')] [Required]¶: The ID of the message.

field message_ts: Annotated[datetime | None, FieldInfo(annotation=NoneType, required=True, title='Message timestamp', description='The timestamp of the message in ISO format. The end timestamp of the message processing is considered as the message timestamp.')] [Required]¶: The timestamp of the message in ISO format. The end timestamp of the message processing is considered as the message timestamp.

field num_loops: Annotated[int, FieldInfo(annotation=NoneType, required=False, default=0, title='Number of Loops', description='The number of loops that occurred in the agent while generating the output.')] = 0¶: The number of loops that occurred in the agent while generating the output.

field output: Annotated[dict | str | None, FieldInfo(annotation=NoneType, required=True, title='Input', description='The message output data.')] [Required]¶: The message output data.

field start_time: Annotated[str | None, FieldInfo(annotation=NoneType, required=True, title='Start time', description='The message execution start time in ISO format.')] [Required]¶: The message execution start time in ISO format.
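
As a rough illustration of the `MessageData` shape, the sketch below builds a plain dictionary with the fields listed above, checks that every required key is present, and derives the execution duration from `start_time` and `end_time`. All IDs, timestamps and texts are made-up values, and the helper `parse_iso` is hypothetical. Note that the required fields are nullable, so a key must be present but its value may be `None`.

```python
from datetime import datetime

# Illustrative payload; every value here is invented for the example.
message = {
    "message_id": "msg-001",
    "message_ts": "2025-01-01T00:00:02.500000Z",
    "conversation_id": "conv-001",
    "start_time": "2025-01-01T00:00:00.000000Z",
    "end_time": "2025-01-01T00:00:02.500000Z",
    "input": {"question": "What is the refund policy?"},
    "output": "Refunds are accepted within 30 days.",
    "num_loops": 0,
}

# Required keys as listed in the JSON schema.
REQUIRED = ["message_id", "message_ts", "conversation_id",
            "start_time", "end_time", "input", "output"]


def parse_iso(ts: str) -> datetime:
    # datetime.fromisoformat accepts a trailing "Z" only from Python 3.11,
    # so normalise it to an explicit UTC offset first.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


missing = [key for key in REQUIRED if key not in message]
duration_seconds = (parse_iso(message["end_time"]) - parse_iso(message["start_time"])).total_seconds()
```

Here `message_ts` equals `end_time`, following the documented convention that the end timestamp of message processing is taken as the message timestamp.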

pydantic model ibm_watsonx_gov.entities.evaluation_result.MetricMapping¶

Bases: BaseModel

The metric mapping data

Show JSON schema

{
   "title": "MetricMapping",
   "description": "The metric mapping data",
   "type": "object",
   "properties": {
      "name": {
         "description": "The name of the metric.",
         "title": "Name",
         "type": "string"
      },
      "method": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "The method used to compute the metric.",
         "title": "Method"
      },
      "applies_to": {
         "title": "Applies To",
         "type": "string"
      },
      "mapping": {
         "anyOf": [
            {
               "$ref": "#/$defs/Mapping"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "The data mapping details for the metric which are used to read the values needed to compute the metric.",
         "title": "Mapping"
      }
   },
   "$defs": {
      "Mapping": {
         "description": "Defines the field mapping details to be used for computing a metric.",
         "properties": {
            "source": {
               "default": "trace",
               "description": "The source type of the data. Use trace if the data should be read from span in trace. Use tabular if the data is passed as a dataframe.",
               "enum": [
                  "trace",
                  "tabular"
               ],
               "examples": [
                  "trace",
                  "tabular"
               ],
               "title": "Source",
               "type": "string"
            },
            "items": {
               "description": "The list of mapping items for the field. They are used to read the data from trace or tabular data for computing the metric.",
               "items": {
                  "$ref": "#/$defs/MappingItem"
               },
               "title": "Mapping Items",
               "type": "array"
            }
         },
         "required": [
            "items"
         ],
         "title": "Mapping",
         "type": "object"
      },
      "MappingItem": {
         "description": "The mapping details to be used for reading the values from the data.",
         "properties": {
            "name": {
               "description": "The name of the item.",
               "examples": [
                  "input_text",
                  "generated_text",
                  "context",
                  "ground_truth"
               ],
               "title": "Name",
               "type": "string"
            },
            "type": {
               "description": "The type of the item.",
               "enum": [
                  "input",
                  "output",
                  "reference",
                  "context",
                  "tool_call"
               ],
               "examples": [
                  "input"
               ],
               "title": "Type",
               "type": "string"
            },
            "column_name": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The column name in the tabular data to be used for reading the field value. Applicable for tabular source.",
               "title": "Column Name"
            },
            "span_name": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The span name in the trace data to be used for reading the field value. Applicable for trace source.",
               "title": "Span Name"
            },
            "attribute_name": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The attribute name in the trace to be used for reading the field value. Applicable for trace source.",
               "title": "Attribute Name"
            },
            "json_path": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The json path to be used for reading the field value from the attribute value. Applicable for trace source. If not provided, the span attribute value is read as the field value.",
               "title": "Json Path"
            },
            "lookup_child_spans": {
               "anyOf": [
                  {
                     "type": "boolean"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": false,
               "description": "The flag to indicate if all the child spans should be searched for the attribute value. Applicable for trace source.",
               "title": "Look up child spans"
            }
         },
         "required": [
            "name",
            "type"
         ],
         "title": "MappingItem",
         "type": "object"
      }
   },
   "required": [
      "name"
   ]
}

Fields:

applies_to (str)
mapping (ibm_watsonx_gov.entities.metric.Mapping | None)
method (str | None)
name (str)

field applies_to: Annotated[str, FieldInfo(annotation=NoneType, required=False, default=FieldInfo(annotation=str, required=False, default='message', title='Applies to', description='The tag indicating the component to which the metric applies. Used for agentic application metric computation.', examples=['message', 'conversation', 'sub_agent']))] = FieldInfo(annotation=str, required=False, default='message', title='Applies to', description='The tag indicating the component to which the metric applies. Used for agentic application metric computation.', examples=['message', 'conversation', 'sub_agent'])¶

field mapping: Annotated[Mapping | None, FieldInfo(annotation=NoneType, required=False, default=None, title='Mapping', description='The data mapping details for the metric which are used to read the values needed to compute the metric.')] = None¶: The data mapping details for the metric which are used to read the values needed to compute the metric.

field method: Annotated[str | None, FieldInfo(annotation=NoneType, required=False, default=None, title='Method', description='The method used to compute the metric.')] = None¶: The method used to compute the metric.

field name: Annotated[str, FieldInfo(annotation=NoneType, required=True, title='Name', description='The name of the metric.')] [Required]¶: The name of the metric.
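
The four fields above, together with the Mapping and MappingItem definitions in the schema, can be illustrated with a plain dictionary. The following is a minimal sketch using only the standard library; the helper check_metric_mapping, the attribute name traceloop.entity.input and the path $.inputs.input_text are assumptions made for illustration and are not part of ibm_watsonx_gov.

```python
# Hand-written payload shaped like the schema above. The metric name and
# the input-side attribute name / JSON path are illustrative assumptions.
metric_mapping = {
    "name": "answer_relevance",          # the only required top-level field
    "method": None,
    "applies_to": "message",
    "mapping": {
        "source": "trace",               # "trace" or "tabular"
        "items": [
            {
                "name": "input_text",
                "type": "input",
                "span_name": "LangGraph.workflow",
                "attribute_name": "traceloop.entity.input",
                "json_path": "$.inputs.input_text",
            },
            {
                "name": "generated_text",
                "type": "output",
                "span_name": "LangGraph.workflow",
                "attribute_name": "traceloop.entity.output",
                "json_path": "$.outputs.generated_text",
            },
        ],
    },
}

ALLOWED_SOURCES = {"trace", "tabular"}
ALLOWED_ITEM_TYPES = {"input", "output", "reference", "context", "tool_call"}


def check_metric_mapping(payload: dict) -> list:
    """Hypothetical helper: return a list of problems, empty if none are found."""
    problems = []
    if "name" not in payload:
        problems.append("missing required field: name")
    mapping = payload.get("mapping")
    if mapping is None:
        return problems
    if mapping.get("source", "trace") not in ALLOWED_SOURCES:
        problems.append("mapping.source must be 'trace' or 'tabular'")
    if "items" not in mapping:
        problems.append("missing required field: mapping.items")
    for index, item in enumerate(mapping.get("items", [])):
        for key in ("name", "type"):
            if key not in item:
                problems.append(f"items[{index}] is missing required field: {key}")
        if "type" in item and item["type"] not in ALLOWED_ITEM_TYPES:
            problems.append(f"items[{index}] has an unknown type: {item['type']}")
    return problems


problems = check_metric_mapping(metric_mapping)
```

The check mirrors only the required lists and enum values shown in the schema; the pydantic model itself performs the authoritative validation.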

pydantic model ibm_watsonx_gov.entities.evaluation_result.MetricsEvaluationResult¶

Bases: BaseModel

Show JSON schema

{
   "title": "MetricsEvaluationResult",
   "type": "object",
   "properties": {
      "metrics_result": {
         "items": {
            "$ref": "#/$defs/AggregateMetricResult"
         },
         "title": "Metrics Result",
         "type": "array"
      }
   },
   "$defs": {
      "AggregateMetricResult": {
         "properties": {
            "name": {
               "description": "The name of the metric.",
               "examples": [
                  "answer_relevance",
                  "context_relevance"
               ],
               "title": "Metric Name",
               "type": "string"
            },
            "display_name": {
               "description": "The display name of the metric.",
               "examples": [
                  "Answer Relevance",
                  "Context Relevance"
               ],
               "title": "Metric display name",
               "type": "string"
            },
            "value_type": {
               "default": "numeric",
               "description": "The type of the metric value. Indicates whether the metric value is numeric or categorical.",
               "examples": [
                  "numeric",
                  "categorical"
               ],
               "title": "Metric value type",
               "type": "string"
            },
            "method": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The method used to compute this metric result.",
               "examples": [
                  "token_recall"
               ],
               "title": "Method"
            },
            "provider": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The provider used to compute this metric result.",
               "title": "Provider"
            },
            "value": {
               "anyOf": [
                  {
                     "type": "number"
                  },
                  {
                     "type": "string"
                  },
                  {
                     "type": "boolean"
                  },
                  {
                     "additionalProperties": {
                        "type": "integer"
                     },
                     "type": "object"
                  },
                  {
                     "type": "null"
                  }
               ],
               "description": "The metric value.",
               "title": "Value"
            },
            "label": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The string equivalent of the metric value. This is used for metrics with categorical value type.",
               "title": "Label"
            },
            "errors": {
               "anyOf": [
                  {
                     "items": {
                        "$ref": "#/$defs/Error"
                     },
                     "type": "array"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The list of error messages",
               "title": "Errors"
            },
            "additional_info": {
               "anyOf": [
                  {
                     "type": "object"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The additional information about the metric result.",
               "title": "Additional Info"
            },
            "explanation": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The explanation about the metric result.",
               "title": "Explanation"
            },
            "group": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/MetricGroup"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The metric group",
               "title": "Group"
            },
            "thresholds": {
               "default": [],
               "description": "The metric thresholds",
               "items": {
                  "$ref": "#/$defs/MetricThreshold"
               },
               "title": "Thresholds",
               "type": "array"
            },
            "min": {
               "anyOf": [
                  {
                     "type": "number"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Min"
            },
            "max": {
               "anyOf": [
                  {
                     "type": "number"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Max"
            },
            "mean": {
               "anyOf": [
                  {
                     "type": "number"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Mean"
            },
            "total_records": {
               "title": "Total Records",
               "type": "integer"
            },
            "labels_count": {
               "anyOf": [
                  {
                     "type": "object"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Labels Count"
            },
            "record_level_metrics": {
               "default": [],
               "items": {
                  "$ref": "#/$defs/RecordMetricResult"
               },
               "title": "Record Level Metrics",
               "type": "array"
            }
         },
         "required": [
            "name",
            "display_name",
            "value",
            "total_records"
         ],
         "title": "AggregateMetricResult",
         "type": "object"
      },
      "Error": {
         "properties": {
            "code": {
               "description": "The error code",
               "title": "Code",
               "type": "string"
            },
            "message_en": {
               "description": "The error message in English.",
               "title": "Message En",
               "type": "string"
            },
            "parameters": {
               "default": [],
               "description": "The list of parameters to construct the message in a different locale.",
               "items": {},
               "title": "Parameters",
               "type": "array"
            }
         },
         "required": [
            "code",
            "message_en"
         ],
         "title": "Error",
         "type": "object"
      },
      "MetricGroup": {
         "enum": [
            "retrieval_quality",
            "answer_quality",
            "content_safety",
            "performance",
            "usage",
            "message_completion",
            "tool_call_quality",
            "readability",
            "custom"
         ],
         "title": "MetricGroup",
         "type": "string"
      },
      "MetricThreshold": {
         "description": "The class that defines the threshold for a metric.",
         "properties": {
            "type": {
               "description": "Threshold type. One of 'lower_limit', 'upper_limit'",
               "enum": [
                  "lower_limit",
                  "upper_limit"
               ],
               "title": "Type",
               "type": "string"
            },
            "value": {
               "default": 0,
               "description": "The value of metric threshold",
               "title": "Threshold value",
               "type": "number"
            }
         },
         "required": [
            "type"
         ],
         "title": "MetricThreshold",
         "type": "object"
      },
      "RecordMetricResult": {
         "properties": {
            "name": {
               "description": "The name of the metric.",
               "examples": [
                  "answer_relevance",
                  "context_relevance"
               ],
               "title": "Metric Name",
               "type": "string"
            },
            "display_name": {
               "description": "The display name of the metric.",
               "examples": [
                  "Answer Relevance",
                  "Context Relevance"
               ],
               "title": "Metric display name",
               "type": "string"
            },
            "value_type": {
               "default": "numeric",
               "description": "The type of the metric value. Indicates whether the metric value is numeric or categorical.",
               "examples": [
                  "numeric",
                  "categorical"
               ],
               "title": "Metric value type",
               "type": "string"
            },
            "method": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The method used to compute this metric result.",
               "examples": [
                  "token_recall"
               ],
               "title": "Method"
            },
            "provider": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The provider used to compute this metric result.",
               "title": "Provider"
            },
            "value": {
               "anyOf": [
                  {
                     "type": "number"
                  },
                  {
                     "type": "string"
                  },
                  {
                     "type": "boolean"
                  },
                  {
                     "additionalProperties": {
                        "type": "integer"
                     },
                     "type": "object"
                  },
                  {
                     "type": "null"
                  }
               ],
               "description": "The metric value.",
               "title": "Value"
            },
            "label": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The string equivalent of the metric value. This is used for metrics with categorical value type.",
               "title": "Label"
            },
            "errors": {
               "anyOf": [
                  {
                     "items": {
                        "$ref": "#/$defs/Error"
                     },
                     "type": "array"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The list of error messages",
               "title": "Errors"
            },
            "additional_info": {
               "anyOf": [
                  {
                     "type": "object"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The additional information about the metric result.",
               "title": "Additional Info"
            },
            "explanation": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The explanation about the metric result.",
               "title": "Explanation"
            },
            "group": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/MetricGroup"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The metric group",
               "title": "Group"
            },
            "thresholds": {
               "default": [],
               "description": "The metric thresholds",
               "items": {
                  "$ref": "#/$defs/MetricThreshold"
               },
               "title": "Thresholds",
               "type": "array"
            },
            "record_id": {
               "description": "The record identifier.",
               "examples": [
                  "record1"
               ],
               "title": "Record Id",
               "type": "string"
            },
            "record_timestamp": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The record timestamp.",
               "examples": [
                  "2025-01-01T00:00:00.000000Z"
               ],
               "title": "Record Timestamp"
            }
         },
         "required": [
            "name",
            "display_name",
            "value",
            "record_id"
         ],
         "title": "RecordMetricResult",
         "type": "object"
      }
   },
   "required": [
      "metrics_result"
   ]
}

Fields:

metrics_result (list[ibm_watsonx_gov.entities.evaluation_result.AggregateMetricResult])

field metrics_result: list[AggregateMetricResult] [Required]¶

to_df(data: DataFrame | None = None, include_additional_info: bool = False) → DataFrame¶

Transform the metrics evaluation result to a dataframe.

Parameters:

data (pd.DataFrame, optional) – The input dataframe. When passed, it is concatenated with the metrics result.
include_additional_info (bool) – Whether to include additional info in the metrics result.

Returns:

A new dataframe containing the input data and the evaluated metrics.

Return type:

pd.DataFrame

to_dict() → list[dict]¶: Transform the metrics evaluation result to a list of dicts containing the record level metrics.

to_json(indent: int | None = None, **kwargs)¶

Transform the metrics evaluation result to a JSON string. The kwargs are passed to the pydantic model_dump_json method, so any argument supported by model_dump_json can be passed.

Parameters:

indent (int, optional) – The indentation level for the JSON. Defaults to None.

Returns:

The result as a JSON string.
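
As an illustration of how a JSON string with this shape might be consumed downstream, the sketch below parses a hand-written sample that follows the MetricsEvaluationResult schema and flattens the record level metrics into rows. The sample values and the helper flatten_record_metrics are made up for this example. It does not reproduce the exact output of to_json or to_dict.

```python
import json

# Hand-written sample following the MetricsEvaluationResult schema above.
# The values are illustrative, not the output of a real evaluation.
sample_json = """
{
  "metrics_result": [
    {
      "name": "answer_relevance",
      "display_name": "Answer Relevance",
      "value": 0.75,
      "total_records": 2,
      "min": 0.7,
      "max": 0.8,
      "mean": 0.75,
      "record_level_metrics": [
        {"name": "answer_relevance", "display_name": "Answer Relevance",
         "value": 0.7, "record_id": "record1"},
        {"name": "answer_relevance", "display_name": "Answer Relevance",
         "value": 0.8, "record_id": "record2"}
      ]
    }
  ]
}
"""


def flatten_record_metrics(result_json: str) -> list:
    """Hypothetical helper: one row per (record, metric) pair."""
    result = json.loads(result_json)
    rows = []
    for aggregate in result["metrics_result"]:
        # record_level_metrics defaults to an empty list in the schema.
        for record in aggregate.get("record_level_metrics", []):
            rows.append(
                {
                    "record_id": record["record_id"],
                    "metric": record["name"],
                    "value": record["value"],
                }
            )
    return rows


rows = flatten_record_metrics(sample_json)
```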

pydantic model ibm_watsonx_gov.entities.evaluation_result.MetricsMappingData¶

Bases: BaseModel

The model class to capture the metrics mappings and the span data.

Show JSON schema

{
   "title": "MetricsMappingData",
   "description": "The model class to capture the metrics mappings and the span data.",
   "type": "object",
   "properties": {
      "message_id": {
         "description": "The ID of the message.",
         "title": "Message ID",
         "type": "string"
      },
      "metric_mappings": {
         "description": "The list of metric mappings.",
         "items": {
            "$ref": "#/$defs/MetricMapping"
         },
         "title": "Metric Mapping",
         "type": "array"
      },
      "data": {
         "description": "The span data used for metrics computation.",
         "examples": [
            {
               "LangGraph.workflow": {
                  "traceloop.entity.output": {
                     "$.outputs.generated_text": "The response"
                  }
               }
            }
         ],
         "title": "Data",
         "type": "object"
      }
   },
   "$defs": {
      "Mapping": {
         "description": "Defines the field mapping details to be used for computing a metric.",
         "properties": {
            "source": {
               "default": "trace",
               "description": "The source type of the data. Use trace if the data should be read from span in trace. Use tabular if the data is passed as a dataframe.",
               "enum": [
                  "trace",
                  "tabular"
               ],
               "examples": [
                  "trace",
                  "tabular"
               ],
               "title": "Source",
               "type": "string"
            },
            "items": {
               "description": "The list of mapping items for the field. They are used to read the data from trace or tabular data for computing the metric.",
               "items": {
                  "$ref": "#/$defs/MappingItem"
               },
               "title": "Mapping Items",
               "type": "array"
            }
         },
         "required": [
            "items"
         ],
         "title": "Mapping",
         "type": "object"
      },
      "MappingItem": {
         "description": "The mapping details to be used for reading the values from the data.",
         "properties": {
            "name": {
               "description": "The name of the item.",
               "examples": [
                  "input_text",
                  "generated_text",
                  "context",
                  "ground_truth"
               ],
               "title": "Name",
               "type": "string"
            },
            "type": {
               "description": "The type of the item.",
               "enum": [
                  "input",
                  "output",
                  "reference",
                  "context",
                  "tool_call"
               ],
               "examples": [
                  "input"
               ],
               "title": "Type",
               "type": "string"
            },
            "column_name": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The column name in the tabular data to be used for reading the field value. Applicable for tabular source.",
               "title": "Column Name"
            },
            "span_name": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The span name in the trace data to be used for reading the field value. Applicable for trace source.",
               "title": "Span Name"
            },
            "attribute_name": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The attribute name in the trace to be used for reading the field value. Applicable for trace source.",
               "title": "Attribute Name"
            },
            "json_path": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The json path to be used for reading the field value from the attribute value. Applicable for trace source. If not provided, the span attribute value is read as the field value.",
               "title": "Json Path"
            },
            "lookup_child_spans": {
               "anyOf": [
                  {
                     "type": "boolean"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": false,
               "description": "The flag to indicate if all the child spans should be searched for the attribute value. Applicable for trace source.",
               "title": "Look up child spans"
            }
         },
         "required": [
            "name",
            "type"
         ],
         "title": "MappingItem",
         "type": "object"
      },
      "MetricMapping": {
         "description": "The metric mapping data",
         "properties": {
            "name": {
               "description": "The name of the metric.",
               "title": "Name",
               "type": "string"
            },
            "method": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The method used to compute the metric.",
               "title": "Method"
            },
            "applies_to": {
               "title": "Applies To",
               "type": "string"
            },
            "mapping": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/Mapping"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The data mapping details for the metric which are used to read the values needed to compute the metric.",
               "title": "Mapping"
            }
         },
         "required": [
            "name"
         ],
         "title": "MetricMapping",
         "type": "object"
      }
   },
   "required": [
      "message_id",
      "metric_mappings",
      "data"
   ]
}

Fields:

data (dict)
message_id (str)
metric_mappings (list[ibm_watsonx_gov.entities.evaluation_result.MetricMapping])

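field data: Annotated[dict, FieldInfo(annotation=NoneType, required=True, title='Data', description='The span data used for metrics computation.', examples=[{'LangGraph.workflow': {'traceloop.entity.output': {'$.outputs.generated_text': 'The response'}}}])] [Required]¶: The span data used for metrics computation.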

field message_id: Annotated[str, FieldInfo(annotation=NoneType, required=True, title='Message ID', description='The ID of the message.')] [Required]¶: The ID of the message.

field metric_mappings: Annotated[list[MetricMapping], FieldInfo(annotation=NoneType, required=True, title='Metric Mapping', description='The list of metric mappings.')] [Required]¶: The list of metric mappings.
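
The example for the data field suggests a nesting of span name, then attribute name, then JSON path. Assuming that layout holds in general (the schema shows only one example), a mapping item could be resolved against the span data roughly as follows. The helper read_mapped_value is hypothetical and is not part of the library.

```python
from typing import Any, Optional

# Span data copied from the schema example above.
span_data = {
    "LangGraph.workflow": {
        "traceloop.entity.output": {
            "$.outputs.generated_text": "The response",
        }
    }
}

# A mapping item using the MappingItem field names from the schema.
item = {
    "name": "generated_text",
    "type": "output",
    "span_name": "LangGraph.workflow",
    "attribute_name": "traceloop.entity.output",
    "json_path": "$.outputs.generated_text",
}


def read_mapped_value(data: dict, mapping_item: dict) -> Optional[Any]:
    """Hypothetical lookup: span name -> attribute name -> JSON path."""
    span = data.get(mapping_item.get("span_name"))
    if not isinstance(span, dict):
        return None
    attribute = span.get(mapping_item.get("attribute_name"))
    json_path = mapping_item.get("json_path")
    if json_path is None:
        # Without a JSON path, the attribute value itself is the field value.
        return attribute
    if isinstance(attribute, dict):
        return attribute.get(json_path)
    return None


value = read_mapped_value(span_data, item)
```

Here the JSON path is used as a plain dictionary key, matching the example; it is not evaluated as a JSONPath expression.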

pydantic model ibm_watsonx_gov.entities.evaluation_result.NodeData¶

Bases: BaseModel

The model class to capture the node input and output data of a LangGraph agent.

Show JSON schema

{
   "title": "NodeData",
   "description": "The model class to capture the node input output data of a langgraph agent.",
   "type": "object",
   "properties": {
      "message_id": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "description": "The ID of the message.",
         "title": "Message ID"
      },
      "message_ts": {
         "anyOf": [
            {
               "format": "date-time",
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "description": "The timestamp of the message in ISO format. The end timestamp of the message processing is considered as the message timestamp.",
         "title": "Message timestamp"
      },
      "conversation_id": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "description": "The ID of the conversation containing the message.",
         "title": "Conversation ID"
      },
      "node_name": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "description": "The name of the node.",
         "title": "Node name"
      },
      "start_time": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "description": "The node execution start time in ISO format.",
         "title": "Start time"
      },
      "end_time": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "description": "The node execution end time in ISO format.",
         "title": "End time"
      },
      "input": {
         "anyOf": [
            {
               "type": "object"
            },
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "description": "The node input data.",
         "title": "Input"
      },
      "output": {
         "anyOf": [
            {
               "type": "object"
            },
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "description": "The node output data.",
         "title": "Input"
      },
      "execution_order": {
         "default": 0,
         "description": "The execution order of the node in the langgraph.",
         "title": "Execution Order",
         "type": "integer"
      },
      "execution_count": {
         "default": 0,
         "description": "The execution count of the node in the langgraph.",
         "title": "Execution Count",
         "type": "integer"
      },
      "node_txn_id": {
         "default": "94a5377c-af46-45bd-b4ea-1a61b1d7e126",
         "description": "Unique identifier of the object.",
         "title": "Node transaction id",
         "type": "string"
      },
      "node_txn_timestamp": {
         "description": "The node transaction timestamp. The end timestamp of the node execution is considered as the node transaction timestamp.",
         "title": "Node transaction timestamp",
         "type": "string"
      }
   },
   "required": [
      "message_id",
      "message_ts",
      "conversation_id",
      "node_name",
      "start_time",
      "end_time",
      "input",
      "output",
      "node_txn_timestamp"
   ]
}

Fields:

conversation_id (str | None)
end_time (str | None)
execution_count (int)
execution_order (int)
input (dict | str | None)
message_id (str | None)
message_ts (datetime.datetime | None)
node_name (str | None)
node_txn_id (str)
node_txn_timestamp (str)
output (dict | str | None)
start_time (str | None)

field conversation_id: Annotated[str | None, FieldInfo(annotation=NoneType, required=True, title='Conversation ID', description='The ID of the conversation containing the message.')] [Required]¶: The ID of the conversation containing the message.

field end_time: Annotated[str | None, FieldInfo(annotation=NoneType, required=True, title='End time', description='The node execution end time in ISO format.')] [Required]¶: The node execution end time in ISO format.

field execution_count: Annotated[int, FieldInfo(annotation=NoneType, required=False, default=0, title='Execution Count', description='The execution count of the node in the langgraph.')] = 0¶: The execution count of the node in the langgraph.

field execution_order: Annotated[int, FieldInfo(annotation=NoneType, required=False, default=0, title='Execution Order', description='The execution order of the node in the langgraph.')] = 0¶: The execution order of the node in the langgraph.

field input: Annotated[dict | str | None, FieldInfo(annotation=NoneType, required=True, title='Input', description='The node input data.')] [Required]¶: The node input data.

field message_id: Annotated[str | None, FieldInfo(annotation=NoneType, required=True, title='Message ID', description='The ID of the message.')] [Required]¶: The ID of the message.

field message_ts: Annotated[datetime | None, FieldInfo(annotation=NoneType, required=True, title='Message timestamp', description='The timestamp of the message in ISO format. The end timestamp of the message processing is considered as the message timestamp.')] [Required]¶: The timestamp of the message in ISO format. The end timestamp of the message processing is considered as the message timestamp.

field node_name: Annotated[str | None, FieldInfo(annotation=NoneType, required=True, title='Node name', description='The name of the node.')] [Required]¶: The name of the node.

field node_txn_id: Annotated[str, FieldInfo(annotation=NoneType, required=False, default='94a5377c-af46-45bd-b4ea-1a61b1d7e126', title='Node transaction id', description='Unique identifier of the object.')] = '94a5377c-af46-45bd-b4ea-1a61b1d7e126'¶: Unique identifier of the object.

field node_txn_timestamp: Annotated[str, FieldInfo(annotation=NoneType, required=True, title='Node transaction timestamp', description='The node transaction timestamp. The end timestamp of the node execution is considered as the node transaction timestamp.')] [Required]¶: The node transaction timestamp. The end timestamp of the node execution is considered as the node transaction timestamp.

field output: Annotated[dict | str | None, FieldInfo(annotation=NoneType, required=True, title='Input', description='The node output data.')] [Required]¶: The node output data.

field start_time: Annotated[str | None, FieldInfo(annotation=NoneType, required=True, title='Start time', description='The node execution start time in ISO format.')] [Required]¶: The node execution start time in ISO format.
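
Several NodeData fields are typed as str | None yet marked [Required]: the key must be supplied, although None is an accepted value. The sketch below builds a dictionary with the NodeData field names and derives the node execution duration from start_time and end_time. The sample values and the helper node_duration_seconds are illustrative assumptions, and the timestamps are assumed to be in a form that datetime.fromisoformat accepts.

```python
from datetime import datetime
from typing import Optional

# Hand-written record using the NodeData field names; values are made up.
node = {
    "message_id": "msg-1",
    "message_ts": "2025-01-01T00:00:02+00:00",
    "conversation_id": "conv-1",
    "node_name": "retrieve",
    "start_time": "2025-01-01T00:00:00+00:00",
    "end_time": "2025-01-01T00:00:01.500000+00:00",
    "input": {"question": "What is the refund policy?"},
    "output": "Refunds are accepted within 30 days.",
    "execution_order": 1,
    "execution_count": 1,
    "node_txn_timestamp": "2025-01-01T00:00:01.500000+00:00",
}


def node_duration_seconds(node_data: dict) -> Optional[float]:
    """Hypothetical helper: elapsed seconds, or None if a time is missing."""
    start = node_data.get("start_time")
    end = node_data.get("end_time")
    if start is None or end is None:
        return None
    elapsed = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    return elapsed.total_seconds()


duration = node_duration_seconds(node)
```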

pydantic model ibm_watsonx_gov.entities.evaluation_result.RecordMetricResult¶

Bases: BaseMetricResult

Show JSON schema

{
   "title": "RecordMetricResult",
   "type": "object",
   "properties": {
      "name": {
         "description": "The name of the metric.",
         "examples": [
            "answer_relevance",
            "context_relevance"
         ],
         "title": "Metric Name",
         "type": "string"
      },
      "display_name": {
         "description": "The display name of the metric.",
         "examples": [
            "Answer Relevance",
            "Context Relevance"
         ],
         "title": "Metric display name",
         "type": "string"
      },
      "value_type": {
         "default": "numeric",
         "description": "The type of the metric value. Indicates whether the metric value is numeric or categorical.",
         "examples": [
            "numeric",
            "categorical"
         ],
         "title": "Metric value type",
         "type": "string"
      },
      "method": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "The method used to compute this metric result.",
         "examples": [
            "token_recall"
         ],
         "title": "Method"
      },
      "provider": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "The provider used to compute this metric result.",
         "title": "Provider"
      },
      "value": {
         "anyOf": [
            {
               "type": "number"
            },
            {
               "type": "string"
            },
            {
               "type": "boolean"
            },
            {
               "additionalProperties": {
                  "type": "integer"
               },
               "type": "object"
            },
            {
               "type": "null"
            }
         ],
         "description": "The metric value.",
         "title": "Value"
      },
      "label": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "The string equivalent of the metric value. This is used for metrics with categorical value type.",
         "title": "Label"
      },
      "errors": {
         "anyOf": [
            {
               "items": {
                  "$ref": "#/$defs/Error"
               },
               "type": "array"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "The list of error messages",
         "title": "Errors"
      },
      "additional_info": {
         "anyOf": [
            {
               "type": "object"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "The additional information about the metric result.",
         "title": "Additional Info"
      },
      "explanation": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "The explanation about the metric result.",
         "title": "Explanation"
      },
      "group": {
         "anyOf": [
            {
               "$ref": "#/$defs/MetricGroup"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "The metric group",
         "title": "Group"
      },
      "thresholds": {
         "default": [],
         "description": "The metric thresholds",
         "items": {
            "$ref": "#/$defs/MetricThreshold"
         },
         "title": "Thresholds",
         "type": "array"
      },
      "record_id": {
         "description": "The record identifier.",
         "examples": [
            "record1"
         ],
         "title": "Record Id",
         "type": "string"
      },
      "record_timestamp": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "The record timestamp.",
         "examples": [
            "2025-01-01T00:00:00.000000Z"
         ],
         "title": "Record Timestamp"
      }
   },
   "$defs": {
      "Error": {
         "properties": {
            "code": {
               "description": "The error code",
               "title": "Code",
               "type": "string"
            },
            "message_en": {
               "description": "The error message in English.",
               "title": "Message En",
               "type": "string"
            },
            "parameters": {
               "default": [],
               "description": "The list of parameters to construct the message in a different locale.",
               "items": {},
               "title": "Parameters",
               "type": "array"
            }
         },
         "required": [
            "code",
            "message_en"
         ],
         "title": "Error",
         "type": "object"
      },
      "MetricGroup": {
         "enum": [
            "retrieval_quality",
            "answer_quality",
            "content_safety",
            "performance",
            "usage",
            "message_completion",
            "tool_call_quality",
            "readability",
            "custom"
         ],
         "title": "MetricGroup",
         "type": "string"
      },
      "MetricThreshold": {
         "description": "The class that defines the threshold for a metric.",
         "properties": {
            "type": {
               "description": "Threshold type. One of 'lower_limit', 'upper_limit'",
               "enum": [
                  "lower_limit",
                  "upper_limit"
               ],
               "title": "Type",
               "type": "string"
            },
            "value": {
               "default": 0,
               "description": "The value of metric threshold",
               "title": "Threshold value",
               "type": "number"
            }
         },
         "required": [
            "type"
         ],
         "title": "MetricThreshold",
         "type": "object"
      }
   },
   "required": [
      "name",
      "display_name",
      "value",
      "record_id"
   ]
}

Config:

arbitrary_types_allowed: bool = True
use_enum_values: bool = True

Fields:

record_id (Annotated[str, FieldInfo(annotation=NoneType, required=True, description='The record identifier.', examples=['record1'])])
record_timestamp (Annotated[str | None, FieldInfo(annotation=NoneType, required=False, default=None, description='The record timestamp.', examples=['2025-01-01T00:00:00.000000Z'])])

field record_id: Annotated[str, FieldInfo(annotation=NoneType, required=True, description='The record identifier.', examples=['record1'])] [Required]: The record identifier.

field record_timestamp: Annotated[str | None, FieldInfo(annotation=NoneType, required=False, default=None, description='The record timestamp.', examples=['2025-01-01T00:00:00.000000Z'])] = None: The record timestamp.
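
As a minimal sketch of the schema above, the following standard-library snippet builds a record-level result as a plain dictionary and checks it against the `required` list from the JSON schema. The helpers `missing_required` and `record_timestamp` and the sample values are illustrative assumptions, not part of `ibm_watsonx_gov`. The library's own pydantic validation is what applies when a `RecordMetricResult` is actually constructed.

```python
from datetime import datetime, timezone

# Required keys copied from the "required" list of the RecordMetricResult schema.
RECORD_REQUIRED = ("name", "display_name", "value", "record_id")


def missing_required(payload: dict, required=RECORD_REQUIRED) -> list:
    """Return the required keys absent from the payload (hypothetical helper)."""
    return [key for key in required if key not in payload]


def record_timestamp(moment: datetime) -> str:
    """Format a datetime like the schema example '2025-01-01T00:00:00.000000Z'."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")


payload = {
    "name": "answer_relevance",
    "display_name": "Answer Relevance",
    "value": 0.87,  # made-up score; the schema also allows str, bool, dict or null
    "record_id": "record1",
    "record_timestamp": record_timestamp(datetime(2025, 1, 1, tzinfo=timezone.utc)),
}

print(missing_required(payload))        # []
print(missing_required({"name": "x"}))  # ['display_name', 'value', 'record_id']
```

Note that `value` appears in the `required` list even though the schema allows it to be null, so the key must be present even when no value was computed.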

pydantic model ibm_watsonx_gov.entities.evaluation_result.ToolMetricResult

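Bases: RecordMetricResult

This is the data model for a metric result computed for a tool. It extends RecordMetricResult with tool_name and execution_count.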

JSON schema:

{
   "title": "ToolMetricResult",
   "type": "object",
   "properties": {
      "name": {
         "description": "The name of the metric.",
         "examples": [
            "answer_relevance",
            "context_relevance"
         ],
         "title": "Metric Name",
         "type": "string"
      },
      "display_name": {
         "description": "The display name of the metric.",
         "examples": [
            "Answer Relevance",
            "Context Relevance"
         ],
         "title": "Metric display name",
         "type": "string"
      },
      "value_type": {
         "default": "numeric",
         "description": "The type of the metric value. Indicates whether the metric value is numeric or categorical.",
         "examples": [
            "numeric",
            "categorical"
         ],
         "title": "Metric value type",
         "type": "string"
      },
      "method": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "The method used to compute this metric result.",
         "examples": [
            "token_recall"
         ],
         "title": "Method"
      },
      "provider": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "The provider used to compute this metric result.",
         "title": "Provider"
      },
      "value": {
         "anyOf": [
            {
               "type": "number"
            },
            {
               "type": "string"
            },
            {
               "type": "boolean"
            },
            {
               "additionalProperties": {
                  "type": "integer"
               },
               "type": "object"
            },
            {
               "type": "null"
            }
         ],
         "description": "The metric value.",
         "title": "Value"
      },
      "label": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "The string equivalent of the metric value. This is used for metrics with categorical value type.",
         "title": "Label"
      },
      "errors": {
         "anyOf": [
            {
               "items": {
                  "$ref": "#/$defs/Error"
               },
               "type": "array"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "The list of error messages",
         "title": "Errors"
      },
      "additional_info": {
         "anyOf": [
            {
               "type": "object"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "The additional information about the metric result.",
         "title": "Additional Info"
      },
      "explanation": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "The explanation about the metric result.",
         "title": "Explanation"
      },
      "group": {
         "anyOf": [
            {
               "$ref": "#/$defs/MetricGroup"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "The metric group",
         "title": "Group"
      },
      "thresholds": {
         "default": [],
         "description": "The metric thresholds",
         "items": {
            "$ref": "#/$defs/MetricThreshold"
         },
         "title": "Thresholds",
         "type": "array"
      },
      "record_id": {
         "description": "The record identifier.",
         "examples": [
            "record1"
         ],
         "title": "Record Id",
         "type": "string"
      },
      "record_timestamp": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "The record timestamp.",
         "examples": [
            "2025-01-01T00:00:00.000000Z"
         ],
         "title": "Record Timestamp"
      },
      "tool_name": {
         "description": "Name of the tool for which this result is computed.",
         "title": "Tool Name",
         "type": "string"
      },
      "execution_count": {
         "default": 1,
         "description": "The execution count for this tool name.",
         "exclusiveMinimum": 0,
         "title": "Execution count",
         "type": "integer"
      }
   },
   "$defs": {
      "Error": {
         "properties": {
            "code": {
               "description": "The error code",
               "title": "Code",
               "type": "string"
            },
            "message_en": {
               "description": "The error message in English.",
               "title": "Message En",
               "type": "string"
            },
            "parameters": {
               "default": [],
               "description": "The list of parameters to construct the message in a different locale.",
               "items": {},
               "title": "Parameters",
               "type": "array"
            }
         },
         "required": [
            "code",
            "message_en"
         ],
         "title": "Error",
         "type": "object"
      },
      "MetricGroup": {
         "enum": [
            "retrieval_quality",
            "answer_quality",
            "content_safety",
            "performance",
            "usage",
            "message_completion",
            "tool_call_quality",
            "readability",
            "custom"
         ],
         "title": "MetricGroup",
         "type": "string"
      },
      "MetricThreshold": {
         "description": "The class that defines the threshold for a metric.",
         "properties": {
            "type": {
               "description": "Threshold type. One of 'lower_limit', 'upper_limit'",
               "enum": [
                  "lower_limit",
                  "upper_limit"
               ],
               "title": "Type",
               "type": "string"
            },
            "value": {
               "default": 0,
               "description": "The value of metric threshold",
               "title": "Threshold value",
               "type": "number"
            }
         },
         "required": [
            "type"
         ],
         "title": "MetricThreshold",
         "type": "object"
      }
   },
   "required": [
      "name",
      "display_name",
      "value",
      "record_id",
      "tool_name"
   ]
}

Config:

arbitrary_types_allowed: bool = True
use_enum_values: bool = True

Fields:

execution_count (Annotated[int, FieldInfo(annotation=NoneType, required=False, default=1, title='Execution count', description='The execution count for this tool name.', metadata=[Gt(gt=0)])])
tool_name (Annotated[str, FieldInfo(annotation=NoneType, required=True, title='Tool Name', description='Name of the tool for which this result is computed.')])

field execution_count: Annotated[int, FieldInfo(annotation=NoneType, required=False, default=1, title='Execution count', description='The execution count for this tool name.', metadata=[Gt(gt=0)])] = 1

The execution count for this tool name.

Constraints:

gt = 0

field tool_name: Annotated[str, FieldInfo(annotation=NoneType, required=True, title='Tool Name', description='Name of the tool for which this result is computed.')] [Required]: Name of the tool for which this result is computed.
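
The two tool-specific fields can be illustrated with a small standard-library sketch. `ToolResultSketch` is a hypothetical stand-in that mirrors only the documented default (`execution_count = 1`) and constraint (`gt = 0`). It is not the library class, and the tool name is a made-up sample value.

```python
from dataclasses import dataclass


@dataclass
class ToolResultSketch:
    """Hypothetical stand-in for two ToolMetricResult fields; not the library class."""

    tool_name: str            # required in the schema
    execution_count: int = 1  # schema default is 1

    def __post_init__(self):
        # Mirrors the documented constraint gt = 0 (exclusiveMinimum: 0 in the JSON schema).
        if self.execution_count <= 0:
            raise ValueError("execution_count must be greater than 0")


first = ToolResultSketch(tool_name="search_tool")
print(first.execution_count)  # 1

try:
    ToolResultSketch(tool_name="search_tool", execution_count=0)
except ValueError as exc:
    print(exc)  # execution_count must be greater than 0
```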