Agentic Evaluator

pydantic model ibm_watsonx_gov.evaluators.agentic_evaluator.AgenticEvaluator

Bases: BaseEvaluator

The class to evaluate an agentic application.

Examples

  1. Evaluate an agent with default parameters. This computes only the performance (latency, duration) and usage (cost, input_token_count, output_token_count) metrics.
    agentic_evaluator = AgenticEvaluator()
    agentic_evaluator.start_run()
    # Invoke the agentic application
    agentic_evaluator.end_run()
    result = agentic_evaluator.get_result()
    
  2. Evaluate an agent by specifying the agent- or message-level metrics and the node-level metrics, which are computed after graph invocation when end_run() is called.
    # The example below provides the node configuration to compute the ContextRelevanceMetric and all the Retrieval Quality group metrics.
    nodes = [Node(name="Retrieval Node",
                metrics_configurations=[MetricsConfiguration(metrics=[ContextRelevanceMetric()],
                                                             metric_groups=[MetricGroup.RETRIEVAL_QUALITY])])]
    # Refer to the MetricsConfiguration class for advanced usage where the field details can be specified, in case the graph state attributes have non-default names.
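    # For example, a sketch assuming the graph state keeps the query in "input" and
    # the retrieved documents in "web_context" (illustrative field names):
    #   node_fields_config = {"input_fields": ["input"], "context_fields": ["web_context"]}
    #   nodes = [Node(name="Retrieval Node",
    #                 metrics_configurations=[MetricsConfiguration(configuration=AgenticAIConfiguration(**node_fields_config),
    #                                                              metrics=[ContextRelevanceMetric()],
    #                                                              metric_groups=[MetricGroup.RETRIEVAL_QUALITY])])]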
    
    # The example below provides the agent configuration to compute the AnswerRelevanceMetric and all the Content Safety group metrics at the agent or message level.
    agentic_app = AgenticApp(name="Agentic App",
                        metrics_configuration=MetricsConfiguration(metrics=[AnswerRelevanceMetric()],
                                                                    metric_groups=[MetricGroup.CONTENT_SAFETY]),
                        nodes=nodes)
    
    agentic_evaluator = AgenticEvaluator(agentic_app=agentic_app)
    agentic_evaluator.start_run()
    # Invoke the agentic application
    agentic_evaluator.end_run()
    result = agentic_evaluator.get_result()
    
  3. Evaluate an agent by specifying the agent- or message-level metrics and using decorators to compute node-level metrics during graph invocation.
    # The example below provides the agent configuration to compute the AnswerRelevanceMetric and all the Content Safety group metrics at the agent or message level.
    # Agent- or message-level metrics are computed after graph invocation when end_run() is called.
    agentic_app = AgenticApp(name="Agentic App",
                        metrics_configuration=MetricsConfiguration(metrics=[AnswerRelevanceMetric()],
                                                                    metric_groups=[MetricGroup.CONTENT_SAFETY]))
    
    agentic_evaluator = AgenticEvaluator(agentic_app=agentic_app)
    
    # Add the decorators when defining the node functions
    @agentic_evaluator.evaluate_retrieval_quality(configuration=AgenticAIConfiguration(**{"input_fields": ["input_text"], "context_fields": ["local_context"]}))
    @agentic_evaluator.evaluate_content_safety()  # Here the default AgenticAIConfiguration is used
    def local_search_node(state: GraphState, config: RunnableConfig) -> dict:
        # Retrieve data from vector db
        # ...
        return {"local_context": []}
    
    agentic_evaluator.start_run()
    # Invoke the agentic application
    agentic_evaluator.end_run()
    result = agentic_evaluator.get_result()
    
  4. Evaluate an agent with experiment tracking
    tracing_config = TracingConfiguration(project_id=project_id)
    agentic_evaluator = AgenticEvaluator(tracing_configuration=tracing_config)
    
    agentic_evaluator.track_experiment(name="my_experiment")
    agentic_evaluator.start_run(AIExperimentRunRequest(name="run1"))
    # Invoke the agentic application
    agentic_evaluator.end_run()
    result = agentic_evaluator.get_result()
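    
    # A further run can be tracked under the same experiment (a sketch; the run
    # name "run2" is illustrative):
    agentic_evaluator.start_run(AIExperimentRunRequest(name="run2"))
    # Invoke the agentic application
    agentic_evaluator.end_run()
    result = agentic_evaluator.get_result()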
    

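Note: the examples above omit imports for brevity. The evaluator's import path follows from the module shown above; the remaining paths in this sketch are assumptions and may differ across package versions:

    from ibm_watsonx_gov.evaluators.agentic_evaluator import AgenticEvaluator
    # The paths below are assumed for illustration only; check your installed package:
    # from ibm_watsonx_gov.entities import AgenticApp, Node, MetricsConfiguration, AgenticAIConfiguration
    # from ibm_watsonx_gov.metrics import AnswerRelevanceMetric, ContextRelevanceMetric
    # from ibm_watsonx_gov.entities.enums import MetricGroup
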
Show JSON schema
{
   "title": "AgenticEvaluator",
   "type": "object",
   "properties": {
      "api_client": {
         "default": null,
         "title": "Api Client"
      },
      "agentic_app": {
         "anyOf": [
            {
               "$ref": "#/$defs/AgenticApp"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "The agentic application configuration details.",
         "title": "Agentic application configuration details"
      },
      "tracing_configuration": {
         "anyOf": [
            {
               "$ref": "#/$defs/TracingConfiguration"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "The tracing configuration details.",
         "title": "Tracing Configuration"
      },
      "ai_experiment_client": {
         "default": null,
         "title": "Ai Experiment Client"
      },
      "max_concurrency": {
         "default": 10,
         "description": "The maximum concurrency to use for evaluating metrics.",
         "title": "Max Concurrency",
         "type": "integer"
      }
   },
   "$defs": {
      "AWSBedrockCredentials": {
         "description": "Defines the AWSBedrockCredentials class for accessing AWS Bedrock using environment variables or manual input.\n\nExamples:\n    1. Create credentials manually:\n        .. code-block:: python\n\n            credentials = AWSBedrockCredentials(\n                aws_access_key_id=\"your-access-key-id\",\n                aws_secret_access_key=\"your-secret-access-key\",\n                aws_region_name=\"us-east-1\",\n                aws_session_token=\"optional-session-token\"\n            )\n\n    2. Create credentials from environment:\n        .. code-block:: python\n\n            os.environ[\"AWS_ACCESS_KEY_ID\"] = \"your-access-key-id\"\n            os.environ[\"AWS_DEFAULT_REGION\"] = \"us-east-1\"\n            os.environ[\"AWS_SECRET_ACCESS_KEY\"] = \"your-secret-access-key\"\n\n            credentials = AWSBedrockCredentials.create_from_env()",
         "properties": {
            "aws_access_key_id": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "description": "The AWS access key id. This attribute value will be read from AWS_ACCESS_KEY_ID environment variable when creating AWSBedrockCredentials from environment.",
               "title": "AWS Access Key ID"
            },
            "aws_secret_access_key": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "description": "The AWS secret access key. This attribute value will be read from AWS_SECRET_ACCESS_KEY environment variable when creating AWSBedrockCredentials from environment.",
               "title": "AWS Secret Access Key"
            },
            "aws_region_name": {
               "default": "us-east-1",
               "description": "AWS region. This attribute value will be read from AWS_DEFAULT_REGION environment variable when creating AWSBedrockCredentials from environment.",
               "examples": [
                  "us-east-1",
                  "eu-west-1"
               ],
               "title": "AWS Region",
               "type": "string"
            },
            "aws_session_token": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "description": "Optional AWS session token for temporary credentials.",
               "title": "AWS Session Token"
            }
         },
         "required": [
            "aws_access_key_id",
            "aws_secret_access_key",
            "aws_session_token"
         ],
         "title": "AWSBedrockCredentials",
         "type": "object"
      },
      "AWSBedrockFoundationModel": {
         "description": "    The Amazon Bedrock foundation model details.\n\n    Examples:\n        1. Create AWS Bedrock foundation model by passing credentials manually:\n            .. code-block:: python\n\n                bedrock_model = AWSBedrockFoundationModel(\n                    model_id=\"anthropic.claude-v2\",\n                    provider=AWSBedrockModelProvider(\n                        credentials=AWSBedrockCredentials(\n                            aws_access_key_id=\"your-access-key-id\",\n                            aws_secret_access_key=\"your-secret-access-key\",\n                            aws_region_name=\"us-east-1\",\n                            aws_session_token=\"optional-session-token\"\n                        )\n                    ),\n                    parameters={\n                        \"temperature\": 0.7,\n                        \"top_p\": 0.9,\n                        \"max_tokens\": 200,\n                        \"stop_sequences\": [\"\n\"],\n                        \"system\": \"You are a concise assistant.\",\n                        \"reasoning_effort\": \"high\",\n                        \"tool_choice\": \"auto\"\n                    }\n                )\n\n        2. Create AWS Bedrock foundation model using environment variables:\n            os.environ[\"AWS_ACCESS_KEY_ID\"] = \"your-access-key-id\"\n            os.environ[\"AWS_SECRET_ACCESS_KEY\"] = \"your-secret-access-key\"\n            os.environ[\"AWS_DEFAULT_REGION\"] = \"us-east-1\"\n\n            .. code-block:: python\n\n                bedrock_model = AWSBedrockFoundationModel(\n                    model_id=\"anthropic.claude-v2\"\n                )\n    ",
         "properties": {
            "model_id": {
               "description": "The AWS Bedrock model name. It must be a valid AWS Bedrock model identifier.",
               "examples": [
                  "anthropic.claude-v2"
               ],
               "title": "Model ID",
               "type": "string"
            },
            "provider": {
               "$ref": "#/$defs/AWSBedrockModelProvider",
               "description": "The AWS Bedrock provider details.",
               "title": "Provider"
            },
            "parameters": {
               "anyOf": [
                  {
                     "type": "object"
                  },
                  {
                     "type": "null"
                  }
               ],
               "description": "The model parameters to be used when invoking the model. The parameters may include temperature, top_p, max_tokens, etc..",
               "title": "Parameters"
            }
         },
         "required": [
            "model_id"
         ],
         "title": "AWSBedrockFoundationModel",
         "type": "object"
      },
      "AWSBedrockModelProvider": {
         "description": "Represents a model provider using Amazon Bedrock.\n\nExamples:\n    1. Create provider using credentials object:\n        .. code-block:: python\n\n            provider = AWSBedrockModelProvider(\n                credentials=AWSBedrockCredentials(\n                    aws_access_key_id=\"your-access-key-id\",\n                    aws_secret_access_key=\"your-secret-access-key\",\n                    aws_region_name=\"us-east-1\",\n                    aws_session_token=\"optional-session-token\"\n                )\n            )\n\n    2. Create provider using environment variables:\n        .. code-block:: python\n\n            os.environ['AWS_ACCESS_KEY_ID'] = \"your-access-key-id\"\n            os.environ['AWS_SECRET_ACCESS_KEY'] = \"your-secret-access-key\"\n            os.environ['AWS_SESSION_TOKEN'] = \"optional-session-token\"  # Optional\n            os.environ['AWS_DEFAULT_REGION'] = \"us-east-1\"\n            provider = AWSBedrockModelProvider()",
         "properties": {
            "type": {
               "$ref": "#/$defs/ModelProviderType",
               "default": "aws_bedrock",
               "description": "The type of model provider."
            },
            "credentials": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/AWSBedrockCredentials"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "AWS Bedrock credentials."
            }
         },
         "title": "AWSBedrockModelProvider",
         "type": "object"
      },
      "AgenticAIConfiguration": {
         "description": "Defines the AgenticAIConfiguration class.\n\nThe configuration interface for Agentic AI tools and applications.\nThis is used to specify the fields mapping details in the data and other configuration parameters needed for evaluation.\n\nExamples:\n    1. Create configuration with default parameters\n        .. code-block:: python\n\n            configuration = AgenticAIConfiguration()\n\n    2. Create configuration with parameters\n        .. code-block:: python\n\n            configuration = AgenticAIConfiguration(input_fields=[\"input\"], \n                                                   output_fields=[\"output\"])\n\n    2. Create configuration with dict parameters\n        .. code-block:: python\n\n            config = {\"input_fields\": [\"input\"],\n                      \"output_fields\": [\"output\"],\n                      \"context_fields\": [\"contexts\"],\n                      \"reference_fields\": [\"reference\"]}\n            configuration = AgenticAIConfiguration(**config)",
         "properties": {
            "record_id_field": {
               "default": "record_id",
               "description": "The record identifier field name.",
               "examples": [
                  "record_id"
               ],
               "title": "Record id field",
               "type": "string"
            },
            "record_timestamp_field": {
               "default": "record_timestamp",
               "description": "The record timestamp field name.",
               "examples": [
                  "record_timestamp"
               ],
               "title": "Record timestamp field",
               "type": "string"
            },
            "task_type": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/TaskType"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The generative task type. Default value is None.",
               "examples": [
                  "retrieval_augmented_generation"
               ],
               "title": "Task Type"
            },
            "input_fields": {
               "default": [
                  "input_text"
               ],
               "description": "The list of model input fields in the data. Default value is ['input_text'].",
               "examples": [
                  [
                     "question"
                  ]
               ],
               "items": {
                  "type": "string"
               },
               "title": "Input Fields",
               "type": "array"
            },
            "context_fields": {
               "default": [
                  "context"
               ],
               "description": "The list of context fields in the input fields. Default value is ['context'].",
               "examples": [
                  [
                     "context1",
                     "context2"
                  ]
               ],
               "items": {
                  "type": "string"
               },
               "title": "Context Fields",
               "type": "array"
            },
            "output_fields": {
               "default": [
                  "generated_text"
               ],
               "description": "The list of model output fields in the data. Default value is ['generated_text'].",
               "examples": [
                  [
                     "output"
                  ]
               ],
               "items": {
                  "type": "string"
               },
               "title": "Output Fields",
               "type": "array"
            },
            "reference_fields": {
               "default": [
                  "ground_truth"
               ],
               "description": "The list of reference fields in the data. Default value is ['ground_truth'].",
               "examples": [
                  [
                     "reference"
                  ]
               ],
               "items": {
                  "type": "string"
               },
               "title": "Reference Fields",
               "type": "array"
            },
            "locale": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/Locale"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The language locale of the input, output and reference fields in the data.",
               "title": "Locale"
            },
            "tools": {
               "default": [],
               "description": "The list of tools used by the LLM.",
               "examples": [
                  [
                     "function1",
                     "function2"
                  ]
               ],
               "items": {
                  "type": "object"
               },
               "title": "Tools",
               "type": "array"
            },
            "tool_calls_field": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": "tool_calls",
               "description": "The tool calls field in the input fields. Default value is 'tool_calls'.",
               "examples": [
                  "tool_calls"
               ],
               "title": "Tool Calls Field"
            },
            "available_tools_field": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": "available_tools",
               "description": "The tool inventory field in the data. Default value is 'available_tools'.",
               "examples": [
                  "available_tools"
               ],
               "title": "Available Tools Field"
            },
            "llm_judge": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/LLMJudge"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "LLM as Judge Model details.",
               "title": "LLM Judge"
            },
            "prompt_field": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": "model_prompt",
               "description": "The prompt field in the input fields. Default value is 'model_prompt'.",
               "examples": [
                  "model_prompt"
               ],
               "title": "Model Prompt Field"
            },
            "message_id_field": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": "message_id",
               "description": "The message identifier field name. Default value is 'message_id'.",
               "examples": [
                  "message_id"
               ],
               "title": "Message id field"
            },
            "conversation_id_field": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": "conversation_id",
               "description": "The conversation identifier field name. Default value is 'conversation_id'.",
               "examples": [
                  "conversation_id"
               ],
               "title": "Conversation id field"
            }
         },
         "title": "AgenticAIConfiguration",
         "type": "object"
      },
      "AgenticApp": {
         "description": "The configuration class representing an agentic application.\nAn agent is composed of a set of nodes.\nThe metrics to be computed at the agent or message level should be specified in the metrics_configuration and the metrics to be computed for the node level should be specified in the nodes list.\n\nExamples:\n    1. Create AgenticApp with agent level metrics configuration. \n        .. code-block:: python\n\n            # Below example provides the agent configuration to compute the AnswerRelevanceMetric and all the metrics in Content Safety group on agent or message level.\n            agentic_app = AgenticApp(name=\"Agentic App\",\n                                metrics_configuration=MetricsConfiguration(metrics=[AnswerRelevanceMetric()],\n                                                                            metric_groups=[MetricGroup.CONTENT_SAFETY]))\n            agentic_evaluator = AgenticEvaluator(agentic_app=agentic_app)\n            ...\n\n    2. Create AgenticApp with agent and node level metrics configuration and default agentic ai configuration for metrics. \n        .. code-block:: python\n\n            # Below example provides the node configuration to compute the ContextRelevanceMetric and all the metrics in Retrieval Quality group. \n            nodes = [Node(name=\"Retrieval Node\",\n                        metrics_configurations=[MetricsConfiguration(metrics=[ContextRelevanceMetric()],\n                                                                     metric_groups=[MetricGroup.RETRIEVAL_QUALITY])])]\n\n            # Below example provides the agent configuration to compute the AnswerRelevanceMetric and all the metrics in Content Safety group on agent or message level.\n            agentic_app = AgenticApp(name=\"Agentic App\",\n                                metrics_configuration=MetricsConfiguration(metrics=[AnswerRelevanceMetric()],\n                                                                            metric_groups=[MetricGroup.CONTENT_SAFETY]),\n                                nodes=nodes)\n            agentic_evaluator = AgenticEvaluator(agentic_app=agentic_app)\n            ...\n\n    3. Create AgenticApp with agent and nodel level metrics configuration and with agentic ai configuration for metrics. \n        .. 
code-block:: python\n\n            # Below example provides the node configuration to compute the ContextRelevanceMetric and all the metrics in Retrieval Quality group.\n            node_fields_config = {\n                \"input_fields\": [\"input\"],\n                \"context_fields\": [\"web_context\"]\n            }\n            nodes = [Node(name=\"Retrieval Node\",\n                        metrics_configurations=[MetricsConfiguration(configuration=AgenticAIConfiguration(**node_fields_config)\n                                                                     metrics=[ContextRelevanceMetric()],\n                                                                     metric_groups=[MetricGroup.RETRIEVAL_QUALITY])])]\n\n            # Below example provides the agent configuration to compute the AnswerRelevanceMetric and all the metrics in Content Safety group on agent or message level.\n            agent_fields_config = {\n                \"input_fields\": [\"input\"],\n                \"output_fields\": [\"output\"]\n            }\n            agentic_app = AgenticApp(name=\"Agentic App\",\n                                metrics_configuration=MetricsConfiguration(configuration=AgenticAIConfiguration(**agent_fields_config)\n                                                                           metrics=[AnswerRelevanceMetric()],\n                                                                           metric_groups=[MetricGroup.CONTENT_SAFETY]),\n                                nodes=nodes)\n            agentic_evaluator = AgenticEvaluator(agentic_app=agentic_app)\n            ...",
         "properties": {
            "name": {
               "default": "Agentic App",
               "description": "The name of the agentic application.",
               "title": "Agentic application name",
               "type": "string"
            },
            "metrics_configuration": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/MetricsConfiguration"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": {
                  "configuration": {
                     "available_tools_field": "available_tools",
                     "context_fields": [
                        "context"
                     ],
                     "conversation_id_field": "conversation_id",
                     "input_fields": [
                        "input_text"
                     ],
                     "llm_judge": null,
                     "locale": null,
                     "message_id_field": "message_id",
                     "output_fields": [
                        "generated_text"
                     ],
                     "prompt_field": "model_prompt",
                     "record_id_field": "record_id",
                     "record_timestamp_field": "record_timestamp",
                     "reference_fields": [
                        "ground_truth"
                     ],
                     "task_type": null,
                     "tool_calls_field": "tool_calls",
                     "tools": []
                  },
                  "metrics": [],
                  "metric_groups": []
               },
               "description": "The list of metrics to be computed on the agentic application and their configuration details.",
               "title": "Metrics configuration"
            },
            "nodes": {
               "anyOf": [
                  {
                     "items": {
                        "$ref": "#/$defs/Node"
                     },
                     "type": "array"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": [],
               "description": "The nodes details.",
               "title": "Node details"
            }
         },
         "title": "AgenticApp",
         "type": "object"
      },
      "AzureOpenAICredentials": {
         "properties": {
            "url": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "description": "Azure OpenAI url. This attribute can be read from `AZURE_OPENAI_HOST` environment variable.",
               "title": "Url"
            },
            "api_key": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "description": "API key for Azure OpenAI. This attribute can be read from `AZURE_OPENAI_API_KEY` environment variable.",
               "title": "Api Key"
            },
            "api_version": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "description": "The model API version from Azure OpenAI. This attribute can be read from `AZURE_OPENAI_API_VERSION` environment variable.",
               "title": "Api Version"
            }
         },
         "required": [
            "url",
            "api_key",
            "api_version"
         ],
         "title": "AzureOpenAICredentials",
         "type": "object"
      },
      "AzureOpenAIFoundationModel": {
         "description": "The Azure OpenAI foundation model details\n\nExamples:\n    1. Create Azure OpenAI foundation model by passing the credentials during object creation.\n        .. code-block:: python\n\n            azure_openai_foundation_model = AzureOpenAIFoundationModel(\n                model_id=\"gpt-4o-mini\",\n                provider=AzureOpenAIModelProvider(\n                    credentials=AzureOpenAICredentials(\n                        api_key=azure_api_key,\n                        url=azure_host_url,\n                        api_version=azure_api_model_version,\n                    )\n                )\n            )\n\n2. Create Azure OpenAI foundation model by setting the credentials in environment variables:\n    * ``AZURE_OPENAI_API_KEY`` is used to set the api key for OpenAI.\n    * ``AZURE_OPENAI_HOST`` is used to set the url for Azure OpenAI.\n    * ``AZURE_OPENAI_API_VERSION`` is uses to set the the api version for Azure OpenAI.\n\n        .. code-block:: python\n\n            openai_foundation_model = AzureOpenAIFoundationModel(\n                model_id=\"gpt-4o-mini\",\n            )",
         "properties": {
            "model_name": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The name of the foundation model.",
               "title": "Model Name"
            },
            "provider": {
               "$ref": "#/$defs/AzureOpenAIModelProvider",
               "description": "Azure OpenAI provider"
            },
            "model_id": {
               "description": "Model deployment name from Azure OpenAI",
               "title": "Model Id",
               "type": "string"
            }
         },
         "required": [
            "model_id"
         ],
         "title": "AzureOpenAIFoundationModel",
         "type": "object"
      },
      "AzureOpenAIModelProvider": {
         "properties": {
            "type": {
               "$ref": "#/$defs/ModelProviderType",
               "default": "azure_openai",
               "description": "The type of model provider."
            },
            "credentials": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/AzureOpenAICredentials"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Azure OpenAI credentials."
            }
         },
         "title": "AzureOpenAIModelProvider",
         "type": "object"
      },
      "CustomFoundationModel": {
         "description": "Defines the CustomFoundationModel class.\n\nThis class extends the base `FoundationModel` to support custom inference logic through a user-defined scoring function.\nIt is intended for use cases where the model is externally hosted and not in the list of supported frameworks.\nExamples:\n    1. Define a custom scoring function and create a model:\n        .. code-block:: python\n\n            import pandas as pd\n\n            def scoring_fn(data: pd.DataFrame):\n                predictions_list = []\n                # Custom logic to call an external LLM\n                return pd.DataFrame({\"generated_text\": predictions_list})                    \n\n            model = CustomFoundationModel(\n                scoring_fn=scoring_fn\n            )",
         "properties": {
            "model_name": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The name of the foundation model.",
               "title": "Model Name"
            },
            "provider": {
               "$ref": "#/$defs/ModelProvider",
               "description": "The provider of the model."
            }
         },
         "title": "CustomFoundationModel",
         "type": "object"
      },
      "FoundationModelInfo": {
         "description": "Represents a foundation model used in an experiment.",
         "properties": {
            "model_name": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The name of the foundation model.",
               "title": "Model Name"
            },
            "model_id": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The id of the foundation model.",
               "title": "Model Id"
            },
            "provider": {
               "description": "The provider of the foundation model.",
               "title": "Provider",
               "type": "string"
            },
            "type": {
               "description": "The type of foundation model.",
               "example": [
                  "chat",
                  "embedding",
                  "text-generation"
               ],
               "title": "Type",
               "type": "string"
            }
         },
         "required": [
            "provider",
            "type"
         ],
         "title": "FoundationModelInfo",
         "type": "object"
      },
      "GenAIMetric": {
         "description": "Defines the Generative AI metric interface",
         "properties": {
            "name": {
               "description": "The name of the metric.",
               "examples": [
                  "answer_relevance",
                  "context_relevance"
               ],
               "title": "Metric Name",
               "type": "string"
            },
            "display_name": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The display name of the metric.",
               "examples": [
                  "Answer Relevance",
                  "Context Relevance"
               ],
               "title": "Metric display name"
            },
            "type_": {
               "default": "ootb",
               "description": "The type of the metric. Indicates whether the metric is ootb or custom.",
               "examples": [
                  "ootb",
                  "custom"
               ],
               "title": "Metric type",
               "type": "string"
            },
            "value_type": {
               "default": "numeric",
               "description": "The type of the metric value. Indicates whether the metric value is numeric or categorical.",
               "examples": [
                  "numeric",
                  "categorical"
               ],
               "title": "Metric value type",
               "type": "string"
            },
            "thresholds": {
               "default": [],
               "description": "The list of thresholds",
               "items": {
                  "$ref": "#/$defs/MetricThreshold"
               },
               "title": "Thresholds",
               "type": "array"
            },
            "tasks": {
               "default": [],
               "description": "The task types this metric is associated with.",
               "items": {
                  "$ref": "#/$defs/TaskType"
               },
               "title": "Tasks",
               "type": "array"
            },
            "group": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/MetricGroup"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The metric group this metric belongs to."
            },
            "is_reference_free": {
               "default": true,
               "description": "Decides whether this metric needs a reference for computation",
               "title": "Is Reference Free",
               "type": "boolean"
            },
            "method": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The method used to compute the metric.",
               "title": "Method"
            },
            "metric_dependencies": {
               "default": [],
               "description": "Metrics that needs to be evaluated first",
               "items": {
                  "$ref": "#/$defs/GenAIMetric"
               },
               "title": "Metric Dependencies",
               "type": "array"
            },
            "applies_to": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": "message",
               "description": "The tag to indicate for which the metric is applied to. Used for agentic application metric computation.",
               "examples": [
                  "message",
                  "conversation",
                  "sub_agent"
               ],
               "title": "Applies to"
            },
            "mapping": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/Mapping"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The data mapping details for the metric which are used to read the values needed to compute the metric.",
               "examples": {
                  "items": [
                     {
                        "attribute_name": "traceloop.entity.input",
                        "column_name": null,
                        "json_path": "$.inputs.input_text",
                        "lookup_child_spans": false,
                        "name": "input_text",
                        "span_name": "LangGraph.workflow",
                        "type": "input"
                     },
                     {
                        "attribute_name": "traceloop.entity.output",
                        "column_name": null,
                        "json_path": "$.outputs.generated_text",
                        "lookup_child_spans": false,
                        "name": "generated_text",
                        "span_name": "LangGraph.workflow",
                        "type": "output"
                     }
                  ],
                  "source": "trace"
               },
               "title": "Mapping"
            }
         },
         "required": [
            "name"
         ],
         "title": "GenAIMetric",
         "type": "object"
      },
      "GoogleAIStudioCredentials": {
         "description": "Defines the GoogleAIStudioCredentials class for accessing Google AI Studio using an API key.\n\nExamples:\n    1. Create credentials manually:\n        .. code-block:: python\n\n            google_credentials = GoogleAIStudioCredentials(api_key=\"your-api-key\")\n\n    2. Create credentials from environment:\n        .. code-block:: python\n\n            os.environ[\"GOOGLE_API_KEY\"] = \"your-api-key\"\n            google_credentials = GoogleAIStudioCredentials.create_from_env()",
         "properties": {
            "api_key": {
               "description": "The Google AI Studio key. This attribute can be read from GOOGLE_API_KEY environment variable when creating GoogleAIStudioCredentials from environment.",
               "title": "Api Key",
               "type": "string"
            }
         },
         "required": [
            "api_key"
         ],
         "title": "GoogleAIStudioCredentials",
         "type": "object"
      },
      "GoogleAIStudioFoundationModel": {
         "description": "Represents a foundation model served via Google AI Studio.\n\nExamples:\n    1. Create Google AI Studio foundation model by passing the credentials during object creation.\n        .. code-block:: python\n\n            model = GoogleAIStudioFoundationModel(\n                model_id=\"gemini-1.5-pro-002\",\n                provider=GoogleAIStudioModelProvider(\n                    credentials=GoogleAIStudioCredentials(api_key=\"your_api_key\")\n                )\n            )\n    2. Create Google AI Studio foundation model by setting the credentials in environment variables:\n        * ``GOOGLE_API_KEY`` OR ``GEMINI_API_KEY`` is used to set the Credentials path for Vertex AI.\n            .. code-block:: python\n\n                model = GoogleAIStudioFoundationModel(\n                    model_id=\"gemini/gpt-4o-mini\",\n                )",
         "properties": {
            "model_name": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The name of the foundation model.",
               "title": "Model Name"
            },
            "provider": {
               "$ref": "#/$defs/GoogleAIStudioModelProvider",
               "description": "Google AI Studio provider.",
               "title": "Provider"
            },
            "model_id": {
               "description": "Model name for Google AI Studio. Must be a valid Google AI model identifier or a fully-qualified publisher path",
               "examples": [
                  "gemini-1.5-pro-002"
               ],
               "title": "Model id",
               "type": "string"
            }
         },
         "required": [
            "model_id"
         ],
         "title": "GoogleAIStudioFoundationModel",
         "type": "object"
      },
      "GoogleAIStudioModelProvider": {
         "description": "Represents a model provider using Google AI Studio.\n\nExamples:\n    1. Create provider using credentials object:\n        .. code-block:: python\n\n            provider = GoogleAIStudioModelProvider(\n                credentials=GoogleAIStudioCredentials(api_key=\"api-key\")\n            )\n\n    2. Create provider using environment variables:\n        .. code-block:: python\n\n            os.environ['GOOGLE_API_KEY'] = \"your_api_key\"\n\n            provider = GoogleAIStudioModelProvider()",
         "properties": {
            "type": {
               "$ref": "#/$defs/ModelProviderType",
               "default": "google_ai_studio",
               "description": "The type of model provider."
            },
            "credentials": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/GoogleAIStudioCredentials"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Google AI Studio credentials."
            }
         },
         "title": "GoogleAIStudioModelProvider",
         "type": "object"
      },
      "LLMJudge": {
         "description": "Defines the LLMJudge.\n\nThe LLMJudge class contains the details of the llm judge model to be used for computing the metric.\n\nExamples:\n    1. Create LLMJudge using watsonx.ai foundation model:\n        .. code-block:: python\n\n            wx_ai_foundation_model = WxAIFoundationModel(\n                model_id=\"ibm/granite-3-3-8b-instruct\",\n                project_id=PROJECT_ID,\n                provider=WxAIModelProvider(\n                    credentials=WxAICredentials(api_key=wx_apikey)\n                )\n            )\n            llm_judge = LLMJudge(model=wx_ai_foundation_model)",
         "properties": {
            "model": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/WxAIFoundationModel"
                  },
                  {
                     "$ref": "#/$defs/OpenAIFoundationModel"
                  },
                  {
                     "$ref": "#/$defs/AzureOpenAIFoundationModel"
                  },
                  {
                     "$ref": "#/$defs/PortKeyGateway"
                  },
                  {
                     "$ref": "#/$defs/RITSFoundationModel"
                  },
                  {
                     "$ref": "#/$defs/VertexAIFoundationModel"
                  },
                  {
                     "$ref": "#/$defs/GoogleAIStudioFoundationModel"
                  },
                  {
                     "$ref": "#/$defs/AWSBedrockFoundationModel"
                  },
                  {
                     "$ref": "#/$defs/CustomFoundationModel"
                  }
               ],
               "description": "The foundation model to be used as judge",
               "title": "Model"
            }
         },
         "required": [
            "model"
         ],
         "title": "LLMJudge",
         "type": "object"
      },
      "Locale": {
         "properties": {
            "input": {
               "anyOf": [
                  {
                     "items": {
                        "type": "string"
                     },
                     "type": "array"
                  },
                  {
                     "additionalProperties": {
                        "type": "string"
                     },
                     "type": "object"
                  },
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Input"
            },
            "output": {
               "anyOf": [
                  {
                     "items": {
                        "type": "string"
                     },
                     "type": "array"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Output"
            },
            "reference": {
               "anyOf": [
                  {
                     "items": {
                        "type": "string"
                     },
                     "type": "array"
                  },
                  {
                     "additionalProperties": {
                        "type": "string"
                     },
                     "type": "object"
                  },
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Reference"
            }
         },
         "title": "Locale",
         "type": "object"
      },
      "Mapping": {
         "description": "Defines the field mapping details to be used for computing a metric.",
         "properties": {
            "source": {
               "default": "trace",
               "description": "The source type of the data. Use trace if the data should be read from span in trace. Use tabular if the data is passed as a dataframe.",
               "enum": [
                  "trace",
                  "tabular"
               ],
               "examples": [
                  "trace",
                  "tabular"
               ],
               "title": "Source",
               "type": "string"
            },
            "items": {
               "description": "The list of mapping items for the field. They are used to read the data from trace or tabular data for computing the metric.",
               "items": {
                  "$ref": "#/$defs/MappingItem"
               },
               "title": "Mapping Items",
               "type": "array"
            }
         },
         "required": [
            "items"
         ],
         "title": "Mapping",
         "type": "object"
      },
      "MappingItem": {
         "description": "The mapping details to be used for reading the values from the data.",
         "properties": {
            "name": {
               "description": "The name of the item.",
               "examples": [
                  "input_text",
                  "generated_text",
                  "context",
                  "ground_truth"
               ],
               "title": "Name",
               "type": "string"
            },
            "type": {
               "description": "The type of the item.",
               "enum": [
                  "input",
                  "output",
                  "reference",
                  "context",
                  "tool_call"
               ],
               "examples": [
                  "input"
               ],
               "title": "Type",
               "type": "string"
            },
            "column_name": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The column name in the tabular data to be used for reading the field value. Applicable for tabular source.",
               "title": "Column Name"
            },
            "span_name": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The span name in the trace data to be used for reading the field value. Applicable for trace source.",
               "title": "Span Name"
            },
            "attribute_name": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The attribute name in the trace to be used for reading the field value. Applicable for trace source.",
               "title": "Attribute Name"
            },
            "json_path": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The json path to be used for reading the field value from the attribute value. Applicable for trace source. If not provided, the span attribute value is read as the field value.",
               "title": "Json Path"
            },
            "lookup_child_spans": {
               "anyOf": [
                  {
                     "type": "boolean"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": false,
               "description": "The flag to indicate if all the child spans should be searched for the attribute value. Applicable for trace source.",
               "title": "Look up child spans"
            }
         },
         "required": [
            "name",
            "type"
         ],
         "title": "MappingItem",
         "type": "object"
      },
      "MetricGroup": {
         "enum": [
            "retrieval_quality",
            "answer_quality",
            "content_safety",
            "performance",
            "usage",
            "message_completion",
            "tool_call_quality",
            "readability",
            "custom"
         ],
         "title": "MetricGroup",
         "type": "string"
      },
      "MetricThreshold": {
         "description": "The class that defines the threshold for a metric.",
         "properties": {
            "type": {
               "description": "Threshold type. One of 'lower_limit', 'upper_limit'",
               "enum": [
                  "lower_limit",
                  "upper_limit"
               ],
               "title": "Type",
               "type": "string"
            },
            "value": {
               "default": 0,
               "description": "The value of metric threshold",
               "title": "Threshold value",
               "type": "number"
            }
         },
         "required": [
            "type"
         ],
         "title": "MetricThreshold",
         "type": "object"
      },
      "MetricsConfiguration": {
         "description": "The class representing the metrics to be computed and the configuration details required for them.\n\nExamples:\n    1. Create MetricsConfiguration with default agentic ai configuration\n        .. code-block:: python\n\n            metrics_configuration = MetricsConfiguration(metrics=[ContextRelevanceMetric()],\n                                                         metric_groups=[MetricGroup.RETRIEVAL_QUALITY])])\n\n    2. Create MetricsConfiguration by specifying agentic ai configuration\n        .. code-block:: python\n\n            config = {\n                \"input_fields\": [\"input\"],\n                \"context_fields\": [\"contexts\"]\n            }\n            metrics_configuration = MetricsConfiguration(configuration=AgenticAIConfiguration(**config)\n                                                           metrics=[ContextRelevanceMetric()],\n                                                           metric_groups=[MetricGroup.RETRIEVAL_QUALITY])])",
         "properties": {
            "configuration": {
               "$ref": "#/$defs/AgenticAIConfiguration",
               "default": {
                  "record_id_field": "record_id",
                  "record_timestamp_field": "record_timestamp",
                  "task_type": null,
                  "input_fields": [
                     "input_text"
                  ],
                  "context_fields": [
                     "context"
                  ],
                  "output_fields": [
                     "generated_text"
                  ],
                  "reference_fields": [
                     "ground_truth"
                  ],
                  "locale": null,
                  "tools": [],
                  "tool_calls_field": "tool_calls",
                  "available_tools_field": "available_tools",
                  "llm_judge": null,
                  "prompt_field": "model_prompt",
                  "message_id_field": "message_id",
                  "conversation_id_field": "conversation_id"
               },
               "description": "The configuration of the metrics to compute. The configuration contains the fields names to be read when computing the metrics.",
               "title": "Metrics configuration"
            },
            "metrics": {
               "anyOf": [
                  {
                     "items": {
                        "$ref": "#/$defs/GenAIMetric"
                     },
                     "type": "array"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": [],
               "description": "The list of metrics to compute.",
               "title": "Metrics"
            },
            "metric_groups": {
               "anyOf": [
                  {
                     "items": {
                        "$ref": "#/$defs/MetricGroup"
                     },
                     "type": "array"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": [],
               "description": "The list of metric groups to compute.",
               "title": "Metric Groups"
            }
         },
         "title": "MetricsConfiguration",
         "type": "object"
      },
      "ModelProvider": {
         "properties": {
            "type": {
               "$ref": "#/$defs/ModelProviderType",
               "description": "The type of model provider."
            }
         },
         "required": [
            "type"
         ],
         "title": "ModelProvider",
         "type": "object"
      },
      "ModelProviderType": {
         "description": "Supported model provider types for Generative AI",
         "enum": [
            "ibm_watsonx.ai",
            "azure_openai",
            "rits",
            "openai",
            "vertex_ai",
            "google_ai_studio",
            "aws_bedrock",
            "custom",
            "portkey"
         ],
         "title": "ModelProviderType",
         "type": "string"
      },
      "Node": {
         "description": "The class representing a node in an agentic application.\n\nExamples:\n    1. Create Node with metrics configuration and default agentic ai configuration\n        .. code-block:: python\n\n            metrics_configurations = [MetricsConfiguration(metrics=[ContextRelevanceMetric()],\n                                                           metric_groups=[MetricGroup.RETRIEVAL_QUALITY])])]\n            node = Node(name=\"Retrieval Node\",\n                        metrics_configurations=metrics_configurations)\n\n    2. Create Node with metrics configuration and specifying agentic ai configuration\n        .. code-block:: python\n\n            node_config = {\"input_fields\": [\"input\"],\n                           \"output_fields\": [\"output\"],\n                           \"context_fields\": [\"contexts\"],\n                           \"reference_fields\": [\"reference\"]}\n            metrics_configurations = [MetricsConfiguration(configuration=AgenticAIConfiguration(**node_config)\n                                                           metrics=[ContextRelevanceMetric()],\n                                                           metric_groups=[MetricGroup.RETRIEVAL_QUALITY])])]\n            node = Node(name=\"Retrieval Node\",\n                        metrics_configurations=metrics_configurations)",
         "properties": {
            "name": {
               "description": "The name of the node.",
               "title": "Name",
               "type": "string"
            },
            "func_name": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The name of the node function.",
               "title": "Node function name"
            },
            "metrics_configurations": {
               "default": [],
               "description": "The list of metrics and their configuration details.",
               "items": {
                  "$ref": "#/$defs/MetricsConfiguration"
               },
               "title": "Metrics configuration",
               "type": "array"
            },
            "foundation_models": {
               "default": [],
               "description": "The Foundation models invoked by the node",
               "items": {
                  "$ref": "#/$defs/FoundationModelInfo"
               },
               "title": "Foundation Models",
               "type": "array"
            }
         },
         "required": [
            "name"
         ],
         "title": "Node",
         "type": "object"
      },
      "OTLPCollectorConfiguration": {
         "description": "Defines the OTLPCollectorConfiguration class.\nIt contains the configuration settings for the OpenTelemetry Protocol collector.\n\nExamples:\n    1. Create OTLPCollectorConfiguration with default parameters\n        .. code-block:: python\n\n            oltp_config = OTLPCollectorConfiguration()\n\n    1. Create OTLPCollectorConfiguration by providing server endpoint details.\n        .. code-block:: python\n\n            oltp_config = OTLPCollectorConfiguration(app_name=\"app\",\n                                                     endpoint=\"https://hostname/ml/v1/traces\",\n                                                     timeout=10,\n                                                     headers={\"Authorization\": \"Bearer token\"})",
         "properties": {
            "app_name": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "description": "Application name for tracing.",
               "title": "App Name"
            },
            "endpoint": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": "http://localhost:4318/v1/traces",
               "description": "The OTLP collector endpoint URL for sending trace data. Default value is 'http://localhost:4318/v1/traces'",
               "title": "OTLP Endpoint"
            },
            "insecure": {
               "anyOf": [
                  {
                     "type": "boolean"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": false,
               "description": "Whether to disable TLS for the exporter (i.e., use an insecure connection). Default is False.",
               "title": "Insecure Connection"
            },
            "is_grpc": {
               "anyOf": [
                  {
                     "type": "boolean"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": false,
               "description": "If True, use gRPC for exporting traces instead of HTTP. Default is False.",
               "title": "Use gRPC"
            },
            "timeout": {
               "anyOf": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": 100,
               "description": "Timeout in milliseconds for sending telemetry data to the collector. Default is 100ms.",
               "title": "Timeout"
            },
            "headers": {
               "anyOf": [
                  {
                     "additionalProperties": {
                        "type": "string"
                     },
                     "type": "object"
                  },
                  {
                     "type": "null"
                  }
               ],
               "description": "Headers needed to call the server.",
               "title": "Headers"
            }
         },
         "required": [
            "app_name"
         ],
         "title": "OTLPCollectorConfiguration",
         "type": "object"
      },
      "OpenAICredentials": {
         "description": "Defines the OpenAICredentials class to specify the OpenAI server details.\n\nExamples:\n    1. Create OpenAICredentials with default parameters. By default Dallas region is used.\n        .. code-block:: python\n\n            openai_credentials = OpenAICredentials(api_key=api_key,\n                                                   url=openai_url)\n\n    2. Create OpenAICredentials by reading from environment variables.\n        .. code-block:: python\n\n            os.environ[\"OPENAI_API_KEY\"] = \"...\"\n            os.environ[\"OPENAI_URL\"] = \"...\"\n            openai_credentials = OpenAICredentials.create_from_env()",
         "properties": {
            "url": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "title": "Url"
            },
            "api_key": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "title": "Api Key"
            }
         },
         "required": [
            "url",
            "api_key"
         ],
         "title": "OpenAICredentials",
         "type": "object"
      },
      "OpenAIFoundationModel": {
         "description": "The OpenAI foundation model details\n\nExamples:\n    1. Create OpenAI foundation model by passing the credentials during object creation. Note that the url is optional and will be set to the default value for OpenAI. To change the default value, the url should be passed to ``OpenAICredentials`` object.\n        .. code-block:: python\n\n            openai_foundation_model = OpenAIFoundationModel(\n                model_id=\"gpt-4o-mini\",\n                provider=OpenAIModelProvider(\n                    credentials=OpenAICredentials(\n                        api_key=api_key,\n                        url=openai_url,\n                    )\n                )\n            )\n\n    2. Create OpenAI foundation model by setting the credentials in environment variables:\n        * ``OPENAI_API_KEY`` is used to set the api key for OpenAI.\n        * ``OPENAI_URL`` is used to set the url for OpenAI\n\n        .. code-block:: python\n\n            openai_foundation_model = OpenAIFoundationModel(\n                model_id=\"gpt-4o-mini\",\n            )",
         "properties": {
            "model_name": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The name of the foundation model.",
               "title": "Model Name"
            },
            "provider": {
               "$ref": "#/$defs/OpenAIModelProvider",
               "description": "OpenAI provider"
            },
            "model_id": {
               "description": "Model name from OpenAI",
               "title": "Model Id",
               "type": "string"
            }
         },
         "required": [
            "model_id"
         ],
         "title": "OpenAIFoundationModel",
         "type": "object"
      },
      "OpenAIModelProvider": {
         "properties": {
            "type": {
               "$ref": "#/$defs/ModelProviderType",
               "default": "openai",
               "description": "The type of model provider."
            },
            "credentials": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/OpenAICredentials"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "OpenAI credentials. This can also be set by using `OPENAI_API_KEY` environment variable."
            }
         },
         "title": "OpenAIModelProvider",
         "type": "object"
      },
      "PortKeyCredentials": {
         "description": "Defines the PortKeyCredentials class to specify the PortKey Gateway details.\n\nExamples:\n    1. Create PortKeyCredentials with default parameters.\n        .. code-block:: python\n\n            portkey_credentials = PortKeyCredentials(api_key=api_key,\n                                                    url=portkey_url,\n                                                    provider_api_key=provider_api_key,\n                                                    provider=provider_name)\n\n    2. Create PortKeyCredentials by reading from environment variables.\n        .. code-block:: python\n\n            os.environ[\"PORTKEY_API_KEY\"] = \"...\"\n            os.environ[\"PORTKEY_URL\"] = \"...\"\n            os.environ[\"PORTKEY_PROVIDER_API_KEY\"] = \"...\"\n            os.environ[\"PORTKEY_PROVIDER_NAME\"] = \"...\"\n            portkey_credentials = PortKeyCredentials.create_from_env()",
         "properties": {
            "url": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "description": "PortKey url. This attribute can be read from `PORTKEY_URL` environment variable.",
               "title": "Url"
            },
            "api_key": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "description": "API key for PortKey. This attribute can be read from `PORTKEY_API_KEY` environment variable.",
               "title": "Api Key"
            },
            "provider_api_key": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "description": "API key for the provider. This attribute can be read from `PORTKEY_PROVIDER_API_KEY` environment variable.",
               "title": "Provider Api Key"
            },
            "provider": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "description": "The provider name. This attribute can be read from `PORTKEY_PROVIDER_NAME` environment variable.",
               "title": "Provider"
            }
         },
         "required": [
            "url",
            "api_key",
            "provider_api_key",
            "provider"
         ],
         "title": "PortKeyCredentials",
         "type": "object"
      },
      "PortKeyGateway": {
         "description": "The PortKey gateway details\n\nExamples:\n    1. Create PortKeyGateway by passing the credentials during object creation. Note that the url is optional and will be set to the default value for PortKey. To change the default value, the url should be passed to ``PortKeyCredentials`` object.\n        .. code-block:: python\n\n            port_key_gateway = PortKeyGateway(\n                model_id=\"gpt-4o-mini\",\n                provider=PortKeyModelProvider(\n                    credentials=PortKeyCredentials(\n                        api_key=api_key,\n                        url=openai_url,\n                        provider_api_key=provider_api_key,\n                        provider_name=provider_name\n                    )\n                )\n            )\n\n    2. Create PortKeyGateway by setting the credentials in environment variables:\n        * ``PORTKEY_API_KEY`` is used to set the api key for PortKey.\n        * ``PORTKEY_URL`` is used to set the url for PortKey.\n        * ``PORTKEY_PROVIDER_API_KEY`` is used to set the provider api key for PortKey.\n        * ``PORTKEY_PROVIDER_NAME`` is used to set the provider name for PortKey\n\n        .. code-block:: python\n\n            port_key_gateway = PortKeyGateway(\n                model_id=\"gpt-4o-mini\",\n            )",
         "properties": {
            "model_name": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The name of the foundation model.",
               "title": "Model Name"
            },
            "provider": {
               "$ref": "#/$defs/PortKeyModelProvider",
               "description": "PortKey Provider"
            },
            "model_id": {
               "description": "Model name from the Provider",
               "title": "Model Id",
               "type": "string"
            }
         },
         "required": [
            "model_id"
         ],
         "title": "PortKeyGateway",
         "type": "object"
      },
      "PortKeyModelProvider": {
         "properties": {
            "type": {
               "$ref": "#/$defs/ModelProviderType",
               "default": "portkey",
               "description": "The type of model provider."
            },
            "credentials": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/PortKeyCredentials"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "PortKey credentials."
            }
         },
         "title": "PortKeyModelProvider",
         "type": "object"
      },
      "RITSCredentials": {
         "properties": {
            "hostname": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": "https://inference-3scale-apicast-production.apps.rits.fmaas.res.ibm.com",
               "description": "The rits hostname",
               "title": "Hostname"
            },
            "api_key": {
               "title": "Api Key",
               "type": "string"
            }
         },
         "required": [
            "api_key"
         ],
         "title": "RITSCredentials",
         "type": "object"
      },
      "RITSFoundationModel": {
         "properties": {
            "model_name": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The name of the foundation model.",
               "title": "Model Name"
            },
            "provider": {
               "$ref": "#/$defs/RITSModelProvider",
               "description": "The provider of the model."
            }
         },
         "title": "RITSFoundationModel",
         "type": "object"
      },
      "RITSModelProvider": {
         "properties": {
            "type": {
               "$ref": "#/$defs/ModelProviderType",
               "default": "rits",
               "description": "The type of model provider."
            },
            "credentials": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/RITSCredentials"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "RITS credentials."
            }
         },
         "title": "RITSModelProvider",
         "type": "object"
      },
      "TaskType": {
         "description": "Supported task types for generative AI models",
         "enum": [
            "question_answering",
            "classification",
            "summarization",
            "generation",
            "extraction",
            "retrieval_augmented_generation"
         ],
         "title": "TaskType",
         "type": "string"
      },
      "TracingConfiguration": {
         "description": "Defines the tracing configuration class. \nTracing configuration is required if the the evaluations are needed to be tracked in an experiment or if the agentic application traces should be sent to a Open Telemetry Collector.\nOne of project_id or space_id is required.\nIf the otlp_collector_config is provided, the traces are logged to Open Telemetry Collector, otherwise the traces are logged to file on disk.\nIf its required to log the traces to both collector and local file, provide the otlp_collector_config and set the flag log_traces_to_file to True.\n\nExamples:\n    1. Create Tracing configuration to track the results in an experiment\n        .. code-block:: python\n\n            tracing_config = TracingConfiguration(project_id=\"...\")\n            agentic_evaluator = AgenticEvaluator(tracing_configuration=tracing_config)\n            agentic_evaluator.track_experiment(name=\"my_experiment\")\n            ...\n\n    2. Create Tracing configuration to send traces to collector\n        .. code-block:: python\n\n            oltp_collector_config = OTLPCollectorConfiguration(endpoint=\"http://hostname:4318/v1/traces\")\n            tracing_config = TracingConfiguration(space_id=\"...\",\n                                                  resource_attributes={\n                                                        \"wx-deployment-id\": deployment_id,\n                                                        \"wx-instance-id\": \"wml-instance-id1\",\n                                                        \"wx-ai-service-id\": \"ai-service-id1\"},\n                                                   otlp_collector_config=oltp_collector_config)\n            agentic_evaluator = AgenticEvaluator(tracing_configuration=tracing_config)\n            ...",
         "properties": {
            "project_id": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The project id.",
               "title": "Project ID"
            },
            "space_id": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The space id.",
               "title": "Space ID"
            },
            "resource_attributes": {
               "anyOf": [
                  {
                     "additionalProperties": {
                        "type": "string"
                     },
                     "type": "object"
                  },
                  {
                     "type": "null"
                  }
               ],
               "description": "The resource attributes set in all the spans.",
               "title": "Resource Attributes"
            },
            "otlp_collector_config": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/OTLPCollectorConfiguration"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "OTLP Collector configuration.",
               "title": "OTLP Collector Config"
            },
            "log_traces_to_file": {
               "anyOf": [
                  {
                     "type": "boolean"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": false,
               "description": "The flag to enable logging of traces to a file. If set to True, the traces are logged to a file. Use the flag when its needed to log the traces to file and to be sent to the server simultaneously.",
               "title": "Log Traces to file"
            }
         },
         "title": "TracingConfiguration",
         "type": "object"
      },
      "VertexAICredentials": {
         "description": "Defines the VertexAICredentials class for accessing Vertex AI using service account credentials.\n\nExamples:\n    1. Create credentials manually:\n        .. code-block:: python\n\n            vertex_credentials = VertexAICredentials(\n                credentials_path=\"path/to/service_account.json\",\n                project_id=\"my-gcp-project\",\n                location=\"us-central1\"\n            )\n\n    2. Create credentials from environment:\n        .. code-block:: python\n\n            os.environ[\"GOOGLE_APPLICATION_CREDENTIALS\"] = \"path/to/service_account.json\"\n            os.environ[\"GOOGLE_CLOUD_PROJECT\"] = \"my-gcp-project\"\n            os.environ[\"GOOGLE_CLOUD_LOCATION\"] = \"us-central1\"\n\n            vertex_ai_credentials = VertexAICredentials.create_from_env()",
         "properties": {
            "credentials_path": {
               "description": "Path to service-account JSON. This attribute can be read from GOOGLE_APPLICATION_CREDENTIALS environment variable when creating VertexAICredentials from environment.",
               "title": "Credentials Path",
               "type": "string"
            },
            "project_id": {
               "description": "The Google Cloud project id. This attribute can be read from GOOGLE_CLOUD_PROJECT or GCLOUD_PROJECT environment variable when creating VertexAICredentials from environment.",
               "title": "Project ID",
               "type": "string"
            },
            "location": {
               "default": "us-central1",
               "description": "Vertex AI region. This attribute can be read from GOOGLE_CLOUD_LOCATION environment variable when creating VertexAICredentials from environment. By default us-central1 location is used.",
               "examples": [
                  "us-central1",
                  "europe-west4"
               ],
               "title": "Location",
               "type": "string"
            }
         },
         "required": [
            "credentials_path",
            "project_id"
         ],
         "title": "VertexAICredentials",
         "type": "object"
      },
      "VertexAIFoundationModel": {
         "description": "Represents a foundation model served via Vertex AI.\n\nExamples:\n    1. Create Vertex AI foundation model by passing the credentials during object creation.\n        .. code-block:: python\n\n            model = VertexAIFoundationModel(\n                model_id=\"gemini-1.5-pro-002\",\n                provider=VertexAIModelProvider(\n                    credentials=VertexAICredentials(\n                        project_id=\"your-project\",\n                        location=\"us-central1\", # This is optional field, by default us-central1 location is selected\n                        credentials_path=\"/path/to/service_account.json\"\n                    )\n                )\n            )\n    2. Create Vertex AI foundation model by setting the credentials in environment variables:\n        * ``GOOGLE_APPLICATION_CREDENTIALS`` is used to set the Credentials path for Vertex AI.\n        * ``GOOGLE_CLOUD_PROJECT`` is used to set the Project id for Vertex AI.\n        * ``GOOGLE_CLOUD_LOCATION`` is uses to set the Location for Vertex AI. By default us-central1 location is used when GOOGLE_CLOUD_LOCATION is not provided .\n\n            .. code-block:: python\n\n                os.environ[\"GOOGLE_APPLICATION_CREDENTIALS\"] = \"path/to/service_account.json\"\n                os.environ[\"GOOGLE_CLOUD_PROJECT\"] = \"my-gcp-project\"\n                os.environ[\"GOOGLE_CLOUD_LOCATION\"] = \"us-central1\"\n\n                model = VertexAIFoundationModel(\n                    model_id=\"gemini/gpt-4o-mini\",\n                )",
         "properties": {
            "model_name": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The name of the foundation model.",
               "title": "Model Name"
            },
            "provider": {
               "$ref": "#/$defs/VertexAIModelProvider",
               "description": "Vertex AI provider.",
               "title": "Provider"
            },
            "model_id": {
               "description": "Model name for Vertex AI. Must be a valid Vertex AI model identifier or a fully-qualified publisher path",
               "examples": [
                  "gemini-1.5-pro-002"
               ],
               "title": "Model id",
               "type": "string"
            }
         },
         "required": [
            "model_id"
         ],
         "title": "VertexAIFoundationModel",
         "type": "object"
      },
      "VertexAIModelProvider": {
         "description": "Represents a model provider using Vertex AI.\n\nExamples:\n    1. Create provider using credentials object:\n        .. code-block:: python\n\n            provider = VertexAIModelProvider(\n                credentials=VertexAICredentials(\n                    credentials_path=\"path/to/key.json\",\n                    project_id=\"your-project\",\n                    location=\"us-central1\" \n                )\n            )\n\n    2. Create provider using environment variables:\n        .. code-block:: python\n\n            os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = \"/path/to/service_account.json\"\n            os.environ['GOOGLE_CLOUD_PROJECT'] = \"your-project\"\n            os.environ['GOOGLE_CLOUD_LOCATION'] = \"us-central1\" # This is optional field, by default us-central1 location is selected\n\n            provider = VertexAIModelProvider()",
         "properties": {
            "type": {
               "$ref": "#/$defs/ModelProviderType",
               "default": "vertex_ai",
               "description": "The type of model provider."
            },
            "credentials": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/VertexAICredentials"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Vertex AI credentials."
            }
         },
         "title": "VertexAIModelProvider",
         "type": "object"
      },
      "WxAICredentials": {
         "description": "Defines the WxAICredentials class to specify the watsonx.ai server details.\n\nExamples:\n    1. Create WxAICredentials with default parameters. By default Dallas region is used.\n        .. code-block:: python\n\n            wxai_credentials = WxAICredentials(api_key=\"...\")\n\n    2. Create WxAICredentials by specifying region url.\n        .. code-block:: python\n\n            wxai_credentials = WxAICredentials(api_key=\"...\",\n                                               url=\"https://au-syd.ml.cloud.ibm.com\")\n\n    3. Create WxAICredentials by reading from environment variables.\n        .. code-block:: python\n\n            os.environ[\"WATSONX_APIKEY\"] = \"...\"\n            # [Optional] Specify watsonx region specific url. Default is https://us-south.ml.cloud.ibm.com .\n            os.environ[\"WATSONX_URL\"] = \"https://eu-gb.ml.cloud.ibm.com\"\n            wxai_credentials = WxAICredentials.create_from_env()\n\n    4. Create WxAICredentials for on-prem.\n        .. code-block:: python\n\n            wxai_credentials = WxAICredentials(url=\"https://<hostname>\",\n                                               username=\"...\"\n                                               api_key=\"...\",\n                                               version=\"5.2\")\n\n    5. Create WxAICredentials by reading from environment variables for on-prem.\n        .. code-block:: python\n\n            os.environ[\"WATSONX_URL\"] = \"https://<hostname>\"\n            os.environ[\"WATSONX_VERSION\"] = \"5.2\"\n            os.environ[\"WATSONX_USERNAME\"] = \"...\"\n            os.environ[\"WATSONX_APIKEY\"] = \"...\"\n            # Only one of api_key or password is needed\n            #os.environ[\"WATSONX_PASSWORD\"] = \"...\"\n            wxai_credentials = WxAICredentials.create_from_env()",
         "properties": {
            "url": {
               "default": "https://us-south.ml.cloud.ibm.com",
               "description": "The url for watsonx ai service",
               "examples": [
                  "https://us-south.ml.cloud.ibm.com",
                  "https://eu-de.ml.cloud.ibm.com",
                  "https://eu-gb.ml.cloud.ibm.com",
                  "https://jp-tok.ml.cloud.ibm.com",
                  "https://au-syd.ml.cloud.ibm.com"
               ],
               "title": "watsonx.ai url",
               "type": "string"
            },
            "api_key": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The user api key. Required for using watsonx as a service and one of api_key or password is required for using watsonx on-prem software.",
               "strip_whitespace": true,
               "title": "Api Key"
            },
            "version": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The watsonx on-prem software version. Required for using watsonx on-prem software.",
               "title": "Version"
            },
            "username": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The user name. Required for using watsonx on-prem software.",
               "title": "User name"
            },
            "password": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The user password. One of api_key or password is required for using watsonx on-prem software.",
               "title": "Password"
            },
            "instance_id": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": "openshift",
               "description": "The watsonx.ai instance id. Default value is openshift.",
               "title": "Instance id"
            }
         },
         "title": "WxAICredentials",
         "type": "object"
      },
      "WxAIFoundationModel": {
         "description": "The IBM watsonx.ai foundation model details\n\nTo initialize the foundation model, you can either pass in the credentials directly or set the environment.\nYou can follow these examples to create the provider.\n\nExamples:\n    1. Create foundation model by specifying the credentials during object creation:\n        .. code-block:: python\n\n            # Specify the credentials during object creation\n            wx_ai_foundation_model = WxAIFoundationModel(\n                model_id=\"ibm/granite-3-3-8b-instruct\",\n                project_id=<PROJECT_ID>,\n                provider=WxAIModelProvider(\n                    credentials=WxAICredentials(\n                        url=wx_url, # This is optional field, by default US-Dallas region is selected\n                        api_key=wx_apikey,\n                    )\n                )\n            )\n\n    2. Create foundation model by setting the credentials environment variables:\n        * The api key can be set using one of the environment variables ``WXAI_API_KEY``, ``WATSONX_APIKEY``, or ``WXG_API_KEY``. These will be read in the order of precedence.\n        * The url is optional and will be set to US-Dallas region by default. It can be set using one of the environment variables ``WXAI_URL``, ``WATSONX_URL``, or ``WXG_URL``. These will be read in the order of precedence.\n\n        .. code-block:: python\n\n            wx_ai_foundation_model = WxAIFoundationModel(\n                model_id=\"ibm/granite-3-3-8b-instruct\",\n                project_id=<PROJECT_ID>,\n            )\n\n    3. Create foundation model by specifying watsonx.governance software credentials during object creation:\n        .. code-block:: python\n\n            wx_ai_foundation_model = WxAIFoundationModel(\n                model_id=\"ibm/granite-3-3-8b-instruct\",\n                project_id=project_id,\n                provider=WxAIModelProvider(\n                    credentials=WxAICredentials(\n                        url=wx_url,\n                        api_key=wx_apikey,\n                        username=wx_username,\n                        version=wx_version,\n                    )\n                )\n            )\n\n    4. Create foundation model by setting watsonx.governance software credentials environment variables:\n        * The api key can be set using one of the environment variables ``WXAI_API_KEY``, ``WATSONX_APIKEY``, or ``WXG_API_KEY``. These will be read in the order of precedence.\n        * The url can be set using one of these environment variable ``WXAI_URL``, ``WATSONX_URL``, or ``WXG_URL``. These will be read in the order of precedence.\n        * The username can be set using one of these environment variable ``WXAI_USERNAME``, ``WATSONX_USERNAME``, or ``WXG_USERNAME``. These will be read in the order of precedence.\n        * The version of watsonx.governance software can be set using one of these environment variable ``WXAI_VERSION``, ``WATSONX_VERSION``, or ``WXG_VERSION``. These will be read in the order of precedence.\n\n        .. code-block:: python\n\n            wx_ai_foundation_model = WxAIFoundationModel(\n                model_id=\"ibm/granite-3-3-8b-instruct\",\n                project_id=project_id,\n            )",
         "properties": {
            "model_name": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The name of the foundation model.",
               "title": "Model Name"
            },
            "provider": {
               "$ref": "#/$defs/WxAIModelProvider",
               "description": "The provider of the model."
            },
            "model_id": {
               "description": "The unique identifier for the watsonx.ai model.",
               "title": "Model Id",
               "type": "string"
            },
            "project_id": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The project ID associated with the model.",
               "title": "Project Id"
            },
            "space_id": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The space ID associated with the model.",
               "title": "Space Id"
            }
         },
         "required": [
            "model_id"
         ],
         "title": "WxAIFoundationModel",
         "type": "object"
      },
      "WxAIModelProvider": {
         "description": "This class represents a model provider configuration for IBM watsonx.ai. It includes the provider type and\ncredentials required to authenticate and interact with the watsonx.ai platform. If credentials are not explicitly\nprovided, it attempts to load them from environment variables.\n\nExamples:\n    1. Create provider using credentials object:\n        .. code-block:: python\n\n            credentials = WxAICredentials(\n                url=\"https://us-south.ml.cloud.ibm.com\",\n                api_key=\"your-api-key\"\n            )\n            provider = WxAIModelProvider(credentials=credentials)\n\n    2. Create provider using environment variables:\n        .. code-block:: python\n\n            import os\n\n            os.environ['WATSONX_URL'] = \"https://us-south.ml.cloud.ibm.com\"\n            os.environ['WATSONX_APIKEY'] = \"your_api_key\"\n\n            provider = WxAIModelProvider()",
         "properties": {
            "type": {
               "$ref": "#/$defs/ModelProviderType",
               "default": "ibm_watsonx.ai",
               "description": "The type of model provider."
            },
            "credentials": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/WxAICredentials"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The credentials used to authenticate with watsonx.ai. If not provided, they will be loaded from environment variables."
            }
         },
         "title": "WxAIModelProvider",
         "type": "object"
      }
   }
}

Config:
  • arbitrary_types_allowed: bool = True

Fields:
field agentic_app: Annotated[AgenticApp | None, FieldInfo(annotation=NoneType, required=False, default=None, title='Agentic application configuration details', description='The agentic application configuration details.')] = None

The agentic application configuration details.

field ai_experiment_client: Annotated[AIExperimentsClient | None, FieldInfo(annotation=NoneType, required=False, default=None, title='AI experiments client', description='The AI experiment client object.')] = None

The AI experiment client object.

field max_concurrency: Annotated[int, FieldInfo(annotation=NoneType, required=False, default=10, title='Max Concurrency', description='The maximum concurrency to use for evaluating metrics.')] = 10

The maximum concurrency to use for evaluating metrics.

field tracing_configuration: Annotated[TracingConfiguration | None, FieldInfo(annotation=NoneType, required=False, default=None, title='Tracing Configuration', description='The tracing configuration details.')] = None

The tracing configuration details.
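
Example

A minimal sketch setting the fields above together; the agentic_app and tracing_config objects are assumed to be constructed as shown in the class-level examples:
    agentic_evaluator = AgenticEvaluator(agentic_app=agentic_app,
                                         tracing_configuration=tracing_config,
                                         max_concurrency=5)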

compare_ai_experiments(ai_experiments: List[AIExperiment] = None, ai_evaluation_details: AIEvaluationAsset = None) → str

Creates an AI Evaluation asset to compare AI experiment runs.

Parameters:
  • ai_experiments (List[AIExperiment], optional) – The list of AI experiments to be compared. If all runs for an experiment need to be compared, specify the runs value as an empty list for the experiment.

  • ai_evaluation_details (AIEvaluationAsset, optional) – An instance of AIEvaluationAsset having the details (name, description and metrics configuration).

Returns:

The href of the created AI Evaluation asset.

Examples

  1. Create AI evaluation with a list of experiment IDs

# Initialize the API client with credentials
api_client = APIClient(credentials=Credentials(api_key="", url="wos_url"))

# Create the instance of Agentic evaluator
evaluator = AgenticEvaluator(api_client=api_client, tracing_configuration=TracingConfiguration(project_id=project_id))

# [Optional] Define evaluation configuration
evaluation_config = EvaluationConfig(
    monitors={
        "agentic_ai_quality": {
            "parameters": {
                "metrics_configuration": {}
            }
        }
    }
)

# Create the evaluation asset
ai_evaluation_details = AIEvaluationAsset(
    name="AI Evaluation for agent",
    evaluation_configuration=evaluation_config
)

# Compare two or more AI experiments using the evaluation asset
ai_experiment_1 = AIExperiment(
    asset_id=ai_experiment_id_1,
    runs=[<Run1 details>, <Run2 details>] # Run details are returned by the start_run method
)
ai_experiment_2 = AIExperiment(
    asset_id=ai_experiment_id_2,
    runs=[] # Empty list means all runs for this experiment will be compared
)
ai_evaluation_asset_href = evaluator.compare_ai_experiments(
    ai_experiments=[ai_experiment_1, ai_experiment_2],
    ai_evaluation_details=ai_evaluation_details
)
end_run(track_notebook: bool | None = False)

End a run to collect and compute the metrics within the current run.

Parameters:

track_notebook (bool) – Flag to specify whether to store the notebook with the current run.
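
Example

  1. End the current run and store the notebook with it (a minimal sketch; track_notebook defaults to False):
    agentic_evaluator.end_run(track_notebook=True)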

evaluate_answer_quality(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric] = [], compute_real_time: bool | None = True) → dict

An evaluation decorator for computing answer quality metrics on an agentic tool. Answer Quality metrics include Answer Relevance, Faithfulness, Answer Similarity and Unsuccessful Requests.

For more details, see ibm_watsonx_gov.metrics.AnswerRelevanceMetric, ibm_watsonx_gov.metrics.FaithfulnessMetric, ibm_watsonx_gov.metrics.AnswerSimilarityMetric and ibm_watsonx_gov.metrics.UnsuccessfulRequestsMetric.

Parameters:
  • func (Optional[Callable], optional) – The tool on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric], optional) – The list of metrics to compute as part of this evaluator. Defaults to MetricGroup.ANSWER_QUALITY.get_metrics().

Raises:

Exception – If there is any error while evaluation.

Returns:

The result of the wrapped tool.

Return type:

dict

Example

  1. Basic usage
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_answer_quality
    def agentic_tool(*args, **kwargs):
        pass
    
  2. Usage with different thresholds and methods for some of the metrics in the group
    metric_1 = FaithfulnessMetric(thresholds=[MetricThreshold(type="lower_limit", value=0.5)])
    metric_2 = AnswerRelevanceMetric(method="token_recall", thresholds=[MetricThreshold(type="lower_limit", value=0.5)])
    
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_answer_quality(metrics=[metric_1, metric_2])
    def agentic_tool(*args, **kwargs):
        pass
    
evaluate_answer_relevance(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric] = [], compute_real_time: bool | None = True) → dict

An evaluation decorator for computing answer relevance metric on an agentic tool.

For more details, see ibm_watsonx_gov.metrics.AnswerRelevanceMetric

Parameters:
  • func (Optional[Callable], optional) – The tool on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric], optional) – The list of metrics to compute as part of this evaluator. Defaults to [ AnswerRelevanceMetric() ].

Raises:

Exception – If there is any error while evaluation.

Returns:

The result of the wrapped tool.

Return type:

dict

Example

  1. Basic usage
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_answer_relevance
    def agentic_tool(*args, **kwargs):
        pass
    
  2. Usage with different thresholds and methods
    metric_1 = AnswerRelevanceMetric(method="token_recall", thresholds=[MetricThreshold(type="lower_limit", value=0.5)])
    metric_2 = AnswerRelevanceMetric(method="granite_guardian", thresholds=[MetricThreshold(type="lower_limit", value=0.5)])
    
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_answer_relevance(metrics=[metric_1, metric_2])
    def agentic_tool(*args, **kwargs):
        pass
    
evaluate_answer_similarity(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric] = [], compute_real_time: bool | None = True) → dict

An evaluation decorator for computing answer similarity metric on an agentic node.

For more details, see ibm_watsonx_gov.metrics.AnswerSimilarityMetric

Parameters:
  • func (Optional[Callable], optional) – The node on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric], optional) – The list of metrics to compute as part of this evaluator. Defaults to [ AnswerSimilarityMetric() ].

  • compute_real_time (Optional[bool], optional) – The flag to indicate whether the metric should be computed along with the node execution or not.

Raises:

Exception – If there is any error while evaluation.

Returns:

The result of the wrapped node.

Return type:

dict

Examples

  1. Basic usage
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_answer_similarity
    def agentic_node(*args, **kwargs):
        pass
    
  2. Usage with different thresholds and methods
    metric_1 = AnswerSimilarityMetric(
        method="token_k_precision", thresholds=[MetricThreshold(type="lower_limit", value=0.5)])
    metric_2 = AnswerSimilarityMetric(
        method="sentence_bert_mini_lm", thresholds=[MetricThreshold(type="lower_limit", value=0.6)])
    
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_answer_similarity(metrics=[metric_1, metric_2])
    def agentic_node(*args, **kwargs):
        pass
    
evaluate_average_precision(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric] = [], compute_real_time: bool | None = True) → dict

An evaluation decorator for computing the average precision metric on an agentic tool. This metric uses context relevance values for its computation, so the context relevance metric is computed as a prerequisite.

For more details, see ibm_watsonx_gov.metrics.AveragePrecisionMetric

Parameters:
  • func (Optional[Callable], optional) – The tool on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric], optional) – The list of metrics to compute as part of this evaluator. Defaults to [ AveragePrecisionMetric() ].

Raises:

Exception – If there is any error while evaluation.

Returns:

The result of the wrapped tool.

Return type:

dict

Example

  1. Basic usage
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_average_precision
    def agentic_tool(*args, **kwargs):
        pass
    
  2. Usage with different thresholds and methods
    metric_1 = AveragePrecisionMetric(thresholds=[MetricThreshold(type="lower_limit", value=0.5)])
    metric_2 = ContextRelevanceMetric(method="sentence_bert_mini_lm", thresholds=[MetricThreshold(type="lower_limit", value=0.6)])
    
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_average_precision(metrics=[metric_1, metric_2])
    def agentic_tool(*args, **kwargs):
        pass
    
evaluate_content_safety(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric] = [], compute_real_time: bool | None = True) → dict

An evaluation decorator for computing content safety metrics on an agentic tool. Content Safety metrics include HAP, PII, Evasiveness, Harm, HarmEngagement, Jailbreak, Profanity, SexualContent, SocialBias, UnethicalBehavior and Violence.

For more details, see ibm_watsonx_gov.metrics.HAPMetric, ibm_watsonx_gov.metrics.PIIMetric, ibm_watsonx_gov.metrics.EvasivenessMetric, ibm_watsonx_gov.metrics.HarmMetric, ibm_watsonx_gov.metrics.HarmEngagementMetric, ibm_watsonx_gov.metrics.JailbreakMetric, ibm_watsonx_gov.metrics.ProfanityMetric, ibm_watsonx_gov.metrics.SexualContentMetric, ibm_watsonx_gov.metrics.SocialBiasMetric, ibm_watsonx_gov.metrics.UnethicalBehaviorMetric and ibm_watsonx_gov.metrics.ViolenceMetric.

Parameters:
  • func (Optional[Callable], optional) – The tool on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric], optional) – The list of metrics to compute as part of this evaluator. Defaults to MetricGroup.CONTENT_SAFETY.get_metrics().

Raises:

Exception – If there is any error while evaluation.

Returns:

The result of the wrapped tool.

Return type:

dict

Example

  1. Basic usage
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_content_safety
    def agentic_tool(*args, **kwargs):
        pass
    
  2. Usage with different thresholds and methods for some of the metrics in the group
    metric_1 = PIIMetric(thresholds=[MetricThreshold(type="lower_limit", value=0.5)])
    metric_2 = HAPMetric(thresholds=[MetricThreshold(type="lower_limit", value=0.5)])
    
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_content_safety(metrics=[metric_1, metric_2])
    def agentic_tool(*args, **kwargs):
        pass
    
evaluate_context_relevance(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric] = [], compute_real_time: bool | None = True) → dict

An evaluation decorator for computing context relevance metric on an agentic node.

For more details, see ibm_watsonx_gov.metrics.ContextRelevanceMetric

Parameters:
  • func (Optional[Callable], optional) – The node on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric], optional) – The list of metrics to compute as part of this evaluator. Defaults to [ ContextRelevanceMetric() ].

  • compute_real_time (Optional[bool], optional) – The flag to indicate whether the metric should be computed along with the node execution or not.

Raises:

Exception – If there is any error while evaluation.

Returns:

The result of the wrapped node.

Return type:

dict

Examples

  1. Basic usage
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_context_relevance
    def agentic_node(*args, **kwargs):
        pass
    
  2. Usage with different thresholds and methods
    metric_1 = ContextRelevanceMetric(
        method="sentence_bert_bge", thresholds=[MetricThreshold(type="lower_limit", value=0.5)])
    metric_2 = ContextRelevanceMetric(
        method="sentence_bert_mini_lm", thresholds=[MetricThreshold(type="lower_limit", value=0.6)])
    metric_3 = ContextRelevanceMetric(
        method="granite_guardian", thresholds=[MetricThreshold(type="lower_limit", value=0.6)])
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_context_relevance(metrics=[metric_1, metric_2, metric_3])
    def agentic_node(*args, **kwargs):
        pass
    
evaluate_evasiveness(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric] = [], compute_real_time: bool | None = True) → dict

An evaluation decorator for computing the evasiveness metric on an agentic tool using Granite Guardian.

For more details, see ibm_watsonx_gov.metrics.EvasivenessMetric

Parameters:
  • func (Optional[Callable], optional) – The tool on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric], optional) – The list of metrics to compute as part of this evaluator. Defaults to [ EvasivenessMetric() ].

Raises:

Exception – If there is any error while evaluation.

Returns:

The result of the wrapped tool.

Return type:

dict

Example

  1. Create evaluate_evasiveness decorator with default parameters. By default, the metric uses the “input_text” from the graph state as the input.
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_evasiveness
    def agentic_tool(*args, **kwargs):
        pass
    
  2. Create evaluate_evasiveness decorator with thresholds and configuration
    metric = EvasivenessMetric(thresholds=[MetricThreshold(type="lower_limit", value=0.7)])
    config = {"input_fields": ["input"]}
    configuration = AgenticAIConfiguration(**config)
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_evasiveness(metrics=[metric], configuration=configuration)
    def agentic_tool(*args, **kwargs):
        pass
    
evaluate_faithfulness(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric] = [], compute_real_time: bool | None = True) → dict

An evaluation decorator for computing faithfulness metric on an agentic node.

For more details, see ibm_watsonx_gov.metrics.FaithfulnessMetric

Parameters:
  • func (Optional[Callable], optional) – The node on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric], optional) – The list of metrics to compute as part of this evaluator. Defaults to [ FaithfulnessMetric() ].

  • compute_real_time (Optional[bool], optional) – The flag to indicate whether the metric should be computed along with the node execution or not.

Raises:

Exception – If any error occurs during evaluation.

Returns:

The result of the wrapped node.

Return type:

dict

Examples

  1. Basic usage
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_faithfulness
    def agentic_node(*args, **kwargs):
        pass
    
  2. Usage with different thresholds and methods
    metric_1 = FaithfulnessMetric(method="token_k_precision", threshold=MetricThreshold(type="lower_limit", value=0.5))
    metric_2 = FaithfulnessMetric(method="sentence_bert_mini_lm", threshold=MetricThreshold(type="lower_limit", value=0.6))
    
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_faithfulness(metrics=[metric_1, metric_2])
    def agentic_node(*args, **kwargs):
        pass
    
evaluate_general_quality_with_llm(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric] = [], compute_real_time: bool | None = True) dict

An evaluation decorator for computing the LLM validation metric on an agentic node.

For more details, see ibm_watsonx_gov.metrics.LLMValidationMetric

Parameters:
  • func (Optional[Callable], optional) – The node on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric]) – The list of metrics to compute as part of this evaluator.

  • compute_real_time (Optional[bool], optional) – The flag to indicate whether the metric should be computed along with the node execution or not. When compute_real_time is set to False, the evaluate_metrics method should be invoked on the AgenticEvaluator to compute the metric.

Raises:

Exception – If any error occurs during evaluation.

Returns:

The result of the wrapped node.

Return type:

dict

Examples

  1. Basic usage
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_general_quality_with_llm
    def agentic_node(*args, **kwargs):
        pass
    
evaluate_hap(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric] = [], compute_real_time: bool | None = True) dict

An evaluation decorator for computing the HAP metric on an agentic tool.

For more details, see ibm_watsonx_gov.metrics.HAPMetric

Parameters:
  • func (Optional[Callable], optional) – The tool on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric], optional) – The list of metrics to compute as part of this evaluator. Defaults to [HAPMetric()].

Raises:

Exception – If any error occurs during evaluation.

Returns:

The result of the wrapped tool.

Return type:

dict

Examples

  1. Create evaluate_hap decorator with default parameters. By default, the metric uses the “input_text” from the graph state as the input.
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_hap
    def agentic_tool(*args, **kwargs):
        pass
    
  2. Create evaluate_hap decorator with thresholds and configuration
    metric = HAPMetric(thresholds=MetricThreshold(type="lower_limit", value=0.7))
    config = {"input_fields": ["input"]}
    configuration = AgenticAIConfiguration(**config)
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_hap(metrics=[metric], configuration=configuration)
    def agentic_tool(*args, **kwargs):
        pass
    
evaluate_harm(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric] = [], compute_real_time: bool | None = True) dict

An evaluation decorator for computing harm risk on an agentic tool via Granite Guardian.

For more details, see ibm_watsonx_gov.metrics.HarmMetric

Parameters:
  • func (Optional[Callable], optional) – The tool on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric], optional) – The list of metrics to compute as part of this evaluator. Defaults to [ HarmMetric() ]

Raises:

Exception – If any error occurs during evaluation.

Returns:

The result of the wrapped tool.

Return type:

dict

Examples

  1. Create evaluate_harm decorator with default parameters. By default, the metric uses the “input_text” from the graph state as the input.
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_harm
    def agentic_tool(*args, **kwargs):
        pass
    
  2. Create evaluate_harm decorator with thresholds and configuration
    metric = HarmMetric(thresholds=MetricThreshold(type="lower_limit", value=0.7))
    config = {"input_fields": ["input"]}
    configuration = AgenticAIConfiguration(**config)
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_harm(metrics=[metric], configuration=configuration)
    def agentic_tool(*args, **kwargs):
        pass
    
evaluate_harm_engagement(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric] = [], compute_real_time: bool | None = True) dict

An evaluation decorator for computing harm engagement on an agentic tool via Granite Guardian.

For more details, see ibm_watsonx_gov.metrics.HarmEngagementMetric

Parameters:
  • func (Optional[Callable], optional) – The tool on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric], optional) – The list of metrics to compute as part of this evaluator. Defaults to [ HarmEngagementMetric() ].

Raises:

Exception – If any error occurs during evaluation.

Returns:

The result of the wrapped tool.

Return type:

dict

Examples

  1. Create evaluate_harm_engagement decorator with default parameters. By default, the metric uses the “input_text” from the graph state as the input.
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_harm_engagement
    def agentic_tool(*args, **kwargs):
        pass
    
  2. Create evaluate_harm_engagement decorator with thresholds and configuration
    metric = HarmEngagementMetric(thresholds=MetricThreshold(type="lower_limit", value=0.7))
    config = {"input_fields": ["input"]}
    configuration = AgenticAIConfiguration(**config)
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_harm_engagement(metrics=[metric], configuration=configuration)
    def agentic_tool(*args, **kwargs):
        pass
    
evaluate_hit_rate(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric] = [], compute_real_time: bool | None = True) dict

An evaluation decorator for computing the hit rate metric on an agentic tool. This metric is derived from context relevance values, so the context relevance metric is computed as a prerequisite.

For more details, see ibm_watsonx_gov.metrics.HitRateMetric

Parameters:
  • func (Optional[Callable], optional) – The tool on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric], optional) – The list of metrics to compute as part of this evaluator. Defaults to [ HitRateMetric() ].

Raises:

Exception – If any error occurs during evaluation.

Returns:

The result of the wrapped tool.

Return type:

dict

Examples

  1. Basic usage
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_hit_rate
    def agentic_tool(*args, **kwargs):
        pass
    
  2. Usage with different thresholds and methods
    metric_1 = HitRateMetric(threshold=MetricThreshold(type="lower_limit", value=0.5))
    metric_2 = ContextRelevanceMetric(method="sentence_bert_mini_lm", threshold=MetricThreshold(type="lower_limit", value=0.6))
    
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_hit_rate(metrics=[metric_1, metric_2])
    def agentic_tool(*args, **kwargs):
        pass
    
evaluate_jailbreak(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric] = [], compute_real_time: bool | None = True) dict

An evaluation decorator for computing jailbreak risk on an agentic tool via Granite Guardian.

For more details, see ibm_watsonx_gov.metrics.JailbreakMetric

Parameters:
  • func (Optional[Callable], optional) – The tool on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric], optional) – The list of metrics to compute as part of this evaluator. Defaults to [ JailbreakMetric() ].

Raises:

Exception – If any error occurs during evaluation.

Returns:

The result of the wrapped tool.

Return type:

dict

Examples

  1. Create evaluate_jailbreak decorator with default parameters. By default, the metric uses the “input_text” from the graph state as the input.
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_jailbreak
    def agentic_tool(*args, **kwargs):
        pass
    
  2. Create evaluate_jailbreak decorator with thresholds and configuration
    metric = JailbreakMetric(thresholds=MetricThreshold(type="lower_limit", value=0.7))
    config = {"input_fields": ["input"]}
    configuration = AgenticAIConfiguration(**config)
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_jailbreak(metrics=[metric], configuration=configuration)
    def agentic_tool(*args, **kwargs):
        pass
    
evaluate_ndcg(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric] = [], compute_real_time: bool | None = True) dict

An evaluation decorator for computing the NDCG metric on an agentic tool. This metric is derived from context relevance values, so the context relevance metric is computed as a prerequisite.

For more details, see ibm_watsonx_gov.metrics.NDCGMetric

Parameters:
  • func (Optional[Callable], optional) – The tool on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric], optional) – The list of metrics to compute as part of this evaluator. Defaults to [ NDCGMetric() ].

Raises:

Exception – If any error occurs during evaluation.

Returns:

The result of the wrapped tool.

Return type:

dict

Examples

  1. Basic usage
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_ndcg
    def agentic_tool(*args, **kwargs):
        pass
    
  2. Usage with different thresholds and methods
    metric_1 = NDCGMetric(threshold=MetricThreshold(type="lower_limit", value=0.5))
    metric_2 = ContextRelevanceMetric(method="sentence_bert_mini_lm", threshold=MetricThreshold(type="lower_limit", value=0.6))
    
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_ndcg(metrics=[metric_1, metric_2])
    def agentic_tool(*args, **kwargs):
        pass
    
evaluate_pii(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric] = [], compute_real_time: bool | None = True) dict

An evaluation decorator for computing the PII metric on an agentic tool.

For more details, see ibm_watsonx_gov.metrics.PIIMetric

Parameters:
  • func (Optional[Callable], optional) – The tool on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric], optional) – The list of metrics to compute as part of this evaluator. Defaults to [PIIMetric()].

Raises:

Exception – If any error occurs during evaluation.

Returns:

The result of the wrapped tool.

Return type:

dict

Examples

  1. Create evaluate_pii decorator with default parameters. By default, the metric uses the “input_text” from the graph state as the input.
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_pii
    def agentic_tool(*args, **kwargs):
        pass
    
  2. Create evaluate_pii decorator with thresholds and configuration
    metric = PIIMetric(thresholds=MetricThreshold(type="lower_limit", value=0.7))
    config = {"input_fields": ["input"]}
    configuration = AgenticAIConfiguration(**config)
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_pii(metrics=[metric], configuration=configuration)
    def agentic_tool(*args, **kwargs):
        pass
    
evaluate_profanity(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric] = [], compute_real_time: bool | None = True) dict

An evaluation decorator for computing profanity on an agentic tool via Granite Guardian.

For more details, see ibm_watsonx_gov.metrics.ProfanityMetric

Parameters:
  • func (Optional[Callable], optional) – The tool on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric], optional) – The list of metrics to compute as part of this evaluator. Defaults to [ ProfanityMetric() ]

Raises:

Exception – If any error occurs during evaluation.

Returns:

The result of the wrapped tool.

Return type:

dict

Examples

  1. Create evaluate_profanity decorator with default parameters. By default, the metric uses the “input_text” from the graph state as the input.
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_profanity
    def agentic_tool(*args, **kwargs):
        pass
    
  2. Create evaluate_profanity decorator with thresholds and configuration
    metric = ProfanityMetric(thresholds=MetricThreshold(type="lower_limit", value=0.7))
    config = {"input_fields": ["input"]}
    configuration = AgenticAIConfiguration(**config)
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_profanity(metrics=[metric], configuration=configuration)
    def agentic_tool(*args, **kwargs):
        pass
    
evaluate_prompt_safety_risk(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric], compute_real_time: bool | None = True) dict

An evaluation decorator for computing the prompt safety risk metric on an agentic tool.

For more details, see ibm_watsonx_gov.metrics.PromptSafetyRiskMetric

Parameters:
  • func (Optional[Callable], optional) – The tool on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric]) – The list of metrics to compute as part of this evaluator.

Raises:

Exception – If any error occurs during evaluation.

Returns:

The result of the wrapped tool.

Return type:

dict

Examples

  1. Create evaluate_prompt_safety_risk decorator by passing the required metric with its system prompt. By default, the metric uses the “input_text” from the graph state as the input.
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_prompt_safety_risk(metrics=[PromptSafetyRiskMetric(system_prompt="...")])
    def agentic_tool(*args, **kwargs):
        pass
    
  2. Create evaluate_prompt_safety_risk decorator with thresholds and configuration
    metric = PromptSafetyRiskMetric(system_prompt="...", thresholds=MetricThreshold(type="lower_limit", value=0.7))
    config = {"input_fields": ["input"]}
    configuration = AgenticAIConfiguration(**config)
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_prompt_safety_risk(metrics=[metric], configuration=configuration)
    def agentic_tool(*args, **kwargs):
        pass
    
evaluate_readability(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric] = [], compute_real_time: bool | None = True) dict

An evaluation decorator for computing answer readability metrics on an agentic tool. Readability metrics include TextReadingEaseMetric and TextGradeLevelMetric.

For more details, see ibm_watsonx_gov.metrics.TextReadingEaseMetric, ibm_watsonx_gov.metrics.TextGradeLevelMetric

Parameters:
  • func (Optional[Callable], optional) – The tool on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric], optional) – The list of metrics to compute as part of this evaluator. Defaults to MetricGroup.READABILITY.get_metrics().

Raises:

Exception – If any error occurs during evaluation.

Returns:

The result of the wrapped tool.

Return type:

dict

Examples

  1. Basic usage
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_readability
    def agentic_tool(*args, **kwargs):
        pass
    
  2. Usage with different thresholds and methods for some of the metrics in the group
    metric_1 = TextGradeLevelMetric(thresholds=[MetricThreshold(type="lower_limit", value=6)])
    metric_2 = TextReadingEaseMetric(thresholds=[MetricThreshold(type="lower_limit", value=70)])
    config = {"output_fields": ["generated_text"]}
    configuration = AgenticAIConfiguration(**config)
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_readability(metrics=[metric_1, metric_2], configuration=configuration)
    def agentic_tool(*args, **kwargs):
        pass
    
evaluate_reciprocal_rank(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric] = [], compute_real_time: bool | None = True) dict

An evaluation decorator for computing the reciprocal rank metric on an agentic tool. This metric is derived from context relevance values, so the context relevance metric is computed as a prerequisite.

For more details, see ibm_watsonx_gov.metrics.ReciprocalRankMetric

Parameters:
  • func (Optional[Callable], optional) – The tool on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric], optional) – The list of metrics to compute as part of this evaluator. Defaults to [ ReciprocalRankMetric() ].

Raises:

Exception – If any error occurs during evaluation.

Returns:

The result of the wrapped tool.

Return type:

dict

Examples

  1. Basic usage
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_reciprocal_rank
    def agentic_tool(*args, **kwargs):
        pass
    
  2. Usage with different thresholds and methods
    metric_1 = ReciprocalRankMetric(threshold=MetricThreshold(type="lower_limit", value=0.5))
    metric_2 = ContextRelevanceMetric(method="sentence_bert_mini_lm", threshold=MetricThreshold(type="lower_limit", value=0.6))
    
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_reciprocal_rank(metrics=[metric_1, metric_2])
    def agentic_tool(*args, **kwargs):
        pass
    
evaluate_retrieval_precision(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric] = [], compute_real_time: bool | None = True) dict

An evaluation decorator for computing the retrieval precision metric on an agentic tool. This metric is derived from context relevance values, so the context relevance metric is computed as a prerequisite.

For more details, see ibm_watsonx_gov.metrics.RetrievalPrecisionMetric

Parameters:
  • func (Optional[Callable], optional) – The tool on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric], optional) – The list of metrics to compute as part of this evaluator. Defaults to [ RetrievalPrecisionMetric() ].

Raises:

Exception – If any error occurs during evaluation.

Returns:

The result of the wrapped tool.

Return type:

dict

Examples

  1. Basic usage
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_retrieval_precision
    def agentic_tool(*args, **kwargs):
        pass
    
  2. Usage with different thresholds and methods
    metric_1 = RetrievalPrecisionMetric(threshold=MetricThreshold(type="lower_limit", value=0.5))
    metric_2 = ContextRelevanceMetric(method="sentence_bert_mini_lm", threshold=MetricThreshold(type="lower_limit", value=0.6))
    
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_retrieval_precision(metrics=[metric_1, metric_2])
    def agentic_tool(*args, **kwargs):
        pass
    
evaluate_retrieval_quality(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric] = [], compute_real_time: bool | None = True) dict

An evaluation decorator for computing retrieval quality metrics on an agentic tool. Retrieval Quality metrics include Context Relevance, Retrieval Precision, Average Precision, Hit Rate, Reciprocal Rank, and NDCG.

For more details, see ibm_watsonx_gov.metrics.ContextRelevanceMetric, ibm_watsonx_gov.metrics.RetrievalPrecisionMetric, ibm_watsonx_gov.metrics.AveragePrecisionMetric, ibm_watsonx_gov.metrics.ReciprocalRankMetric, ibm_watsonx_gov.metrics.HitRateMetric, ibm_watsonx_gov.metrics.NDCGMetric

Parameters:
  • func (Optional[Callable], optional) – The tool on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric], optional) – The list of metrics to compute as part of this evaluator. Defaults to MetricGroup.RETRIEVAL_QUALITY.get_metrics().

Raises:

Exception – If any error occurs during evaluation.

Returns:

The result of the wrapped tool.

Return type:

dict

Examples

  1. Basic usage
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_retrieval_quality
    def agentic_tool(*args, **kwargs):
        pass
    
  2. Usage with different thresholds and methods for some of the metrics in the group
    metric_1 = NDCGMetric(threshold=MetricThreshold(type="lower_limit", value=0.5))
    metric_2 = ContextRelevanceMetric(method="sentence_bert_mini_lm", threshold=MetricThreshold(type="lower_limit", value=0.6))
    
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_retrieval_quality(metrics=[metric_1, metric_2])
    def agentic_tool(*args, **kwargs):
        pass
    
evaluate_sexual_content(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric] = [], compute_real_time: bool | None = True) dict

An evaluation decorator for computing sexual content on an agentic tool via Granite Guardian.

For more details, see ibm_watsonx_gov.metrics.SexualContentMetric

Parameters:
  • func (Optional[Callable], optional) – The tool on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric], optional) – The list of metrics to compute as part of this evaluator. Defaults to [ SexualContentMetric() ].

Raises:

Exception – If any error occurs during evaluation.

Returns:

The result of the wrapped tool.

Return type:

dict

Examples

  1. Create evaluate_sexual_content decorator with default parameters. By default, the metric uses the “input_text” from the graph state as the input.
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_sexual_content
    def agentic_tool(*args, **kwargs):
        pass
    
  2. Create evaluate_sexual_content decorator with thresholds and configuration
    metric = SexualContentMetric(thresholds=MetricThreshold(type="lower_limit", value=0.7))
    config = {"input_fields": ["input"]}
    configuration = AgenticAIConfiguration(**config)
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_sexual_content(metrics=[metric], configuration=configuration)
    def agentic_tool(*args, **kwargs):
        pass
    
evaluate_social_bias(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric] = [], compute_real_time: bool | None = True) dict

An evaluation decorator for computing social bias on an agentic tool via Granite Guardian.

For more details, see ibm_watsonx_gov.metrics.SocialBiasMetric

Parameters:
  • func (Optional[Callable], optional) – The tool on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric], optional) – The list of metrics to compute as part of this evaluator. Defaults to [ SocialBiasMetric() ].

Raises:

Exception – If any error occurs during evaluation.

Returns:

The result of the wrapped tool.

Return type:

dict

Examples

  1. Create evaluate_social_bias decorator with default parameters. By default, the metric uses the “input_text” from the graph state as the input.
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_social_bias
    def agentic_tool(*args, **kwargs):
        pass
    
  2. Create evaluate_social_bias decorator with thresholds and configuration
    metric = SocialBiasMetric(thresholds=MetricThreshold(type="lower_limit", value=0.7))
    config = {"input_fields": ["input"]}
    configuration = AgenticAIConfiguration(**config)
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_social_bias(metrics=[metric], configuration=configuration)
    def agentic_tool(*args, **kwargs):
        pass
    
evaluate_text_grade_level(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric] = [], compute_real_time: bool | None = True) dict

An evaluation decorator for computing the text grade level metric on an agentic tool.

For more details, see ibm_watsonx_gov.metrics.TextGradeLevelMetric

Parameters:
  • func (Optional[Callable], optional) – The tool on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric], optional) – The list of metrics to compute as part of this evaluator. Defaults to [TextGradeLevelMetric()].

Raises:

Exception – If any error occurs during evaluation.

Returns:

The result of the wrapped tool.

Return type:

dict

Examples

  1. Basic usage
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_text_grade_level
    def agentic_tool(*args, **kwargs):
        pass
    
  2. Create evaluate_text_grade_level decorator with thresholds and configuration
    metric = TextGradeLevelMetric(thresholds=[MetricThreshold(type="lower_limit", value=6)])
    config = {"output_fields": ["generated_text"]}
    configuration = AgenticAIConfiguration(**config)
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_text_grade_level(metrics=[metric], configuration=configuration)
    def agentic_tool(*args, **kwargs):
        pass
    
evaluate_text_reading_ease(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric] = [], compute_real_time: bool | None = True) dict

An evaluation decorator for computing the text reading ease metric on an agentic tool.

For more details, see ibm_watsonx_gov.metrics.TextReadingEaseMetric

Parameters:
  • func (Optional[Callable], optional) – The tool on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric], optional) – The list of metrics to compute as part of this evaluator. Defaults to [TextReadingEaseMetric()].

Raises:

Exception – If any error occurs during evaluation.

Returns:

The result of the wrapped tool.

Return type:

dict

Examples

  1. Basic usage
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_text_reading_ease
    def agentic_tool(*args, **kwargs):
        pass
    
  2. Create evaluate_text_reading_ease decorator with thresholds and configuration
    metric = TextReadingEaseMetric(thresholds=[MetricThreshold(type="lower_limit", value=70)])
    config = {"output_fields": ["generated_text"]}
    configuration = AgenticAIConfiguration(**config)
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_text_reading_ease(metrics=[metric], configuration=configuration)
    def agentic_tool(*args, **kwargs):
        pass
    
evaluate_tool_call_accuracy(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric] = [], compute_real_time: bool | None = True) dict

An evaluation decorator for computing the tool_call_accuracy metric on an agentic tool.

For more details, see ibm_watsonx_gov.metrics.ToolCallAccuracyMetric

Parameters:
  • func (Optional[Callable], optional) – The tool on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric], optional) – The list of metrics to compute as part of this evaluator. Defaults to [ ToolCallAccuracyMetric() ].

Raises:

Exception – If any error occurs during evaluation.

Returns:

The result of the wrapped tool.

Return type:

dict

Examples

  1. Basic usage
    evaluator = AgenticEvaluator()
    tool_call_metric_config={
        "tools":[get_weather, fetch_stock_price], # List of tools available to the agent
    }
    @evaluator.evaluate_tool_call_accuracy(configuration=AgenticAIConfiguration(**tool_call_metric_config))
    def agentic_tool(*args, **kwargs):
        pass
    
  2. Usage with custom tool calls field
    evaluator = AgenticEvaluator()
    tool_call_metric_config={
        "tools":[get_weather, fetch_stock_price], # List of tools available to the agent
        "tool_calls_field": "tool_calls" # Graph state field to store the Agent's response/tool calls
    }
    @evaluator.evaluate_tool_call_accuracy(configuration=AgenticAIConfiguration(**tool_call_metric_config))
    def agentic_tool(*args, **kwargs):
        pass
    
  3. Usage with different thresholds
    metric_1 = ToolCallAccuracyMetric(threshold=MetricThreshold(type="upper_limit", value=0.7))
    metric_2 = ToolCallAccuracyMetric(threshold=MetricThreshold(type="upper_limit", value=0.9))
    evaluator = AgenticEvaluator()
    tool_call_metric_config={
        "tools":[get_weather, fetch_stock_price], # List of tools available to the agent
        "tool_calls_field": "tool_calls" # Graph state field to store the Agent's response/tool calls
    }
    @evaluator.evaluate_tool_call_accuracy(configuration=AgenticAIConfiguration(**tool_call_metric_config),metrics=[metric_1, metric_2])
    def agentic_tool(*args, **kwargs):
        pass
    
  4. Usage with a list of dictionary items as tools
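    # An illustrative sketch: tools can also be provided as dictionary items
    # instead of callables. The name/description/parameters schema below is an
    # assumption for illustration; align it with your own tool definitions.
    tools = [
        {"name": "get_weather", "description": "Get the current weather for a city.",
         "parameters": {"city": {"type": "string"}}},
        {"name": "fetch_stock_price", "description": "Fetch the latest price for a ticker.",
         "parameters": {"ticker": {"type": "string"}}},
    ]
    evaluator = AgenticEvaluator()
    tool_call_metric_config={
        "tools": tools,
        "tool_calls_field": "tool_calls" # Graph state field to store the Agent's response/tool calls
    }
    @evaluator.evaluate_tool_call_accuracy(configuration=AgenticAIConfiguration(**tool_call_metric_config))
    def agentic_tool(*args, **kwargs):
        pass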
evaluate_tool_call_parameter_accuracy(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric] = [], compute_real_time: bool | None = True) dict

An evaluation decorator for computing the tool_call_parameter_accuracy metric on an agentic tool.

For more details, see ibm_watsonx_gov.metrics.ToolCallParameterAccuracyMetric

Parameters:
  • func (Optional[Callable], optional) – The tool on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric], optional) – The list of metrics to compute as part of this evaluator. Defaults to [ ToolCallParameterAccuracyMetric() ].

Raises:

Exception – If any error occurs during evaluation.

Returns:

The result of the wrapped tool.

Return type:

dict

Examples

  1. Basic usage
    evaluator = AgenticEvaluator()
    tool_calls_metric_config={
        "tools":[get_weather, fetch_stock_price], # List of tools available to the agent
    }
    llm_judge = LLMJudge(
        model=WxAIFoundationModel(
            model_id="meta-llama/llama-3-3-70b-instruct",
            project_id=os.getenv("WATSONX_PROJECT_ID"),
        )
    )
    metric_1 = ToolCallParameterAccuracyMetric(llm_judge=llm_judge)
    @evaluator.evaluate_tool_call_parameter_accuracy(configuration=AgenticAIConfiguration(**tool_calls_metric_config), metrics=[metric_1])
    def agentic_tool(*args, **kwargs):
        pass
    
  2. Usage with custom tool calls field
    evaluator = AgenticEvaluator()
    tool_calls_metric_config={
        "tools":[get_weather, fetch_stock_price], # List of tools available to the agent
        "tool_calls_field": "tool_calls" # Graph state field to store the Agent's response/tool calls
    }
    llm_judge = LLMJudge(
        model=WxAIFoundationModel(
            model_id="meta-llama/llama-3-3-70b-instruct",
            project_id=os.getenv("WATSONX_PROJECT_ID"),
        )
    )
    metric_1 = ToolCallParameterAccuracyMetric(llm_judge=llm_judge)
    @evaluator.evaluate_tool_call_parameter_accuracy(configuration=AgenticAIConfiguration(**tool_calls_metric_config), metrics=[metric_1])
    def agentic_tool(*args, **kwargs):
        pass
    
  3. Usage with different thresholds
    llm_judge = LLMJudge(
        model=WxAIFoundationModel(
            model_id="meta-llama/llama-3-3-70b-instruct",
            project_id=os.getenv("WATSONX_PROJECT_ID"),
        )
    )
    metric_1 = ToolCallParameterAccuracyMetric(llm_judge=llm_judge, threshold=MetricThreshold(type="upper_limit", value=0.7))
    evaluator = AgenticEvaluator()
    tool_calls_metric_config={
        "tools":[get_weather, fetch_stock_price], # List of tools available to the agent
        "tool_calls_field": "tool_calls" # Graph state field to store the Agent's response/tool calls
    }
    @evaluator.evaluate_tool_call_parameter_accuracy(configuration=AgenticAIConfiguration(**tool_calls_metric_config),metrics=[metric_1])
    def agentic_tool(*args, **kwargs):
        pass
    
evaluate_tool_call_relevance(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric] = [], compute_real_time: bool | None = True) dict

An evaluation decorator for computing the tool_call_relevance metric on an agentic tool.

For more details, see ibm_watsonx_gov.metrics.ToolCallRelevanceMetric

Parameters:
  • func (Optional[Callable], optional) – The tool on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric], optional) – The list of metrics to compute as part of this evaluator. Defaults to [ ToolCallRelevanceMetric() ].

Raises:

Exception – If any error occurs during evaluation.

Returns:

The result of the wrapped tool.

Return type:

dict

Examples

  1. Basic usage
    evaluator = AgenticEvaluator()
    tool_call_relevance_config={
        "tools":[get_weather, fetch_stock_price], # List of tools available to the agent
    }
    llm_judge = LLMJudge(
        model=WxAIFoundationModel(
            model_id="meta-llama/llama-3-3-70b-instruct",
            project_id=os.getenv("WATSONX_PROJECT_ID"),
        )
    )
    metric_1 = ToolCallRelevanceMetric(llm_judge=llm_judge)
    @evaluator.evaluate_tool_call_relevance(configuration=AgenticAIConfiguration(**tool_call_relevance_config), metrics=[metric_1])
    def agentic_tool(*args, **kwargs):
        pass
    
  2. Usage with custom tool calls field
    evaluator = AgenticEvaluator()
    tool_call_relevance_config={
        "tools":[get_weather, fetch_stock_price], # List of tools available to the agent
        "tool_calls_field": "tool_calls" # Graph state field to store the Agent's response/tool calls
    }
    llm_judge = LLMJudge(
        model=WxAIFoundationModel(
            model_id="meta-llama/llama-3-3-70b-instruct",
            project_id=os.getenv("WATSONX_PROJECT_ID"),
        )
    )
    metric_1 = ToolCallRelevanceMetric(llm_judge=llm_judge)
    @evaluator.evaluate_tool_call_relevance(configuration=AgenticAIConfiguration(**tool_call_relevance_config), metrics=[metric_1])
    def agentic_tool(*args, **kwargs):
        pass
    
  3. Usage with different thresholds
    llm_judge = LLMJudge(
        model=WxAIFoundationModel(
            model_id="meta-llama/llama-3-3-70b-instruct",
            project_id=os.getenv("WATSONX_PROJECT_ID"),
        )
    )
    metric_1 = ToolCallRelevanceMetric(llm_judge=llm_judge, threshold=MetricThreshold(type="upper_limit", value=0.7))
    evaluator = AgenticEvaluator()
    tool_call_relevance_config={
        "tools":[get_weather, fetch_stock_price], # List of tools available to the agent
        "tool_calls_field": "tool_calls" # Graph state field to store the Agent's response/tool calls
    }
    @evaluator.evaluate_tool_call_relevance(configuration=AgenticAIConfiguration(**tool_call_relevance_config),metrics=[metric_1])
    def agentic_tool(*args, **kwargs):
        pass
    
evaluate_tool_call_syntactic_accuracy(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric] = [], compute_real_time: bool | None = True) dict

An evaluation decorator for computing the tool_call_syntactic_accuracy metric on an agentic tool.

For more details, see ibm_watsonx_gov.metrics.ToolCallSyntacticAccuracyMetric

Parameters:
  • func (Optional[Callable], optional) – The tool on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric], optional) – The list of metrics to compute as part of this evaluator. Defaults to [ ToolCallSyntacticAccuracyMetric() ].

Raises:

Exception – If any error occurs during evaluation.

Returns:

The result of the wrapped tool.

Return type:

dict

Examples

  1. Basic usage
    evaluator = AgenticEvaluator()
    tool_call_syntactic_metric_config={
        "tools":[get_weather, fetch_stock_price], # List of tools available to the agent
    }
    @evaluator.evaluate_tool_call_syntactic_accuracy(configuration=AgenticAIConfiguration(**tool_call_syntactic_metric_config))
    def agentic_tool(*args, **kwargs):
        pass
    
  2. Usage with custom tool calls field
    evaluator = AgenticEvaluator()
    tool_call_syntactic_metric_config={
        "tools":[get_weather, fetch_stock_price], # List of tools available to the agent
        "tool_calls_field": "tool_calls" # Graph state field to store the Agent's response/tool calls
    }
    @evaluator.evaluate_tool_call_syntactic_accuracy(configuration=AgenticAIConfiguration(**tool_call_syntactic_metric_config))
    def agentic_tool(*args, **kwargs):
        pass
    
  3. Usage with different thresholds
    metric_1 = ToolCallSyntacticAccuracyMetric(threshold=MetricThreshold(type="upper_limit", value=0.7))
    evaluator = AgenticEvaluator()
    tool_call_syntactic_metric_config={
        "tools":[get_weather, fetch_stock_price], # List of tools available to the agent
        "tool_calls_field": "tool_calls" # Graph state field to store the Agent's response/tool calls
    }
    @evaluator.evaluate_tool_call_syntactic_accuracy(configuration=AgenticAIConfiguration(**tool_call_syntactic_metric_config),metrics=[metric_1])
    def agentic_tool(*args, **kwargs):
        pass
    
evaluate_topic_relevance(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric], compute_real_time: bool | None = True) dict

An evaluation decorator for computing topic relevance on an agentic tool via the off-topic detector.

For more details, see ibm_watsonx_gov.metrics.TopicRelevanceMetric

Parameters:
  • func (Optional[Callable], optional) – The tool on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric]) – The list of metrics to compute as part of this evaluator.

Raises:

Exception – If any error occurs during evaluation.

Returns:

The result of the wrapped tool.

Return type:

dict

Examples

  1. Create evaluate_topic_relevance decorator by passing the required metric with its system prompt. By default, the metric uses the “input_text” from the graph state as the input.
    metric = TopicRelevanceMetric(system_prompt="...")
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_topic_relevance(metrics=[metric])
    def agentic_tool(*args, **kwargs):
        pass
    
  2. Create evaluate_topic_relevance decorator with thresholds and configuration
    metric = TopicRelevanceMetric(system_prompt="...", thresholds=MetricThreshold(type="lower_limit", value=0.7))
    evaluator = AgenticEvaluator()
    config = {"input_fields": ["input"]}
    configuration = AgenticAIConfiguration(**config)
    @evaluator.evaluate_topic_relevance(metrics=[metric], configuration=configuration)
    def agentic_tool(*args, **kwargs):
        pass
    
evaluate_unethical_behavior(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric] = [], compute_real_time: bool | None = True) dict

An evaluation decorator for computing unethical behavior on an agentic tool via Granite Guardian.

For more details, see ibm_watsonx_gov.metrics.UnethicalBehaviorMetric

Parameters:
  • func (Optional[Callable], optional) – The tool on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric], optional) – The list of metrics to compute as part of this evaluator. Defaults to [ UnethicalBehaviorMetric() ].

Raises:

Exception – If any error occurs during evaluation.

Returns:

The result of the wrapped tool.

Return type:

dict

Examples

  1. Create evaluate_unethical_behavior decorator with default parameters. By default, the metric uses the “input_text” from the graph state as the input.
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_unethical_behavior
    def agentic_tool(*args, **kwargs):
        pass
    
  2. Create evaluate_unethical_behavior decorator with thresholds and configuration
    metric = UnethicalBehaviorMetric(thresholds=MetricThreshold(type="lower_limit", value=0.7))
    config = {"input_fields": ["input"]}
    configuration = AgenticAIConfiguration(**config)
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_unethical_behavior(metrics=[metric], configuration=configuration)
    def agentic_tool(*args, **kwargs):
        pass
    
evaluate_unsuccessful_requests(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric] = [], compute_real_time: bool | None = True) dict

An evaluation decorator for computing the unsuccessful requests metric on an agentic tool.

For more details, see ibm_watsonx_gov.metrics.UnsuccessfulRequestsMetric

Parameters:
  • func (Optional[Callable], optional) – The tool on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric], optional) – The list of metrics to compute as part of this evaluator. Defaults to [ UnsuccessfulRequestsMetric() ].

Raises:

Exception – If any error occurs during evaluation.

Returns:

The result of the wrapped tool.

Return type:

dict

Examples

  1. Basic usage
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_unsuccessful_requests
    def agentic_tool(*args, **kwargs):
        pass
    
  2. Usage with different thresholds and methods
    metric_1 = UnsuccessfulRequestsMetric(threshold=MetricThreshold(type="lower_limit", value=0.5))
    
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_unsuccessful_requests(metrics=[metric_1])
    def agentic_tool(*args, **kwargs):
        pass
    
evaluate_violence(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric] = [], compute_real_time: bool | None = True) dict

An evaluation decorator for computing violence on an agentic tool via Granite Guardian.

For more details, see ibm_watsonx_gov.metrics.ViolenceMetric

Parameters:
  • func (Optional[Callable], optional) – The tool on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric], optional) – The list of metrics to compute as part of this evaluator. Defaults to [ ViolenceMetric() ].

Raises:

Exception – If any error occurs during evaluation.

Returns:

The result of the wrapped tool.

Return type:

dict

Examples

  1. Create evaluate_violence decorator with default parameters. By default, the metric uses the “input_text” from the graph state as the input.
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_violence
    def agentic_tool(*args, **kwargs):
        pass
    
  2. Create evaluate_violence decorator with thresholds and configuration
    metric = ViolenceMetric(thresholds=MetricThreshold(type="lower_limit", value=0.7))
    config = {"input_fields": ["input"]}
    configuration = AgenticAIConfiguration(**config)
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_violence(metrics=[metric], configuration=configuration)
    def agentic_tool(*args, **kwargs):
        pass
    
get_metric_result(metric_name: str, node_name: str) AgentMetricResult

Get the AgentMetricResult for the given metric and node name. This is used to get the result of the metric computed during agent execution.

Parameters:
  • metric_name (string) – The metric name

  • node_name (string) – The node name

Returns:

The AgentMetricResult object for the metric.

Return type:

agent_metric_result (AgentMetricResult)
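
Example

  1. Fetch the result of a metric computed during agent execution. A minimal sketch; the metric and node names below are placeholders, so use the names from your own graph.
    metric_result = agentic_evaluator.get_metric_result(metric_name="context_relevance", node_name="Retrieval Node")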

get_nodes() list[Node]

Get the list of nodes used in the agentic application

Returns:

The list of nodes used in the agentic application

Return type:

nodes (list[Node])
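
Example

  1. List the nodes used in the agentic application. A minimal sketch; Node exposes the name attribute shown in the class-level examples.
    nodes = agentic_evaluator.get_nodes()
    print([node.name for node in nodes])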

get_result(run_name: str | None = None) AgenticEvaluationResult

Get the AgenticEvaluationResult for a run. By default, the result for the latest run is returned; specify the run name to get the result for a specific run.

Parameters:

run_name (string, optional) – The evaluation run name. Defaults to None.

Returns:

The AgenticEvaluationResult object for the run.

Return type:

agentic_evaluation_result (AgenticEvaluationResult)
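
Example

  1. Get the result of the latest run, or of a specific run by name. A minimal sketch; "run_1" is the default run name used by start_run.
    result = agentic_evaluator.get_result()
    result_for_run = agentic_evaluator.get_result(run_name="run_1")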

log_custom_metrics(custom_metrics)

Collect the custom metrics provided by the user and append them to the metrics of the current run.

Parameters:

custom_metrics (List[Dict]) – The list of custom metrics to log.
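
Example

  1. Append user-provided metrics to the current run. A minimal sketch; the dictionary keys shown are illustrative assumptions, not a documented schema.
    custom_metrics = [{"name": "my_business_metric", "value": 0.85}]
    agentic_evaluator.log_custom_metrics(custom_metrics)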

model_post_init(context: Any, /) None

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.

start_run(run_request: AIExperimentRunRequest = AIExperimentRunRequest(name='run_1', description='', source_name='', source_url='', custom_tags=[], agent_method_name='')) AIExperimentRun

Start a run to track the metrics computation within an experiment. This method must be called before any metrics are computed.

Parameters:

run_request (AIExperimentRunRequest) – The run request instance containing the name, description, source_name, source_url, and custom_tags.

Returns:

The details of the experiment run, such as the id, name, and description.
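
Example

  1. Start a named run before invoking the agentic application. A minimal sketch; the run name and description are placeholders, and the fields follow the AIExperimentRunRequest signature above.
    run = agentic_evaluator.start_run(AIExperimentRunRequest(name="baseline_run", description="First tracked run"))
    # Invoke the agentic application
    agentic_evaluator.end_run()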

track_experiment(name: str = 'experiment_1', description: str = None, use_existing: bool = True) str

Start tracking an experiment for the metrics evaluation. The experiment will be created if it doesn’t exist. If an existing experiment with the same name is found, it will be reused based on the flag use_existing.

Parameters:
  • name (string) – The name of the experiment.

  • description (str) – The description of the experiment.

  • use_existing (bool) – The flag to specify if the experiment should be reused if an existing experiment with the given name is found.

Returns:

The ID of the AI experiment asset.
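
Example

  1. Track an experiment before starting runs. A minimal sketch; the name and description are placeholders, and an existing experiment with the same name is reused because use_existing defaults to True.
    experiment_id = agentic_evaluator.track_experiment(name="agent_quality_experiment", description="Experiment for agent evaluation runs")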