Agentic Evaluator

pydantic model ibm_watsonx_gov.evaluators.agentic_evaluator.AgenticEvaluator

Bases: BaseEvaluator

The class to evaluate an agentic application.

Examples

  1. Evaluate an agent with default parameters. This computes only the performance (latency, duration) and usage (cost, input_token_count, output_token_count) metrics.
    agentic_evaluator = AgenticEvaluator()
    agentic_evaluator.start_run()
    # Invoke the agentic application
    agentic_evaluator.end_run()
    result = agentic_evaluator.get_result()
    
  2. Evaluate an agent by specifying the agent- or message-level metrics and the node-level metrics, which are computed after graph invocation when end_run() is called.
    # The example below provides the node configuration to compute the ContextRelevanceMetric and all the Retrieval Quality group metrics.
    nodes = [Node(name="Retrieval Node",
                metrics_configurations=[MetricsConfiguration(metrics=[ContextRelevanceMetric()],
                                                             metric_groups=[MetricGroup.RETRIEVAL_QUALITY])])]
    # Refer to the MetricsConfiguration class for advanced usage where the field details can be specified, in case the graph state attributes have non-default names.
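    # For example, a sketch assuming the graph state keeps the query in "input" and
    # the retrieved documents in "web_context" (illustrative field names):
    #   node_fields_config = {"input_fields": ["input"], "context_fields": ["web_context"]}
    #   nodes = [Node(name="Retrieval Node",
    #                 metrics_configurations=[MetricsConfiguration(configuration=AgenticAIConfiguration(**node_fields_config),
    #                                                              metrics=[ContextRelevanceMetric()],
    #                                                              metric_groups=[MetricGroup.RETRIEVAL_QUALITY])])]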
    
    # The example below provides the agent configuration to compute the AnswerRelevanceMetric and all the Content Safety group metrics at the agent or message level.
    agentic_app = AgenticApp(name="Agentic App",
                        metrics_configuration=MetricsConfiguration(metrics=[AnswerRelevanceMetric()],
                                                                    metric_groups=[MetricGroup.CONTENT_SAFETY]),
                        nodes=nodes)
    
    agentic_evaluator = AgenticEvaluator(agentic_app=agentic_app)
    agentic_evaluator.start_run()
    # Invoke the agentic application
    agentic_evaluator.end_run()
    result = agentic_evaluator.get_result()
    
  3. Evaluate an agent by specifying the agent- or message-level metrics and using decorators to compute node-level metrics during graph invocation.
    # The example below provides the agent configuration to compute the AnswerRelevanceMetric and all the Content Safety group metrics at the agent or message level.
    # Agent- or message-level metrics are computed after graph invocation when end_run() is called.
    agentic_app = AgenticApp(name="Agentic App",
                        metrics_configuration=MetricsConfiguration(metrics=[AnswerRelevanceMetric()],
                                                                    metric_groups=[MetricGroup.CONTENT_SAFETY]))
    
    agentic_evaluator = AgenticEvaluator(agentic_app=agentic_app)
    
    # Add the decorators when defining the node functions
    @agentic_evaluator.evaluate_retrieval_quality(configuration=AgenticAIConfiguration(**{"input_fields": ["input_text"], "context_fields": ["local_context"]}))
    @agentic_evaluator.evaluate_content_safety()  # Here the default AgenticAIConfiguration is used
    def local_search_node(state: GraphState, config: RunnableConfig) -> dict:
        # Retrieve data from vector db
        # ...
        return {"local_context": []}
    
    agentic_evaluator.start_run()
    # Invoke the agentic application
    agentic_evaluator.end_run()
    result = agentic_evaluator.get_result()
    
  4. Evaluate an agent with experiment tracking
    tracing_config = TracingConfiguration(project_id=project_id)
    agentic_evaluator = AgenticEvaluator(tracing_configuration=tracing_config)
    
    agentic_evaluator.track_experiment(name="my_experiment")
    agentic_evaluator.start_run(AIExperimentRunRequest(name="run1"))
    # Invoke the agentic application
    agentic_evaluator.end_run()
    result = agentic_evaluator.get_result()
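    
    # A further run can be tracked under the same experiment (a sketch; the run
    # name "run2" is illustrative):
    agentic_evaluator.start_run(AIExperimentRunRequest(name="run2"))
    # Invoke the agentic application
    agentic_evaluator.end_run()
    result = agentic_evaluator.get_result()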
    

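Note: the examples above omit imports for brevity. The evaluator's import path follows from the module shown above; the remaining paths in this sketch are assumptions and may differ across package versions:

    from ibm_watsonx_gov.evaluators.agentic_evaluator import AgenticEvaluator
    # The paths below are assumed for illustration only; check your installed package:
    # from ibm_watsonx_gov.entities import AgenticApp, Node, MetricsConfiguration, AgenticAIConfiguration
    # from ibm_watsonx_gov.metrics import AnswerRelevanceMetric, ContextRelevanceMetric
    # from ibm_watsonx_gov.entities.enums import MetricGroup
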
Show JSON schema
{
   "title": "AgenticEvaluator",
   "type": "object",
   "properties": {
      "api_client": {
         "default": null,
         "title": "Api Client"
      },
      "agentic_app": {
         "anyOf": [
            {
               "$ref": "#/$defs/AgenticApp"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "The agentic application configuration details.",
         "title": "Agentic application configuration details"
      },
      "tracing_configuration": {
         "anyOf": [
            {
               "$ref": "#/$defs/TracingConfiguration"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "The tracing configuration details.",
         "title": "Tracing Configuration"
      },
      "ai_experiment_client": {
         "default": null,
         "title": "Ai Experiment Client"
      },
      "max_concurrency": {
         "default": 10,
         "description": "The maximum concurrency to use for evaluating metrics.",
         "title": "Max Concurrency",
         "type": "integer"
      }
   },
   "$defs": {
      "AWSBedrockCredentials": {
         "description": "Defines the AWSBedrockCredentials class for accessing AWS Bedrock using environment variables or manual input.\n\nExamples:\n    1. Create credentials manually:\n        .. code-block:: python\n\n            credentials = AWSBedrockCredentials(\n                aws_access_key_id=\"your-access-key-id\",\n                aws_secret_access_key=\"your-secret-access-key\",\n                aws_region_name=\"us-east-1\",\n                aws_session_token=\"optional-session-token\"\n            )\n\n    2. Create credentials from environment:\n        .. code-block:: python\n\n            os.environ[\"AWS_ACCESS_KEY_ID\"] = \"your-access-key-id\"\n            os.environ[\"AWS_DEFAULT_REGION\"] = \"us-east-1\"\n            os.environ[\"AWS_SECRET_ACCESS_KEY\"] = \"your-secret-access-key\"\n\n            credentials = AWSBedrockCredentials.create_from_env()",
         "properties": {
            "aws_access_key_id": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "description": "The AWS access key id. This attribute value will be read from AWS_ACCESS_KEY_ID environment variable when creating AWSBedrockCredentials from environment.",
               "title": "AWS Access Key ID"
            },
            "aws_secret_access_key": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "description": "The AWS secret access key. This attribute value will be read from AWS_SECRET_ACCESS_KEY environment variable when creating AWSBedrockCredentials from environment.",
               "title": "AWS Secret Access Key"
            },
            "aws_region_name": {
               "default": "us-east-1",
               "description": "AWS region. This attribute value will be read from AWS_DEFAULT_REGION environment variable when creating AWSBedrockCredentials from environment.",
               "examples": [
                  "us-east-1",
                  "eu-west-1"
               ],
               "title": "AWS Region",
               "type": "string"
            },
            "aws_session_token": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "description": "Optional AWS session token for temporary credentials.",
               "title": "AWS Session Token"
            }
         },
         "required": [
            "aws_access_key_id",
            "aws_secret_access_key",
            "aws_session_token"
         ],
         "title": "AWSBedrockCredentials",
         "type": "object"
      },
      "AWSBedrockFoundationModel": {
         "description": "    The Amazon Bedrock foundation model details.\n\n    Examples:\n        1. Create AWS Bedrock foundation model by passing credentials manually:\n            .. code-block:: python\n\n                bedrock_model = AWSBedrockFoundationModel(\n                    model_id=\"anthropic.claude-v2\",\n                    provider=AWSBedrockModelProvider(\n                        credentials=AWSBedrockCredentials(\n                            aws_access_key_id=\"your-access-key-id\",\n                            aws_secret_access_key=\"your-secret-access-key\",\n                            aws_region_name=\"us-east-1\",\n                            aws_session_token=\"optional-session-token\"\n                        )\n                    ),\n                    parameters={\n                        \"temperature\": 0.7,\n                        \"top_p\": 0.9,\n                        \"max_tokens\": 200,\n                        \"stop_sequences\": [\"\n\"],\n                        \"system\": \"You are a concise assistant.\",\n                        \"reasoning_effort\": \"high\",\n                        \"tool_choice\": \"auto\"\n                    }\n                )\n\n        2. Create AWS Bedrock foundation model using environment variables:\n            os.environ[\"AWS_ACCESS_KEY_ID\"] = \"your-access-key-id\"\n            os.environ[\"AWS_SECRET_ACCESS_KEY\"] = \"your-secret-access-key\"\n            os.environ[\"AWS_DEFAULT_REGION\"] = \"us-east-1\"\n\n            .. code-block:: python\n\n                bedrock_model = AWSBedrockFoundationModel(\n                    model_id=\"anthropic.claude-v2\"\n                )\n    ",
         "properties": {
            "model_id": {
               "description": "The AWS Bedrock model name. It must be a valid AWS Bedrock model identifier.",
               "examples": [
                  "anthropic.claude-v2"
               ],
               "title": "Model ID",
               "type": "string"
            },
            "provider": {
               "$ref": "#/$defs/AWSBedrockModelProvider",
               "description": "The AWS Bedrock provider details.",
               "title": "Provider"
            },
            "parameters": {
               "anyOf": [
                  {
                     "type": "object"
                  },
                  {
                     "type": "null"
                  }
               ],
               "description": "The model parameters to be used when invoking the model. The parameters may include temperature, top_p, max_tokens, etc..",
               "title": "Parameters"
            }
         },
         "required": [
            "model_id"
         ],
         "title": "AWSBedrockFoundationModel",
         "type": "object"
      },
      "AWSBedrockModelProvider": {
         "description": "Represents a model provider using Amazon Bedrock.\n\nExamples:\n    1. Create provider using credentials object:\n        .. code-block:: python\n\n            provider = AWSBedrockModelProvider(\n                credentials=AWSBedrockCredentials(\n                    aws_access_key_id=\"your-access-key-id\",\n                    aws_secret_access_key=\"your-secret-access-key\",\n                    aws_region_name=\"us-east-1\",\n                    aws_session_token=\"optional-session-token\"\n                )\n            )\n\n    2. Create provider using environment variables:\n        .. code-block:: python\n\n            os.environ['AWS_ACCESS_KEY_ID'] = \"your-access-key-id\"\n            os.environ['AWS_SECRET_ACCESS_KEY'] = \"your-secret-access-key\"\n            os.environ['AWS_SESSION_TOKEN'] = \"optional-session-token\"  # Optional\n            os.environ['AWS_DEFAULT_REGION'] = \"us-east-1\"\n            provider = AWSBedrockModelProvider()",
         "properties": {
            "type": {
               "$ref": "#/$defs/ModelProviderType",
               "default": "aws_bedrock",
               "description": "The type of model provider."
            },
            "credentials": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/AWSBedrockCredentials"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "AWS Bedrock credentials."
            }
         },
         "title": "AWSBedrockModelProvider",
         "type": "object"
      },
      "AgenticAIConfiguration": {
         "description": "Defines the AgenticAIConfiguration class.\n\nThe configuration interface for Agentic AI tools and applications.\nThis is used to specify the fields mapping details in the data and other configuration parameters needed for evaluation.\n\nExamples:\n    1. Create configuration with default parameters\n        .. code-block:: python\n\n            configuration = AgenticAIConfiguration()\n\n    2. Create configuration with parameters\n        .. code-block:: python\n\n            configuration = AgenticAIConfiguration(input_fields=[\"input\"], \n                                                   output_fields=[\"output\"])\n\n    2. Create configuration with dict parameters\n        .. code-block:: python\n\n            config = {\"input_fields\": [\"input\"],\n                      \"output_fields\": [\"output\"],\n                      \"context_fields\": [\"contexts\"],\n                      \"reference_fields\": [\"reference\"]}\n            configuration = AgenticAIConfiguration(**config)",
         "properties": {
            "record_id_field": {
               "default": "record_id",
               "description": "The record identifier field name.",
               "examples": [
                  "record_id"
               ],
               "title": "Record id field",
               "type": "string"
            },
            "record_timestamp_field": {
               "default": "record_timestamp",
               "description": "The record timestamp field name.",
               "examples": [
                  "record_timestamp"
               ],
               "title": "Record timestamp field",
               "type": "string"
            },
            "task_type": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/TaskType"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The generative task type. Default value is None.",
               "examples": [
                  "retrieval_augmented_generation"
               ],
               "title": "Task Type"
            },
            "input_fields": {
               "default": [
                  "input_text"
               ],
               "description": "The list of model input fields in the data. Default value is ['input_text'].",
               "examples": [
                  [
                     "question"
                  ]
               ],
               "items": {
                  "type": "string"
               },
               "title": "Input Fields",
               "type": "array"
            },
            "context_fields": {
               "default": [
                  "context"
               ],
               "description": "The list of context fields in the input fields. Default value is ['context'].",
               "examples": [
                  [
                     "context1",
                     "context2"
                  ]
               ],
               "items": {
                  "type": "string"
               },
               "title": "Context Fields",
               "type": "array"
            },
            "output_fields": {
               "default": [
                  "generated_text"
               ],
               "description": "The list of model output fields in the data. Default value is ['generated_text'].",
               "examples": [
                  [
                     "output"
                  ]
               ],
               "items": {
                  "type": "string"
               },
               "title": "Output Fields",
               "type": "array"
            },
            "reference_fields": {
               "default": [
                  "ground_truth"
               ],
               "description": "The list of reference fields in the data. Default value is ['ground_truth'].",
               "examples": [
                  [
                     "reference"
                  ]
               ],
               "items": {
                  "type": "string"
               },
               "title": "Reference Fields",
               "type": "array"
            },
            "locale": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/Locale"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The language locale of the input, output and reference fields in the data.",
               "title": "Locale"
            },
            "tools": {
               "default": [],
               "description": "The list of tools used by the LLM.",
               "examples": [
                  [
                     "function1",
                     "function2"
                  ]
               ],
               "items": {
                  "type": "object"
               },
               "title": "Tools",
               "type": "array"
            },
            "tool_calls_field": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": "tool_calls",
               "description": "The tool calls field in the input fields. Default value is 'tool_calls'.",
               "examples": [
                  "tool_calls"
               ],
               "title": "Tool Calls Field"
            },
            "available_tools_field": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": "available_tools",
               "description": "The tool inventory field in the data. Default value is 'available_tools'.",
               "examples": [
                  "available_tools"
               ],
               "title": "Available Tools Field"
            },
            "llm_judge": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/LLMJudge"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "LLM as Judge Model details.",
               "title": "LLM Judge"
            },
            "prompt_field": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": "model_prompt",
               "description": "The prompt field in the input fields. Default value is 'model_prompt'.",
               "examples": [
                  "model_prompt"
               ],
               "title": "Model Prompt Field"
            },
            "message_id_field": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": "message_id",
               "description": "The message identifier field name. Default value is 'message_id'.",
               "examples": [
                  "message_id"
               ],
               "title": "Message id field"
            },
            "conversation_id_field": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": "conversation_id",
               "description": "The conversation identifier field name. Default value is 'conversation_id'.",
               "examples": [
                  "conversation_id"
               ],
               "title": "Conversation id field"
            }
         },
         "title": "AgenticAIConfiguration",
         "type": "object"
      },
      "AgenticApp": {
         "description": "The configuration class representing an agentic application.\nAn agent is composed of a set of nodes.\nThe metrics to be computed at the agent or message level should be specified in the metrics_configuration and the metrics to be computed for the node level should be specified in the nodes list.\n\nExamples:\n    1. Create AgenticApp with agent level metrics configuration. \n        .. code-block:: python\n\n            # Below example provides the agent configuration to compute the AnswerRelevanceMetric and all the metrics in Content Safety group on agent or message level.\n            agentic_app = AgenticApp(name=\"Agentic App\",\n                                metrics_configuration=MetricsConfiguration(metrics=[AnswerRelevanceMetric()],\n                                                                            metric_groups=[MetricGroup.CONTENT_SAFETY]))\n            agentic_evaluator = AgenticEvaluator(agentic_app=agentic_app)\n            ...\n\n    2. Create AgenticApp with agent and node level metrics configuration and default agentic ai configuration for metrics. \n        .. code-block:: python\n\n            # Below example provides the node configuration to compute the ContextRelevanceMetric and all the metrics in Retrieval Quality group. \n            nodes = [Node(name=\"Retrieval Node\",\n                        metrics_configurations=[MetricsConfiguration(metrics=[ContextRelevanceMetric()],\n                                                                     metric_groups=[MetricGroup.RETRIEVAL_QUALITY])])]\n\n            # Below example provides the agent configuration to compute the AnswerRelevanceMetric and all the metrics in Content Safety group on agent or message level.\n            agentic_app = AgenticApp(name=\"Agentic App\",\n                                metrics_configuration=MetricsConfiguration(metrics=[AnswerRelevanceMetric()],\n                                                                            metric_groups=[MetricGroup.CONTENT_SAFETY]),\n                                nodes=nodes)\n            agentic_evaluator = AgenticEvaluator(agentic_app=agentic_app)\n            ...\n\n    3. Create AgenticApp with agent and nodel level metrics configuration and with agentic ai configuration for metrics. \n        .. 
code-block:: python\n\n            # Below example provides the node configuration to compute the ContextRelevanceMetric and all the metrics in Retrieval Quality group.\n            node_fields_config = {\n                \"input_fields\": [\"input\"],\n                \"context_fields\": [\"web_context\"]\n            }\n            nodes = [Node(name=\"Retrieval Node\",\n                        metrics_configurations=[MetricsConfiguration(configuration=AgenticAIConfiguration(**node_fields_config)\n                                                                     metrics=[ContextRelevanceMetric()],\n                                                                     metric_groups=[MetricGroup.RETRIEVAL_QUALITY])])]\n\n            # Below example provides the agent configuration to compute the AnswerRelevanceMetric and all the metrics in Content Safety group on agent or message level.\n            agent_fields_config = {\n                \"input_fields\": [\"input\"],\n                \"output_fields\": [\"output\"]\n            }\n            agentic_app = AgenticApp(name=\"Agentic App\",\n                                metrics_configuration=MetricsConfiguration(configuration=AgenticAIConfiguration(**agent_fields_config)\n                                                                           metrics=[AnswerRelevanceMetric()],\n                                                                           metric_groups=[MetricGroup.CONTENT_SAFETY]),\n                                nodes=nodes)\n            agentic_evaluator = AgenticEvaluator(agentic_app=agentic_app)\n            ...",
         "properties": {
            "name": {
               "default": "Agentic App",
               "description": "The name of the agentic application.",
               "title": "Agentic application name",
               "type": "string"
            },
            "metrics_configuration": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/MetricsConfiguration"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": {
                  "configuration": {
                     "available_tools_field": "available_tools",
                     "context_fields": [
                        "context"
                     ],
                     "conversation_id_field": "conversation_id",
                     "input_fields": [
                        "input_text"
                     ],
                     "llm_judge": null,
                     "locale": null,
                     "message_id_field": "message_id",
                     "output_fields": [
                        "generated_text"
                     ],
                     "prompt_field": "model_prompt",
                     "record_id_field": "record_id",
                     "record_timestamp_field": "record_timestamp",
                     "reference_fields": [
                        "ground_truth"
                     ],
                     "task_type": null,
                     "tool_calls_field": "tool_calls",
                     "tools": []
                  },
                  "metrics": [],
                  "metric_groups": []
               },
               "description": "The list of metrics to be computed on the agentic application and their configuration details.",
               "title": "Metrics configuration"
            },
            "nodes": {
               "anyOf": [
                  {
                     "items": {
                        "$ref": "#/$defs/Node"
                     },
                     "type": "array"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": [],
               "description": "The nodes details.",
               "title": "Node details"
            }
         },
         "title": "AgenticApp",
         "type": "object"
      },
      "AzureOpenAICredentials": {
         "properties": {
            "url": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "description": "Azure OpenAI url. This attribute can be read from `AZURE_OPENAI_HOST` environment variable.",
               "title": "Url"
            },
            "api_key": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "description": "API key for Azure OpenAI. This attribute can be read from `AZURE_OPENAI_API_KEY` environment variable.",
               "title": "Api Key"
            },
            "api_version": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "description": "The model API version from Azure OpenAI. This attribute can be read from `AZURE_OPENAI_API_VERSION` environment variable.",
               "title": "Api Version"
            }
         },
         "required": [
            "url",
            "api_key",
            "api_version"
         ],
         "title": "AzureOpenAICredentials",
         "type": "object"
      },
      "AzureOpenAIFoundationModel": {
         "description": "The Azure OpenAI foundation model details\n\nExamples:\n    1. Create Azure OpenAI foundation model by passing the credentials during object creation.\n        .. code-block:: python\n\n            azure_openai_foundation_model = AzureOpenAIFoundationModel(\n                model_id=\"gpt-4o-mini\",\n                provider=AzureOpenAIModelProvider(\n                    credentials=AzureOpenAICredentials(\n                        api_key=azure_api_key,\n                        url=azure_host_url,\n                        api_version=azure_api_model_version,\n                    )\n                )\n            )\n\n2. Create Azure OpenAI foundation model by setting the credentials in environment variables:\n    * ``AZURE_OPENAI_API_KEY`` is used to set the api key for OpenAI.\n    * ``AZURE_OPENAI_HOST`` is used to set the url for Azure OpenAI.\n    * ``AZURE_OPENAI_API_VERSION`` is uses to set the the api version for Azure OpenAI.\n\n        .. code-block:: python\n\n            openai_foundation_model = AzureOpenAIFoundationModel(\n                model_id=\"gpt-4o-mini\",\n            )",
         "properties": {
            "model_name": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The name of the foundation model.",
               "title": "Model Name"
            },
            "provider": {
               "$ref": "#/$defs/AzureOpenAIModelProvider",
               "description": "Azure OpenAI provider"
            },
            "model_id": {
               "description": "Model deployment name from Azure OpenAI",
               "title": "Model Id",
               "type": "string"
            }
         },
         "required": [
            "model_id"
         ],
         "title": "AzureOpenAIFoundationModel",
         "type": "object"
      },
      "AzureOpenAIModelProvider": {
         "properties": {
            "type": {
               "$ref": "#/$defs/ModelProviderType",
               "default": "azure_openai",
               "description": "The type of model provider."
            },
            "credentials": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/AzureOpenAICredentials"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Azure OpenAI credentials."
            }
         },
         "title": "AzureOpenAIModelProvider",
         "type": "object"
      },
      "CustomFoundationModel": {
         "description": "Defines the CustomFoundationModel class.\n\nThis class extends the base `FoundationModel` to support custom inference logic through a user-defined scoring function.\nIt is intended for use cases where the model is externally hosted and not in the list of supported frameworks.\nExamples:\n    1. Define a custom scoring function and create a model:\n        .. code-block:: python\n\n            import pandas as pd\n\n            def scoring_fn(data: pd.DataFrame):\n                predictions_list = []\n                # Custom logic to call an external LLM\n                return pd.DataFrame({\"generated_text\": predictions_list})                    \n\n            model = CustomFoundationModel(\n                scoring_fn=scoring_fn\n            )",
         "properties": {
            "model_name": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The name of the foundation model.",
               "title": "Model Name"
            },
            "provider": {
               "$ref": "#/$defs/ModelProvider",
               "description": "The provider of the model."
            }
         },
         "title": "CustomFoundationModel",
         "type": "object"
      },
      "FoundationModelInfo": {
         "description": "Represents a foundation model used in an experiment.",
         "properties": {
            "model_name": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The name of the foundation model.",
               "title": "Model Name"
            },
            "model_id": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The id of the foundation model.",
               "title": "Model Id"
            },
            "provider": {
               "description": "The provider of the foundation model.",
               "title": "Provider",
               "type": "string"
            },
            "type": {
               "description": "The type of foundation model.",
               "example": [
                  "chat",
                  "embedding",
                  "text-generation"
               ],
               "title": "Type",
               "type": "string"
            }
         },
         "required": [
            "provider",
            "type"
         ],
         "title": "FoundationModelInfo",
         "type": "object"
      },
      "GenAIMetric": {
         "description": "Defines the Generative AI metric interface",
         "properties": {
            "name": {
               "description": "The name of the metric.",
               "examples": [
                  "answer_relevance",
                  "context_relevance"
               ],
               "title": "Metric Name",
               "type": "string"
            },
            "display_name": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The display name of the metric.",
               "examples": [
                  "Answer Relevance",
                  "Context Relevance"
               ],
               "title": "Metric display name"
            },
            "type_": {
               "default": "ootb",
               "description": "The type of the metric. Indicates whether the metric is ootb or custom.",
               "examples": [
                  "ootb",
                  "custom"
               ],
               "title": "Metric type",
               "type": "string"
            },
            "value_type": {
               "default": "numeric",
               "description": "The type of the metric value. Indicates whether the metric value is numeric or categorical.",
               "examples": [
                  "numeric",
                  "categorical"
               ],
               "title": "Metric value type",
               "type": "string"
            },
            "thresholds": {
               "default": [],
               "description": "The list of thresholds",
               "items": {
                  "$ref": "#/$defs/MetricThreshold"
               },
               "title": "Thresholds",
               "type": "array"
            },
            "tasks": {
               "default": [],
               "description": "The task types this metric is associated with.",
               "items": {
                  "$ref": "#/$defs/TaskType"
               },
               "title": "Tasks",
               "type": "array"
            },
            "group": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/MetricGroup"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The metric group this metric belongs to."
            },
            "is_reference_free": {
               "default": true,
               "description": "Decides whether this metric needs a reference for computation",
               "title": "Is Reference Free",
               "type": "boolean"
            },
            "method": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The method used to compute the metric.",
               "title": "Method"
            },
            "metric_dependencies": {
               "default": [],
               "description": "Metrics that needs to be evaluated first",
               "items": {
                  "$ref": "#/$defs/GenAIMetric"
               },
               "title": "Metric Dependencies",
               "type": "array"
            },
            "applies_to": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": "message",
               "description": "The tag to indicate for which the metric is applied to. Used for agentic application metric computation.",
               "examples": [
                  "message",
                  "conversation",
                  "sub_agent"
               ],
               "title": "Applies to"
            },
            "mapping": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/Mapping"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The data mapping details for the metric which are used to read the values needed to compute the metric.",
               "examples": {
                  "items": [
                     {
                        "attribute_name": "traceloop.entity.input",
                        "column_name": null,
                        "json_path": "$.inputs.input_text",
                        "lookup_child_spans": false,
                        "name": "input_text",
                        "span_name": "LangGraph.workflow",
                        "type": "input"
                     },
                     {
                        "attribute_name": "traceloop.entity.output",
                        "column_name": null,
                        "json_path": "$.outputs.generated_text",
                        "lookup_child_spans": false,
                        "name": "generated_text",
                        "span_name": "LangGraph.workflow",
                        "type": "output"
                     }
                  ],
                  "source": "trace"
               },
               "title": "Mapping"
            }
         },
         "required": [
            "name"
         ],
         "title": "GenAIMetric",
         "type": "object"
      },
      "GoogleAIStudioCredentials": {
         "description": "Defines the GoogleAIStudioCredentials class for accessing Google AI Studio using an API key.\n\nExamples:\n    1. Create credentials manually:\n        .. code-block:: python\n\n            google_credentials = GoogleAIStudioCredentials(api_key=\"your-api-key\")\n\n    2. Create credentials from environment:\n        .. code-block:: python\n\n            os.environ[\"GOOGLE_API_KEY\"] = \"your-api-key\"\n            google_credentials = GoogleAIStudioCredentials.create_from_env()",
         "properties": {
            "api_key": {
               "description": "The Google AI Studio key. This attribute can be read from GOOGLE_API_KEY environment variable when creating GoogleAIStudioCredentials from environment.",
               "title": "Api Key",
               "type": "string"
            }
         },
         "required": [
            "api_key"
         ],
         "title": "GoogleAIStudioCredentials",
         "type": "object"
      },
      "GoogleAIStudioFoundationModel": {
         "description": "Represents a foundation model served via Google AI Studio.\n\nExamples:\n    1. Create Google AI Studio foundation model by passing the credentials during object creation.\n        .. code-block:: python\n\n            model = GoogleAIStudioFoundationModel(\n                model_id=\"gemini-1.5-pro-002\",\n                provider=GoogleAIStudioModelProvider(\n                    credentials=GoogleAIStudioCredentials(api_key=\"your_api_key\")\n                )\n            )\n    2. Create Google AI Studio foundation model by setting the credentials in environment variables:\n        * ``GOOGLE_API_KEY`` OR ``GEMINI_API_KEY`` is used to set the Credentials path for Vertex AI.\n            .. code-block:: python\n\n                model = GoogleAIStudioFoundationModel(\n                    model_id=\"gemini/gpt-4o-mini\",\n                )",
         "properties": {
            "model_name": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The name of the foundation model.",
               "title": "Model Name"
            },
            "provider": {
               "$ref": "#/$defs/GoogleAIStudioModelProvider",
               "description": "Google AI Studio provider.",
               "title": "Provider"
            },
            "model_id": {
               "description": "Model name for Google AI Studio. Must be a valid Google AI model identifier or a fully-qualified publisher path",
               "examples": [
                  "gemini-1.5-pro-002"
               ],
               "title": "Model id",
               "type": "string"
            }
         },
         "required": [
            "model_id"
         ],
         "title": "GoogleAIStudioFoundationModel",
         "type": "object"
      },
      "GoogleAIStudioModelProvider": {
         "description": "Represents a model provider using Google AI Studio.\n\nExamples:\n    1. Create provider using credentials object:\n        .. code-block:: python\n\n            provider = GoogleAIStudioModelProvider(\n                credentials=GoogleAIStudioCredentials(api_key=\"api-key\")\n            )\n\n    2. Create provider using environment variables:\n        .. code-block:: python\n\n            os.environ['GOOGLE_API_KEY'] = \"your_api_key\"\n\n            provider = GoogleAIStudioModelProvider()",
         "properties": {
            "type": {
               "$ref": "#/$defs/ModelProviderType",
               "default": "google_ai_studio",
               "description": "The type of model provider."
            },
            "credentials": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/GoogleAIStudioCredentials"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Google AI Studio credentials."
            }
         },
         "title": "GoogleAIStudioModelProvider",
         "type": "object"
      },
      "LLMJudge": {
         "description": "Defines the LLMJudge.\n\nThe LLMJudge class contains the details of the llm judge model to be used for computing the metric.\n\nExamples:\n    1. Create LLMJudge using watsonx.ai foundation model:\n        .. code-block:: python\n\n            wx_ai_foundation_model = WxAIFoundationModel(\n                model_id=\"ibm/granite-3-3-8b-instruct\",\n                project_id=PROJECT_ID,\n                provider=WxAIModelProvider(\n                    credentials=WxAICredentials(api_key=wx_apikey)\n                )\n            )\n            llm_judge = LLMJudge(model=wx_ai_foundation_model)",
         "properties": {
            "model": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/WxAIFoundationModel"
                  },
                  {
                     "$ref": "#/$defs/OpenAIFoundationModel"
                  },
                  {
                     "$ref": "#/$defs/AzureOpenAIFoundationModel"
                  },
                  {
                     "$ref": "#/$defs/PortKeyGateway"
                  },
                  {
                     "$ref": "#/$defs/RITSFoundationModel"
                  },
                  {
                     "$ref": "#/$defs/VertexAIFoundationModel"
                  },
                  {
                     "$ref": "#/$defs/GoogleAIStudioFoundationModel"
                  },
                  {
                     "$ref": "#/$defs/AWSBedrockFoundationModel"
                  },
                  {
                     "$ref": "#/$defs/CustomFoundationModel"
                  }
               ],
               "description": "The foundation model to be used as judge",
               "title": "Model"
            }
         },
         "required": [
            "model"
         ],
         "title": "LLMJudge",
         "type": "object"
      },
      "Locale": {
         "properties": {
            "input": {
               "anyOf": [
                  {
                     "items": {
                        "type": "string"
                     },
                     "type": "array"
                  },
                  {
                     "additionalProperties": {
                        "type": "string"
                     },
                     "type": "object"
                  },
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Input"
            },
            "output": {
               "anyOf": [
                  {
                     "items": {
                        "type": "string"
                     },
                     "type": "array"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Output"
            },
            "reference": {
               "anyOf": [
                  {
                     "items": {
                        "type": "string"
                     },
                     "type": "array"
                  },
                  {
                     "additionalProperties": {
                        "type": "string"
                     },
                     "type": "object"
                  },
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Reference"
            }
         },
         "title": "Locale",
         "type": "object"
      },
      "Mapping": {
         "description": "Defines the field mapping details to be used for computing a metric.",
         "properties": {
            "source": {
               "default": "trace",
               "description": "The source type of the data. Use trace if the data should be read from span in trace. Use tabular if the data is passed as a dataframe.",
               "enum": [
                  "trace",
                  "tabular"
               ],
               "examples": [
                  "trace",
                  "tabular"
               ],
               "title": "Source",
               "type": "string"
            },
            "items": {
               "description": "The list of mapping items for the field. They are used to read the data from trace or tabular data for computing the metric.",
               "items": {
                  "$ref": "#/$defs/MappingItem"
               },
               "title": "Mapping Items",
               "type": "array"
            }
         },
         "required": [
            "items"
         ],
         "title": "Mapping",
         "type": "object"
      },
      "MappingItem": {
         "description": "The mapping details to be used for reading the values from the data.",
         "properties": {
            "name": {
               "description": "The name of the item.",
               "examples": [
                  "input_text",
                  "generated_text",
                  "context",
                  "ground_truth"
               ],
               "title": "Name",
               "type": "string"
            },
            "type": {
               "description": "The type of the item.",
               "enum": [
                  "input",
                  "output",
                  "reference",
                  "context",
                  "tool_call"
               ],
               "examples": [
                  "input"
               ],
               "title": "Type",
               "type": "string"
            },
            "column_name": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The column name in the tabular data to be used for reading the field value. Applicable for tabular source.",
               "title": "Column Name"
            },
            "span_name": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The span name in the trace data to be used for reading the field value. Applicable for trace source.",
               "title": "Span Name"
            },
            "attribute_name": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The attribute name in the trace to be used for reading the field value. Applicable for trace source.",
               "title": "Attribute Name"
            },
            "json_path": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The json path to be used for reading the field value from the attribute value. Applicable for trace source. If not provided, the span attribute value is read as the field value.",
               "title": "Json Path"
            },
            "lookup_child_spans": {
               "anyOf": [
                  {
                     "type": "boolean"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": false,
               "description": "The flag to indicate if all the child spans should be searched for the attribute value. Applicable for trace source.",
               "title": "Look up child spans"
            }
         },
         "required": [
            "name",
            "type"
         ],
         "title": "MappingItem",
         "type": "object"
      },
      "MetricGroup": {
         "enum": [
            "retrieval_quality",
            "answer_quality",
            "content_safety",
            "performance",
            "usage",
            "message_completion",
            "tool_call_quality",
            "readability",
            "custom"
         ],
         "title": "MetricGroup",
         "type": "string"
      },
      "MetricThreshold": {
         "description": "The class that defines the threshold for a metric.",
         "properties": {
            "type": {
               "description": "Threshold type. One of 'lower_limit', 'upper_limit'",
               "enum": [
                  "lower_limit",
                  "upper_limit"
               ],
               "title": "Type",
               "type": "string"
            },
            "value": {
               "default": 0,
               "description": "The value of metric threshold",
               "title": "Threshold value",
               "type": "number"
            }
         },
         "required": [
            "type"
         ],
         "title": "MetricThreshold",
         "type": "object"
      },
      "MetricsConfiguration": {
         "description": "The class representing the metrics to be computed and the configuration details required for them.\n\nExamples:\n    1. Create MetricsConfiguration with default agentic ai configuration\n        .. code-block:: python\n\n            metrics_configuration = MetricsConfiguration(metrics=[ContextRelevanceMetric()],\n                                                         metric_groups=[MetricGroup.RETRIEVAL_QUALITY])])\n\n    2. Create MetricsConfiguration by specifying agentic ai configuration\n        .. code-block:: python\n\n            config = {\n                \"input_fields\": [\"input\"],\n                \"context_fields\": [\"contexts\"]\n            }\n            metrics_configuration = MetricsConfiguration(configuration=AgenticAIConfiguration(**config)\n                                                           metrics=[ContextRelevanceMetric()],\n                                                           metric_groups=[MetricGroup.RETRIEVAL_QUALITY])])",
         "properties": {
            "configuration": {
               "$ref": "#/$defs/AgenticAIConfiguration",
               "default": {
                  "record_id_field": "record_id",
                  "record_timestamp_field": "record_timestamp",
                  "task_type": null,
                  "input_fields": [
                     "input_text"
                  ],
                  "context_fields": [
                     "context"
                  ],
                  "output_fields": [
                     "generated_text"
                  ],
                  "reference_fields": [
                     "ground_truth"
                  ],
                  "locale": null,
                  "tools": [],
                  "tool_calls_field": "tool_calls",
                  "available_tools_field": "available_tools",
                  "llm_judge": null,
                  "prompt_field": "model_prompt",
                  "message_id_field": "message_id",
                  "conversation_id_field": "conversation_id"
               },
               "description": "The configuration of the metrics to compute. The configuration contains the fields names to be read when computing the metrics.",
               "title": "Metrics configuration"
            },
            "metrics": {
               "anyOf": [
                  {
                     "items": {
                        "$ref": "#/$defs/GenAIMetric"
                     },
                     "type": "array"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": [],
               "description": "The list of metrics to compute.",
               "title": "Metrics"
            },
            "metric_groups": {
               "anyOf": [
                  {
                     "items": {
                        "$ref": "#/$defs/MetricGroup"
                     },
                     "type": "array"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": [],
               "description": "The list of metric groups to compute.",
               "title": "Metric Groups"
            }
         },
         "title": "MetricsConfiguration",
         "type": "object"
      },
      "ModelProvider": {
         "properties": {
            "type": {
               "$ref": "#/$defs/ModelProviderType",
               "description": "The type of model provider."
            }
         },
         "required": [
            "type"
         ],
         "title": "ModelProvider",
         "type": "object"
      },
      "ModelProviderType": {
         "description": "Supported model provider types for Generative AI",
         "enum": [
            "ibm_watsonx.ai",
            "azure_openai",
            "rits",
            "openai",
            "vertex_ai",
            "google_ai_studio",
            "aws_bedrock",
            "custom",
            "portkey"
         ],
         "title": "ModelProviderType",
         "type": "string"
      },
      "Node": {
         "description": "The class representing a node in an agentic application.\n\nExamples:\n    1. Create Node with metrics configuration and default agentic ai configuration\n        .. code-block:: python\n\n            metrics_configurations = [MetricsConfiguration(metrics=[ContextRelevanceMetric()],\n                                                           metric_groups=[MetricGroup.RETRIEVAL_QUALITY])])]\n            node = Node(name=\"Retrieval Node\",\n                        metrics_configurations=metrics_configurations)\n\n    2. Create Node with metrics configuration and specifying agentic ai configuration\n        .. code-block:: python\n\n            node_config = {\"input_fields\": [\"input\"],\n                           \"output_fields\": [\"output\"],\n                           \"context_fields\": [\"contexts\"],\n                           \"reference_fields\": [\"reference\"]}\n            metrics_configurations = [MetricsConfiguration(configuration=AgenticAIConfiguration(**node_config)\n                                                           metrics=[ContextRelevanceMetric()],\n                                                           metric_groups=[MetricGroup.RETRIEVAL_QUALITY])])]\n            node = Node(name=\"Retrieval Node\",\n                        metrics_configurations=metrics_configurations)",
         "properties": {
            "name": {
               "description": "The name of the node.",
               "title": "Name",
               "type": "string"
            },
            "func_name": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The name of the node function.",
               "title": "Node function name"
            },
            "metrics_configurations": {
               "default": [],
               "description": "The list of metrics and their configuration details.",
               "items": {
                  "$ref": "#/$defs/MetricsConfiguration"
               },
               "title": "Metrics configuration",
               "type": "array"
            },
            "foundation_models": {
               "default": [],
               "description": "The Foundation models invoked by the node",
               "items": {
                  "$ref": "#/$defs/FoundationModelInfo"
               },
               "title": "Foundation Models",
               "type": "array"
            }
         },
         "required": [
            "name"
         ],
         "title": "Node",
         "type": "object"
      },
      "OTLPCollectorConfiguration": {
         "description": "Defines the OTLPCollectorConfiguration class.\nIt contains the configuration settings for the OpenTelemetry Protocol collector.\n\nExamples:\n    1. Create OTLPCollectorConfiguration with default parameters\n        .. code-block:: python\n\n            oltp_config = OTLPCollectorConfiguration()\n\n    1. Create OTLPCollectorConfiguration by providing server endpoint details.\n        .. code-block:: python\n\n            oltp_config = OTLPCollectorConfiguration(app_name=\"app\",\n                                                     endpoint=\"https://hostname/ml/v1/traces\",\n                                                     timeout=10,\n                                                     headers={\"Authorization\": \"Bearer token\"})",
         "properties": {
            "app_name": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "description": "Application name for tracing.",
               "title": "App Name"
            },
            "endpoint": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": "http://localhost:4318/v1/traces",
               "description": "The OTLP collector endpoint URL for sending trace data. Default value is 'http://localhost:4318/v1/traces'",
               "title": "OTLP Endpoint"
            },
            "insecure": {
               "anyOf": [
                  {
                     "type": "boolean"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": false,
               "description": "Whether to disable TLS for the exporter (i.e., use an insecure connection). Default is False.",
               "title": "Insecure Connection"
            },
            "is_grpc": {
               "anyOf": [
                  {
                     "type": "boolean"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": false,
               "description": "If True, use gRPC for exporting traces instead of HTTP. Default is False.",
               "title": "Use gRPC"
            },
            "timeout": {
               "anyOf": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": 100,
               "description": "Timeout in milliseconds for sending telemetry data to the collector. Default is 100ms.",
               "title": "Timeout"
            },
            "headers": {
               "anyOf": [
                  {
                     "additionalProperties": {
                        "type": "string"
                     },
                     "type": "object"
                  },
                  {
                     "type": "null"
                  }
               ],
               "description": "Headers needed to call the server.",
               "title": "Headers"
            }
         },
         "required": [
            "app_name"
         ],
         "title": "OTLPCollectorConfiguration",
         "type": "object"
      },
      "OpenAICredentials": {
         "description": "Defines the OpenAICredentials class to specify the OpenAI server details.\n\nExamples:\n    1. Create OpenAICredentials with default parameters. By default Dallas region is used.\n        .. code-block:: python\n\n            openai_credentials = OpenAICredentials(api_key=api_key,\n                                                   url=openai_url)\n\n    2. Create OpenAICredentials by reading from environment variables.\n        .. code-block:: python\n\n            os.environ[\"OPENAI_API_KEY\"] = \"...\"\n            os.environ[\"OPENAI_URL\"] = \"...\"\n            openai_credentials = OpenAICredentials.create_from_env()",
         "properties": {
            "url": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "title": "Url"
            },
            "api_key": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "title": "Api Key"
            }
         },
         "required": [
            "url",
            "api_key"
         ],
         "title": "OpenAICredentials",
         "type": "object"
      },
      "OpenAIFoundationModel": {
         "description": "The OpenAI foundation model details\n\nExamples:\n    1. Create OpenAI foundation model by passing the credentials during object creation. Note that the url is optional and will be set to the default value for OpenAI. To change the default value, the url should be passed to ``OpenAICredentials`` object.\n        .. code-block:: python\n\n            openai_foundation_model = OpenAIFoundationModel(\n                model_id=\"gpt-4o-mini\",\n                provider=OpenAIModelProvider(\n                    credentials=OpenAICredentials(\n                        api_key=api_key,\n                        url=openai_url,\n                    )\n                )\n            )\n\n    2. Create OpenAI foundation model by setting the credentials in environment variables:\n        * ``OPENAI_API_KEY`` is used to set the api key for OpenAI.\n        * ``OPENAI_URL`` is used to set the url for OpenAI\n\n        .. code-block:: python\n\n            openai_foundation_model = OpenAIFoundationModel(\n                model_id=\"gpt-4o-mini\",\n            )",
         "properties": {
            "model_name": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The name of the foundation model.",
               "title": "Model Name"
            },
            "provider": {
               "$ref": "#/$defs/OpenAIModelProvider",
               "description": "OpenAI provider"
            },
            "model_id": {
               "description": "Model name from OpenAI",
               "title": "Model Id",
               "type": "string"
            }
         },
         "required": [
            "model_id"
         ],
         "title": "OpenAIFoundationModel",
         "type": "object"
      },
      "OpenAIModelProvider": {
         "properties": {
            "type": {
               "$ref": "#/$defs/ModelProviderType",
               "default": "openai",
               "description": "The type of model provider."
            },
            "credentials": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/OpenAICredentials"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "OpenAI credentials. This can also be set by using `OPENAI_API_KEY` environment variable."
            }
         },
         "title": "OpenAIModelProvider",
         "type": "object"
      },
      "PortKeyCredentials": {
         "description": "Defines the PortKeyCredentials class to specify the PortKey Gateway details.\n\nExamples:\n    1. Create PortKeyCredentials with default parameters.\n        .. code-block:: python\n\n            portkey_credentials = PortKeyCredentials(api_key=api_key,\n                                                    url=portkey_url,\n                                                    provider_api_key=provider_api_key,\n                                                    provider=provider_name)\n\n    2. Create PortKeyCredentials by reading from environment variables.\n        .. code-block:: python\n\n            os.environ[\"PORTKEY_API_KEY\"] = \"...\"\n            os.environ[\"PORTKEY_URL\"] = \"...\"\n            os.environ[\"PORTKEY_PROVIDER_API_KEY\"] = \"...\"\n            os.environ[\"PORTKEY_PROVIDER_NAME\"] = \"...\"\n            portkey_credentials = PortKeyCredentials.create_from_env()",
         "properties": {
            "url": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "description": "PortKey url. This attribute can be read from `PORTKEY_URL` environment variable.",
               "title": "Url"
            },
            "api_key": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "description": "API key for PortKey. This attribute can be read from `PORTKEY_API_KEY` environment variable.",
               "title": "Api Key"
            },
            "provider_api_key": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "description": "API key for the provider. This attribute can be read from `PORTKEY_PROVIDER_API_KEY` environment variable.",
               "title": "Provider Api Key"
            },
            "provider": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "description": "The provider name. This attribute can be read from `PORTKEY_PROVIDER_NAME` environment variable.",
               "title": "Provider"
            }
         },
         "required": [
            "url",
            "api_key",
            "provider_api_key",
            "provider"
         ],
         "title": "PortKeyCredentials",
         "type": "object"
      },
      "PortKeyGateway": {
         "description": "The PortKey gateway details\n\nExamples:\n    1. Create PortKeyGateway by passing the credentials during object creation. Note that the url is optional and will be set to the default value for PortKey. To change the default value, the url should be passed to ``PortKeyCredentials`` object.\n        .. code-block:: python\n\n            port_key_gateway = PortKeyGateway(\n                model_id=\"gpt-4o-mini\",\n                provider=PortKeyModelProvider(\n                    credentials=PortKeyCredentials(\n                        api_key=api_key,\n                        url=openai_url,\n                        provider_api_key=provider_api_key,\n                        provider_name=provider_name\n                    )\n                )\n            )\n\n    2. Create PortKeyGateway by setting the credentials in environment variables:\n        * ``PORTKEY_API_KEY`` is used to set the api key for PortKey.\n        * ``PORTKEY_URL`` is used to set the url for PortKey.\n        * ``PORTKEY_PROVIDER_API_KEY`` is used to set the provider api key for PortKey.\n        * ``PORTKEY_PROVIDER_NAME`` is used to set the provider name for PortKey\n\n        .. code-block:: python\n\n            port_key_gateway = PortKeyGateway(\n                model_id=\"gpt-4o-mini\",\n            )",
         "properties": {
            "model_name": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The name of the foundation model.",
               "title": "Model Name"
            },
            "provider": {
               "$ref": "#/$defs/PortKeyModelProvider",
               "description": "PortKey Provider"
            },
            "model_id": {
               "description": "Model name from the Provider",
               "title": "Model Id",
               "type": "string"
            }
         },
         "required": [
            "model_id"
         ],
         "title": "PortKeyGateway",
         "type": "object"
      },
      "PortKeyModelProvider": {
         "properties": {
            "type": {
               "$ref": "#/$defs/ModelProviderType",
               "default": "portkey",
               "description": "The type of model provider."
            },
            "credentials": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/PortKeyCredentials"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "PortKey credentials."
            }
         },
         "title": "PortKeyModelProvider",
         "type": "object"
      },
      "RITSCredentials": {
         "properties": {
            "hostname": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": "https://inference-3scale-apicast-production.apps.rits.fmaas.res.ibm.com",
               "description": "The rits hostname",
               "title": "Hostname"
            },
            "api_key": {
               "title": "Api Key",
               "type": "string"
            }
         },
         "required": [
            "api_key"
         ],
         "title": "RITSCredentials",
         "type": "object"
      },
      "RITSFoundationModel": {
         "properties": {
            "model_name": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The name of the foundation model.",
               "title": "Model Name"
            },
            "provider": {
               "$ref": "#/$defs/RITSModelProvider",
               "description": "The provider of the model."
            }
         },
         "title": "RITSFoundationModel",
         "type": "object"
      },
      "RITSModelProvider": {
         "properties": {
            "type": {
               "$ref": "#/$defs/ModelProviderType",
               "default": "rits",
               "description": "The type of model provider."
            },
            "credentials": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/RITSCredentials"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "RITS credentials."
            }
         },
         "title": "RITSModelProvider",
         "type": "object"
      },
      "TaskType": {
         "description": "Supported task types for generative AI models",
         "enum": [
            "question_answering",
            "classification",
            "summarization",
            "generation",
            "extraction",
            "retrieval_augmented_generation"
         ],
         "title": "TaskType",
         "type": "string"
      },
      "TracingConfiguration": {
         "description": "Defines the tracing configuration class. \nTracing configuration is required if the the evaluations are needed to be tracked in an experiment or if the agentic application traces should be sent to a Open Telemetry Collector.\nOne of project_id or space_id is required.\nIf the otlp_collector_config is provided, the traces are logged to Open Telemetry Collector, otherwise the traces are logged to file on disk.\nIf its required to log the traces to both collector and local file, provide the otlp_collector_config and set the flag log_traces_to_file to True.\n\nExamples:\n    1. Create Tracing configuration to track the results in an experiment\n        .. code-block:: python\n\n            tracing_config = TracingConfiguration(project_id=\"...\")\n            agentic_evaluator = AgenticEvaluator(tracing_configuration=tracing_config)\n            agentic_evaluator.track_experiment(name=\"my_experiment\")\n            ...\n\n    2. Create Tracing configuration to send traces to collector\n        .. code-block:: python\n\n            oltp_collector_config = OTLPCollectorConfiguration(endpoint=\"http://hostname:4318/v1/traces\")\n            tracing_config = TracingConfiguration(space_id=\"...\",\n                                                  resource_attributes={\n                                                        \"wx-deployment-id\": deployment_id,\n                                                        \"wx-instance-id\": \"wml-instance-id1\",\n                                                        \"wx-ai-service-id\": \"ai-service-id1\"},\n                                                   otlp_collector_config=oltp_collector_config)\n            agentic_evaluator = AgenticEvaluator(tracing_configuration=tracing_config)\n            ...",
         "properties": {
            "project_id": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The project id.",
               "title": "Project ID"
            },
            "space_id": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The space id.",
               "title": "Space ID"
            },
            "resource_attributes": {
               "anyOf": [
                  {
                     "additionalProperties": {
                        "type": "string"
                     },
                     "type": "object"
                  },
                  {
                     "type": "null"
                  }
               ],
               "description": "The resource attributes set in all the spans.",
               "title": "Resource Attributes"
            },
            "otlp_collector_config": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/OTLPCollectorConfiguration"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "OTLP Collector configuration.",
               "title": "OTLP Collector Config"
            },
            "log_traces_to_file": {
               "anyOf": [
                  {
                     "type": "boolean"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": false,
               "description": "The flag to enable logging of traces to a file. If set to True, the traces are logged to a file. Use the flag when its needed to log the traces to file and to be sent to the server simultaneously.",
               "title": "Log Traces to file"
            }
         },
         "title": "TracingConfiguration",
         "type": "object"
      },
      "VertexAICredentials": {
         "description": "Defines the VertexAICredentials class for accessing Vertex AI using service account credentials.\n\nExamples:\n    1. Create credentials manually:\n        .. code-block:: python\n\n            vertex_credentials = VertexAICredentials(\n                credentials_path=\"path/to/service_account.json\",\n                project_id=\"my-gcp-project\",\n                location=\"us-central1\"\n            )\n\n    2. Create credentials from environment:\n        .. code-block:: python\n\n            os.environ[\"GOOGLE_APPLICATION_CREDENTIALS\"] = \"path/to/service_account.json\"\n            os.environ[\"GOOGLE_CLOUD_PROJECT\"] = \"my-gcp-project\"\n            os.environ[\"GOOGLE_CLOUD_LOCATION\"] = \"us-central1\"\n\n            vertex_ai_credentials = VertexAICredentials.create_from_env()",
         "properties": {
            "credentials_path": {
               "description": "Path to service-account JSON. This attribute can be read from GOOGLE_APPLICATION_CREDENTIALS environment variable when creating VertexAICredentials from environment.",
               "title": "Credentials Path",
               "type": "string"
            },
            "project_id": {
               "description": "The Google Cloud project id. This attribute can be read from GOOGLE_CLOUD_PROJECT or GCLOUD_PROJECT environment variable when creating VertexAICredentials from environment.",
               "title": "Project ID",
               "type": "string"
            },
            "location": {
               "default": "us-central1",
               "description": "Vertex AI region. This attribute can be read from GOOGLE_CLOUD_LOCATION environment variable when creating VertexAICredentials from environment. By default us-central1 location is used.",
               "examples": [
                  "us-central1",
                  "europe-west4"
               ],
               "title": "Location",
               "type": "string"
            }
         },
         "required": [
            "credentials_path",
            "project_id"
         ],
         "title": "VertexAICredentials",
         "type": "object"
      },
      "VertexAIFoundationModel": {
         "description": "Represents a foundation model served via Vertex AI.\n\nExamples:\n    1. Create Vertex AI foundation model by passing the credentials during object creation.\n        .. code-block:: python\n\n            model = VertexAIFoundationModel(\n                model_id=\"gemini-1.5-pro-002\",\n                provider=VertexAIModelProvider(\n                    credentials=VertexAICredentials(\n                        project_id=\"your-project\",\n                        location=\"us-central1\", # This is optional field, by default us-central1 location is selected\n                        credentials_path=\"/path/to/service_account.json\"\n                    )\n                )\n            )\n    2. Create Vertex AI foundation model by setting the credentials in environment variables:\n        * ``GOOGLE_APPLICATION_CREDENTIALS`` is used to set the Credentials path for Vertex AI.\n        * ``GOOGLE_CLOUD_PROJECT`` is used to set the Project id for Vertex AI.\n        * ``GOOGLE_CLOUD_LOCATION`` is uses to set the Location for Vertex AI. By default us-central1 location is used when GOOGLE_CLOUD_LOCATION is not provided .\n\n            .. code-block:: python\n\n                os.environ[\"GOOGLE_APPLICATION_CREDENTIALS\"] = \"path/to/service_account.json\"\n                os.environ[\"GOOGLE_CLOUD_PROJECT\"] = \"my-gcp-project\"\n                os.environ[\"GOOGLE_CLOUD_LOCATION\"] = \"us-central1\"\n\n                model = VertexAIFoundationModel(\n                    model_id=\"gemini/gpt-4o-mini\",\n                )",
         "properties": {
            "model_name": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The name of the foundation model.",
               "title": "Model Name"
            },
            "provider": {
               "$ref": "#/$defs/VertexAIModelProvider",
               "description": "Vertex AI provider.",
               "title": "Provider"
            },
            "model_id": {
               "description": "Model name for Vertex AI. Must be a valid Vertex AI model identifier or a fully-qualified publisher path",
               "examples": [
                  "gemini-1.5-pro-002"
               ],
               "title": "Model id",
               "type": "string"
            }
         },
         "required": [
            "model_id"
         ],
         "title": "VertexAIFoundationModel",
         "type": "object"
      },
      "VertexAIModelProvider": {
         "description": "Represents a model provider using Vertex AI.\n\nExamples:\n    1. Create provider using credentials object:\n        .. code-block:: python\n\n            provider = VertexAIModelProvider(\n                credentials=VertexAICredentials(\n                    credentials_path=\"path/to/key.json\",\n                    project_id=\"your-project\",\n                    location=\"us-central1\" \n                )\n            )\n\n    2. Create provider using environment variables:\n        .. code-block:: python\n\n            os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = \"/path/to/service_account.json\"\n            os.environ['GOOGLE_CLOUD_PROJECT'] = \"your-project\"\n            os.environ['GOOGLE_CLOUD_LOCATION'] = \"us-central1\" # This is optional field, by default us-central1 location is selected\n\n            provider = VertexAIModelProvider()",
         "properties": {
            "type": {
               "$ref": "#/$defs/ModelProviderType",
               "default": "vertex_ai",
               "description": "The type of model provider."
            },
            "credentials": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/VertexAICredentials"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Vertex AI credentials."
            }
         },
         "title": "VertexAIModelProvider",
         "type": "object"
      },
      "WxAICredentials": {
         "description": "Defines the WxAICredentials class to specify the watsonx.ai server details.\n\nExamples:\n    1. Create WxAICredentials with default parameters. By default Dallas region is used.\n        .. code-block:: python\n\n            wxai_credentials = WxAICredentials(api_key=\"...\")\n\n    2. Create WxAICredentials by specifying region url.\n        .. code-block:: python\n\n            wxai_credentials = WxAICredentials(api_key=\"...\",\n                                               url=\"https://au-syd.ml.cloud.ibm.com\")\n\n    3. Create WxAICredentials by reading from environment variables.\n        .. code-block:: python\n\n            os.environ[\"WATSONX_APIKEY\"] = \"...\"\n            # [Optional] Specify watsonx region specific url. Default is https://us-south.ml.cloud.ibm.com .\n            os.environ[\"WATSONX_URL\"] = \"https://eu-gb.ml.cloud.ibm.com\"\n            wxai_credentials = WxAICredentials.create_from_env()\n\n    4. Create WxAICredentials for on-prem.\n        .. code-block:: python\n\n            wxai_credentials = WxAICredentials(url=\"https://<hostname>\",\n                                               username=\"...\"\n                                               api_key=\"...\",\n                                               version=\"5.2\")\n\n    5. Create WxAICredentials by reading from environment variables for on-prem.\n        .. code-block:: python\n\n            os.environ[\"WATSONX_URL\"] = \"https://<hostname>\"\n            os.environ[\"WATSONX_VERSION\"] = \"5.2\"\n            os.environ[\"WATSONX_USERNAME\"] = \"...\"\n            os.environ[\"WATSONX_APIKEY\"] = \"...\"\n            # Only one of api_key or password is needed\n            #os.environ[\"WATSONX_PASSWORD\"] = \"...\"\n            wxai_credentials = WxAICredentials.create_from_env()",
         "properties": {
            "url": {
               "default": "https://us-south.ml.cloud.ibm.com",
               "description": "The url for watsonx ai service",
               "examples": [
                  "https://us-south.ml.cloud.ibm.com",
                  "https://eu-de.ml.cloud.ibm.com",
                  "https://eu-gb.ml.cloud.ibm.com",
                  "https://jp-tok.ml.cloud.ibm.com",
                  "https://au-syd.ml.cloud.ibm.com"
               ],
               "title": "watsonx.ai url",
               "type": "string"
            },
            "api_key": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The user api key. Required for using watsonx as a service and one of api_key or password is required for using watsonx on-prem software.",
               "strip_whitespace": true,
               "title": "Api Key"
            },
            "version": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The watsonx on-prem software version. Required for using watsonx on-prem software.",
               "title": "Version"
            },
            "username": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The user name. Required for using watsonx on-prem software.",
               "title": "User name"
            },
            "password": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The user password. One of api_key or password is required for using watsonx on-prem software.",
               "title": "Password"
            },
            "instance_id": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": "openshift",
               "description": "The watsonx.ai instance id. Default value is openshift.",
               "title": "Instance id"
            }
         },
         "title": "WxAICredentials",
         "type": "object"
      },
      "WxAIFoundationModel": {
         "description": "The IBM watsonx.ai foundation model details\n\nTo initialize the foundation model, you can either pass in the credentials directly or set the environment.\nYou can follow these examples to create the provider.\n\nExamples:\n    1. Create foundation model by specifying the credentials during object creation:\n        .. code-block:: python\n\n            # Specify the credentials during object creation\n            wx_ai_foundation_model = WxAIFoundationModel(\n                model_id=\"ibm/granite-3-3-8b-instruct\",\n                project_id=<PROJECT_ID>,\n                provider=WxAIModelProvider(\n                    credentials=WxAICredentials(\n                        url=wx_url, # This is optional field, by default US-Dallas region is selected\n                        api_key=wx_apikey,\n                    )\n                )\n            )\n\n    2. Create foundation model by setting the credentials environment variables:\n        * The api key can be set using one of the environment variables ``WXAI_API_KEY``, ``WATSONX_APIKEY``, or ``WXG_API_KEY``. These will be read in the order of precedence.\n        * The url is optional and will be set to US-Dallas region by default. It can be set using one of the environment variables ``WXAI_URL``, ``WATSONX_URL``, or ``WXG_URL``. These will be read in the order of precedence.\n\n        .. code-block:: python\n\n            wx_ai_foundation_model = WxAIFoundationModel(\n                model_id=\"ibm/granite-3-3-8b-instruct\",\n                project_id=<PROJECT_ID>,\n            )\n\n    3. Create foundation model by specifying watsonx.governance software credentials during object creation:\n        .. code-block:: python\n\n            wx_ai_foundation_model = WxAIFoundationModel(\n                model_id=\"ibm/granite-3-3-8b-instruct\",\n                project_id=project_id,\n                provider=WxAIModelProvider(\n                    credentials=WxAICredentials(\n                        url=wx_url,\n                        api_key=wx_apikey,\n                        username=wx_username,\n                        version=wx_version,\n                    )\n                )\n            )\n\n    4. Create foundation model by setting watsonx.governance software credentials environment variables:\n        * The api key can be set using one of the environment variables ``WXAI_API_KEY``, ``WATSONX_APIKEY``, or ``WXG_API_KEY``. These will be read in the order of precedence.\n        * The url can be set using one of these environment variable ``WXAI_URL``, ``WATSONX_URL``, or ``WXG_URL``. These will be read in the order of precedence.\n        * The username can be set using one of these environment variable ``WXAI_USERNAME``, ``WATSONX_USERNAME``, or ``WXG_USERNAME``. These will be read in the order of precedence.\n        * The version of watsonx.governance software can be set using one of these environment variable ``WXAI_VERSION``, ``WATSONX_VERSION``, or ``WXG_VERSION``. These will be read in the order of precedence.\n\n        .. code-block:: python\n\n            wx_ai_foundation_model = WxAIFoundationModel(\n                model_id=\"ibm/granite-3-3-8b-instruct\",\n                project_id=project_id,\n            )",
         "properties": {
            "model_name": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The name of the foundation model.",
               "title": "Model Name"
            },
            "provider": {
               "$ref": "#/$defs/WxAIModelProvider",
               "description": "The provider of the model."
            },
            "model_id": {
               "description": "The unique identifier for the watsonx.ai model.",
               "title": "Model Id",
               "type": "string"
            },
            "project_id": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The project ID associated with the model.",
               "title": "Project Id"
            },
            "space_id": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The space ID associated with the model.",
               "title": "Space Id"
            }
         },
         "required": [
            "model_id"
         ],
         "title": "WxAIFoundationModel",
         "type": "object"
      },
      "WxAIModelProvider": {
         "description": "This class represents a model provider configuration for IBM watsonx.ai. It includes the provider type and\ncredentials required to authenticate and interact with the watsonx.ai platform. If credentials are not explicitly\nprovided, it attempts to load them from environment variables.\n\nExamples:\n    1. Create provider using credentials object:\n        .. code-block:: python\n\n            credentials = WxAICredentials(\n                url=\"https://us-south.ml.cloud.ibm.com\",\n                api_key=\"your-api-key\"\n            )\n            provider = WxAIModelProvider(credentials=credentials)\n\n    2. Create provider using environment variables:\n        .. code-block:: python\n\n            import os\n\n            os.environ['WATSONX_URL'] = \"https://us-south.ml.cloud.ibm.com\"\n            os.environ['WATSONX_APIKEY'] = \"your_api_key\"\n\n            provider = WxAIModelProvider()",
         "properties": {
            "type": {
               "$ref": "#/$defs/ModelProviderType",
               "default": "ibm_watsonx.ai",
               "description": "The type of model provider."
            },
            "credentials": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/WxAICredentials"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The credentials used to authenticate with watsonx.ai. If not provided, they will be loaded from environment variables."
            }
         },
         "title": "WxAIModelProvider",
         "type": "object"
      }
   }
}

Config:
  • arbitrary_types_allowed: bool = True

Fields:
field agentic_app: Annotated[AgenticApp | None, FieldInfo(annotation=NoneType, required=False, default=None, title='Agentic application configuration details', description='The agentic application configuration details.')] = None

The agentic application configuration details.

field ai_experiment_client: Annotated[AIExperimentsClient | None, FieldInfo(annotation=NoneType, required=False, default=None, title='AI experiments client', description='The AI experiment client object.')] = None

The AI experiment client object.

field max_concurrency: Annotated[int, FieldInfo(annotation=NoneType, required=False, default=10, title='Max Concurrency', description='The maximum concurrency to use for evaluating metrics.')] = 10

The maximum concurrency to use for evaluating metrics.

field tracing_configuration: Annotated[TracingConfiguration | None, FieldInfo(annotation=NoneType, required=False, default=None, title='Tracing Configuration', description='The tracing configuration details.')] = None

The tracing configuration details.
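
Example

A minimal sketch setting the fields above together; the agentic_app and tracing_config objects are assumed to be constructed as shown in the class-level examples:
    agentic_evaluator = AgenticEvaluator(agentic_app=agentic_app,
                                         tracing_configuration=tracing_config,
                                         max_concurrency=5)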

compare_ai_experiments(ai_experiments: List[AIExperiment] = None, ai_evaluation_details: AIEvaluationAsset = None) → str

Creates an AI Evaluation asset to compare AI experiment runs.

Parameters:
  • ai_experiments (List[AIExperiment], optional) – The list of AI experiments to be compared. If all runs for an experiment need to be compared, specify the runs value as an empty list for the experiment.

  • ai_evaluation_details (AIEvaluationAsset, optional) – An instance of AIEvaluationAsset having the details (name, description and metrics configuration).

Returns:

The href of the created AI Evaluation asset.

Examples

  1. Create AI evaluation with a list of experiment IDs

# Initialize the API client with credentials
api_client = APIClient(credentials=Credentials(api_key="", url="wos_url"))

# Create the instance of Agentic evaluator
evaluator = AgenticEvaluator(api_client=api_client, tracing_configuration=TracingConfiguration(project_id=project_id))

# [Optional] Define evaluation configuration
evaluation_config = EvaluationConfig(
    monitors={
        "agentic_ai_quality": {
            "parameters": {
                "metrics_configuration": {}
            }
        }
    }
)

# Create the evaluation asset
ai_evaluation_details = AIEvaluationAsset(
    name="AI Evaluation for agent",
    evaluation_configuration=evaluation_config
)

# Compare two or more AI experiments using the evaluation asset
ai_experiment_1 = AIExperiment(
    asset_id=ai_experiment_id_1,
    runs=[<Run1 details>, <Run2 details>] # Run details are returned by the start_run method
)
ai_experiment_2 = AIExperiment(
    asset_id=ai_experiment_id_2,
    runs=[] # Empty list means all runs for this experiment will be compared
)
ai_evaluation_asset_href = evaluator.compare_ai_experiments(
    ai_experiments=[ai_experiment_1, ai_experiment_2],
    ai_evaluation_details=ai_evaluation_details
)
end_run(track_notebook: bool | None = False)

End a run to collect and compute the metrics within the current run.

Parameters:

track_notebook (bool) – Flag to specify whether to store the notebook with the current run.
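
Example

  1. End the current run and store the notebook with it (a minimal sketch; track_notebook defaults to False):
    agentic_evaluator.end_run(track_notebook=True)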

evaluate_answer_quality(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric] = [], compute_real_time: bool | None = True) → dict

An evaluation decorator for computing answer quality metrics on an agentic tool. Answer Quality metrics include Answer Relevance, Faithfulness, Answer Similarity and Unsuccessful Requests.

For more details, see ibm_watsonx_gov.metrics.AnswerRelevanceMetric, ibm_watsonx_gov.metrics.FaithfulnessMetric, ibm_watsonx_gov.metrics.AnswerSimilarityMetric and ibm_watsonx_gov.metrics.UnsuccessfulRequestsMetric.

Parameters:
  • func (Optional[Callable], optional) – The tool on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric], optional) – The list of metrics to compute as part of this evaluator. Defaults to MetricGroup.ANSWER_QUALITY.get_metrics().

Raises:

Exception – If there is any error while evaluation.

Returns:

The result of the wrapped tool.

Return type:

dict

Example

  1. Basic usage
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_answer_quality
    def agentic_tool(*args, **kwargs):
        pass
    
  2. Usage with different thresholds and methods for some of the metrics in the group
    metric_1 = FaithfulnessMetric(thresholds=[MetricThreshold(type="lower_limit", value=0.5)])
    metric_2 = AnswerRelevanceMetric(method="token_recall", thresholds=[MetricThreshold(type="lower_limit", value=0.5)])
    
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_answer_quality(metrics=[metric_1, metric_2])
    def agentic_tool(*args, **kwargs):
        pass
    
evaluate_answer_relevance(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric] = [], compute_real_time: bool | None = True) → dict

An evaluation decorator for computing answer relevance metric on an agentic tool.

For more details, see ibm_watsonx_gov.metrics.AnswerRelevanceMetric

Parameters:
  • func (Optional[Callable], optional) – The tool on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric], optional) – The list of metrics to compute as part of this evaluator. Defaults to [ AnswerRelevanceMetric() ].

Raises:

Exception – If there is any error while evaluation.

Returns:

The result of the wrapped tool.

Return type:

dict

Example

  1. Basic usage
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_answer_relevance
    def agentic_tool(*args, **kwargs):
        pass
    
  2. Usage with different thresholds and methods
    metric_1 = AnswerRelevanceMetric(method="token_recall", thresholds=[MetricThreshold(type="lower_limit", value=0.5)])
    metric_2 = AnswerRelevanceMetric(method="granite_guardian", thresholds=[MetricThreshold(type="lower_limit", value=0.5)])
    
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_answer_relevance(metrics=[metric_1, metric_2])
    def agentic_tool(*args, **kwargs):
        pass
    
evaluate_answer_similarity(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric] = [], compute_real_time: bool | None = True) → dict

An evaluation decorator for computing answer similarity metric on an agentic node.

For more details, see ibm_watsonx_gov.metrics.AnswerSimilarityMetric

Parameters:
  • func (Optional[Callable], optional) – The node on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric], optional) – The list of metrics to compute as part of this evaluator. Defaults to [ AnswerSimilarityMetric() ].

  • compute_real_time (Optional[bool], optional) – The flag to indicate whether the metric should be computed along with the node execution or not.

Raises:

Exception – If there is any error while evaluation.

Returns:

The result of the wrapped node.

Return type:

dict

Examples

  1. Basic usage
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_answer_similarity
    def agentic_node(*args, **kwargs):
        pass
    
  2. Usage with different thresholds and methods
    metric_1 = AnswerSimilarityMetric(
        method="token_k_precision", thresholds=[MetricThreshold(type="lower_limit", value=0.5)])
    metric_2 = AnswerSimilarityMetric(
        method="sentence_bert_mini_lm", thresholds=[MetricThreshold(type="lower_limit", value=0.6)])
    
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_answer_similarity(metrics=[metric_1, metric_2])
    def agentic_node(*args, **kwargs):
        pass
    
evaluate_average_precision(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric] = [], compute_real_time: bool | None = True) → dict

An evaluation decorator for computing the average precision metric on an agentic tool. This metric uses context relevance values for its computation, so the context relevance metric is computed as a prerequisite.

For more details, see ibm_watsonx_gov.metrics.AveragePrecisionMetric

Parameters:
  • func (Optional[Callable], optional) – The tool on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric], optional) – The list of metrics to compute as part of this evaluator. Defaults to [ AveragePrecisionMetric() ].

Raises:

Exception – If there is any error while evaluation.

Returns:

The result of the wrapped tool.

Return type:

dict

Example

  1. Basic usage
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_average_precision
    def agentic_tool(*args, **kwargs):
        pass
    
  2. Usage with different thresholds and methods
    metric_1 = AveragePrecisionMetric(thresholds=[MetricThreshold(type="lower_limit", value=0.5)])
    metric_2 = ContextRelevanceMetric(method="sentence_bert_mini_lm", thresholds=[MetricThreshold(type="lower_limit", value=0.6)])
    
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_average_precision(metrics=[metric_1, metric_2])
    def agentic_tool(*args, **kwargs):
        pass
    
evaluate_content_safety(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric] = [], compute_real_time: bool | None = True) → dict

An evaluation decorator for computing content safety metrics on an agentic tool. Content Safety metrics include HAP, PII, Evasiveness, Harm, HarmEngagement, Jailbreak, Profanity, SexualContent, SocialBias, UnethicalBehavior and Violence.

For more details, see ibm_watsonx_gov.metrics.HAPMetric, ibm_watsonx_gov.metrics.PIIMetric, ibm_watsonx_gov.metrics.EvasivenessMetric, ibm_watsonx_gov.metrics.HarmMetric, ibm_watsonx_gov.metrics.HarmEngagementMetric, ibm_watsonx_gov.metrics.JailbreakMetric, ibm_watsonx_gov.metrics.ProfanityMetric, ibm_watsonx_gov.metrics.SexualContentMetric, ibm_watsonx_gov.metrics.SocialBiasMetric, ibm_watsonx_gov.metrics.UnethicalBehaviorMetric and ibm_watsonx_gov.metrics.ViolenceMetric.

Parameters:
  • func (Optional[Callable], optional) – The tool on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric], optional) – The list of metrics to compute as part of this evaluator. Defaults to MetricGroup.CONTENT_SAFETY.get_metrics().

Raises:

Exception – If there is any error while evaluation.

Returns:

The result of the wrapped tool.

Return type:

dict

Example

  1. Basic usage
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_content_safety
    def agentic_tool(*args, **kwargs):
        pass
    
  2. Usage with different thresholds and methods for some of the metrics in the group
    metric_1 = PIIMetric(thresholds=[MetricThreshold(type="lower_limit", value=0.5)])
    metric_2 = HAPMetric(thresholds=[MetricThreshold(type="lower_limit", value=0.5)])
    
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_content_safety(metrics=[metric_1, metric_2])
    def agentic_tool(*args, **kwargs):
        pass
    
evaluate_context_relevance(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric] = [], compute_real_time: bool | None = True) → dict

An evaluation decorator for computing context relevance metric on an agentic node.

For more details, see ibm_watsonx_gov.metrics.ContextRelevanceMetric

Parameters:
  • func (Optional[Callable], optional) – The node on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric], optional) – The list of metrics to compute as part of this evaluator. Defaults to [ ContextRelevanceMetric() ].

  • compute_real_time (Optional[bool], optional) – The flag to indicate whether the metric should be computed along with the node execution or not.

Raises:

Exception – If there is any error while evaluation.

Returns:

The result of the wrapped node.

Return type:

dict

Examples

  1. Basic usage
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_context_relevance
    def agentic_node(*args, **kwargs):
        pass
    
  2. Usage with different thresholds and methods
    metric_1 = ContextRelevanceMetric(
        method="sentence_bert_bge", thresholds=[MetricThreshold(type="lower_limit", value=0.5)])
    metric_2 = ContextRelevanceMetric(
        method="sentence_bert_mini_lm", thresholds=[MetricThreshold(type="lower_limit", value=0.6)])
    metric_3 = ContextRelevanceMetric(
        method="granite_guardian", thresholds=[MetricThreshold(type="lower_limit", value=0.6)])
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_context_relevance(metrics=[metric_1, metric_2, metric_3])
    def agentic_node(*args, **kwargs):
        pass
    
evaluate_evasiveness(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric] = [], compute_real_time: bool | None = True) → dict

An evaluation decorator for computing the evasiveness metric on an agentic tool using Granite Guardian.

For more details, see ibm_watsonx_gov.metrics.EvasivenessMetric

Parameters:
  • func (Optional[Callable], optional) – The tool on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric], optional) – The list of metrics to compute as part of this evaluator. Defaults to [ EvasivenessMetric() ].

Raises:

Exception – If there is any error while evaluation.

Returns:

The result of the wrapped tool.

Return type:

dict

Example

  1. Create evaluate_evasiveness decorator with default parameters. By default, the metric uses the “input_text” from the graph state as the input.
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_evasiveness
    def agentic_tool(*args, **kwargs):
        pass
    
  2. Create evaluate_evasiveness decorator with thresholds and configuration
    metric = EvasivenessMetric(thresholds=[MetricThreshold(type="lower_limit", value=0.7)])
    config = {"input_fields": ["input"]}
    configuration = AgenticAIConfiguration(**config)
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_evasiveness(metrics=[metric], configuration=configuration)
    def agentic_tool(*args, **kwargs):
        pass
    
evaluate_faithfulness(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric] = [], compute_real_time: bool | None = True) → dict

An evaluation decorator for computing faithfulness metric on an agentic node.

For more details, see ibm_watsonx_gov.metrics.FaithfulnessMetric

Parameters:
  • func (Optional[Callable], optional) – The node on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric], optional) – The list of metrics to compute as part of this evaluator. Defaults to [ FaithfulnessMetric() ].

  • compute_real_time (Optional[bool], optional) – The flag to indicate whether the metric should be computed along with the node execution or not.

Raises:

Exception – If any error occurs during evaluation.

Returns:

The result of the wrapped node.

Return type:

dict

Examples

  1. Basic usage
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_faithfulness
    def agentic_node(*args, **kwargs):
        pass
    
  2. Usage with different thresholds and methods
    metric_1 = FaithfulnessMetric(method="token_k_precision", threshold=MetricThreshold(type="lower_limit", value=0.5))
    metric_2 = FaithfulnessMetric(method="sentence_bert_mini_lm", threshold=MetricThreshold(type="lower_limit", value=0.6))
    
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_faithfulness(metrics=[metric_1, metric_2])
    def agentic_node(*args, **kwargs):
        pass
    
evaluate_general_quality_with_llm(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric] = [], compute_real_time: bool | None = True) dict

An evaluation decorator for computing the LLM validation metric on an agentic node.

For more details, see ibm_watsonx_gov.metrics.LLMValidationMetric

Parameters:
  • func (Optional[Callable], optional) – The node on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric]) – The list of metrics to compute as part of this evaluator.

  • compute_real_time (Optional[bool], optional) – The flag to indicate whether the metric should be computed along with the node execution or not. When compute_real_time is set to False, the evaluate_metrics method should be invoked on the AgenticEvaluator to compute the metric.

Raises:

Exception – If any error occurs during evaluation.

Returns:

The result of the wrapped node.

Return type:

dict

Examples

  1. Basic usage
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_general_quality_with_llm
    def agentic_node(*args, **kwargs):
        pass
    
evaluate_hap(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric] = [], compute_real_time: bool | None = True) dict

An evaluation decorator for computing the HAP metric on an agentic tool.

For more details, see ibm_watsonx_gov.metrics.HAPMetric

Parameters:
  • func (Optional[Callable], optional) – The tool on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric], optional) – The list of metrics to compute as part of this evaluator. Defaults to [HAPMetric()].

Raises:

Exception – If any error occurs during evaluation.

Returns:

The result of the wrapped tool.

Return type:

dict

Examples

  1. Create evaluate_hap decorator with default parameters. By default, the metric uses the “input_text” from the graph state as the input.
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_hap
    def agentic_tool(*args, **kwargs):
        pass
    
  2. Create evaluate_hap decorator with thresholds and configuration
    metric = HAPMetric(thresholds=MetricThreshold(type="lower_limit", value=0.7))
    config = {"input_fields": ["input"]}
    configuration = AgenticAIConfiguration(**config)
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_hap(metrics=[metric], configuration=configuration)
    def agentic_tool(*args, **kwargs):
        pass
    
evaluate_harm(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric] = [], compute_real_time: bool | None = True) dict

An evaluation decorator for computing harm risk on an agentic tool via Granite Guardian.

For more details, see ibm_watsonx_gov.metrics.HarmMetric

Parameters:
  • func (Optional[Callable], optional) – The tool on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric], optional) – The list of metrics to compute as part of this evaluator. Defaults to [ HarmMetric() ]

Raises:

Exception – If any error occurs during evaluation.

Returns:

The result of the wrapped tool.

Return type:

dict

Examples

  1. Create evaluate_harm decorator with default parameters. By default, the metric uses the “input_text” from the graph state as the input.
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_harm
    def agentic_tool(*args, **kwargs):
        pass
    
  2. Create evaluate_harm decorator with thresholds and configuration
    metric = HarmMetric(thresholds=MetricThreshold(type="lower_limit", value=0.7))
    config = {"input_fields": ["input"]}
    configuration = AgenticAIConfiguration(**config)
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_harm(metrics=[metric], configuration=configuration)
    def agentic_tool(*args, **kwargs):
        pass
    
evaluate_harm_engagement(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric] = [], compute_real_time: bool | None = True) dict

An evaluation decorator for computing harm engagement on an agentic tool via Granite Guardian.

For more details, see ibm_watsonx_gov.metrics.HarmEngagementMetric

Parameters:
  • func (Optional[Callable], optional) – The tool on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric], optional) – The list of metrics to compute as part of this evaluator. Defaults to [ HarmEngagementMetric() ].

Raises:

Exception – If any error occurs during evaluation.

Returns:

The result of the wrapped tool.

Return type:

dict

Examples

  1. Create evaluate_harm_engagement decorator with default parameters. By default, the metric uses the “input_text” from the graph state as the input.
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_harm_engagement
    def agentic_tool(*args, **kwargs):
        pass
    
  2. Create evaluate_harm_engagement decorator with thresholds and configuration
    metric = HarmEngagementMetric(thresholds=MetricThreshold(type="lower_limit", value=0.7))
    config = {"input_fields": ["input"]}
    configuration = AgenticAIConfiguration(**config)
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_harm_engagement(metrics=[metric], configuration=configuration)
    def agentic_tool(*args, **kwargs):
        pass
    
evaluate_hit_rate(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric] = [], compute_real_time: bool | None = True) dict

An evaluation decorator for computing the hit rate metric on an agentic tool. This metric is derived from context relevance values, so the context relevance metric is computed as a prerequisite.

For more details, see ibm_watsonx_gov.metrics.HitRateMetric

Parameters:
  • func (Optional[Callable], optional) – The tool on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric], optional) – The list of metrics to compute as part of this evaluator. Defaults to [ HitRateMetric() ].

Raises:

Exception – If any error occurs during evaluation.

Returns:

The result of the wrapped tool.

Return type:

dict

Examples

  1. Basic usage
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_hit_rate
    def agentic_tool(*args, **kwargs):
        pass
    
  2. Usage with different thresholds and methods
    metric_1 = HitRateMetric(threshold=MetricThreshold(type="lower_limit", value=0.5))
    metric_2 = ContextRelevanceMetric(method="sentence_bert_mini_lm", threshold=MetricThreshold(type="lower_limit", value=0.6))
    
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_hit_rate(metrics=[metric_1, metric_2])
    def agentic_tool(*args, **kwargs):
        pass
    
evaluate_jailbreak(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric] = [], compute_real_time: bool | None = True) dict

An evaluation decorator for computing jailbreak risk on an agentic tool via Granite Guardian.

For more details, see ibm_watsonx_gov.metrics.JailbreakMetric

Parameters:
  • func (Optional[Callable], optional) – The tool on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric], optional) – The list of metrics to compute as part of this evaluator. Defaults to [ JailbreakMetric() ].

Raises:

Exception – If any error occurs during evaluation.

Returns:

The result of the wrapped tool.

Return type:

dict

Examples

  1. Create evaluate_jailbreak decorator with default parameters. By default, the metric uses the “input_text” from the graph state as the input.
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_jailbreak
    def agentic_tool(*args, **kwargs):
        pass
    
  2. Create evaluate_jailbreak decorator with thresholds and configuration
    metric = JailbreakMetric(thresholds=MetricThreshold(type="lower_limit", value=0.7))
    config = {"input_fields": ["input"]}
    configuration = AgenticAIConfiguration(**config)
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_jailbreak(metrics=[metric], configuration=configuration)
    def agentic_tool(*args, **kwargs):
        pass
    
evaluate_ndcg(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric] = [], compute_real_time: bool | None = True) dict

An evaluation decorator for computing the NDCG metric on an agentic tool. This metric is derived from context relevance values, so the context relevance metric is computed as a prerequisite.

For more details, see ibm_watsonx_gov.metrics.NDCGMetric

Parameters:
  • func (Optional[Callable], optional) – The tool on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric], optional) – The list of metrics to compute as part of this evaluator. Defaults to [ NDCGMetric() ].

Raises:

Exception – If any error occurs during evaluation.

Returns:

The result of the wrapped tool.

Return type:

dict

Examples

  1. Basic usage
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_ndcg
    def agentic_tool(*args, **kwargs):
        pass
    
  2. Usage with different thresholds and methods
    metric_1 = NDCGMetric(threshold=MetricThreshold(type="lower_limit", value=0.5))
    metric_2 = ContextRelevanceMetric(method="sentence_bert_mini_lm", threshold=MetricThreshold(type="lower_limit", value=0.6))
    
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_ndcg(metrics=[metric_1, metric_2])
    def agentic_tool(*args, **kwargs):
        pass
    
evaluate_pii(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric] = [], compute_real_time: bool | None = True) dict

An evaluation decorator for computing the PII metric on an agentic tool.

For more details, see ibm_watsonx_gov.metrics.PIIMetric

Parameters:
  • func (Optional[Callable], optional) – The tool on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric], optional) – The list of metrics to compute as part of this evaluator. Defaults to [PIIMetric()].

Raises:

Exception – If any error occurs during evaluation.

Returns:

The result of the wrapped tool.

Return type:

dict

Examples

  1. Create evaluate_pii decorator with default parameters. By default, the metric uses the “input_text” from the graph state as the input.
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_pii
    def agentic_tool(*args, **kwargs):
        pass
    
  2. Create evaluate_pii decorator with thresholds and configuration
    metric = PIIMetric(thresholds=MetricThreshold(type="lower_limit", value=0.7))
    config = {"input_fields": ["input"]}
    configuration = AgenticAIConfiguration(**config)
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_pii(metrics=[metric], configuration=configuration)
    def agentic_tool(*args, **kwargs):
        pass
    
evaluate_profanity(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric] = [], compute_real_time: bool | None = True) dict

An evaluation decorator for computing profanity on an agentic tool via Granite Guardian.

For more details, see ibm_watsonx_gov.metrics.ProfanityMetric

Parameters:
  • func (Optional[Callable], optional) – The tool on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric], optional) – The list of metrics to compute as part of this evaluator. Defaults to [ ProfanityMetric() ]

Raises:

Exception – If any error occurs during evaluation.

Returns:

The result of the wrapped tool.

Return type:

dict

Examples

  1. Create evaluate_profanity decorator with default parameters. By default, the metric uses the “input_text” from the graph state as the input.
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_profanity
    def agentic_tool(*args, **kwargs):
        pass
    
  2. Create evaluate_profanity decorator with thresholds and configuration
    metric = ProfanityMetric(thresholds=MetricThreshold(type="lower_limit", value=0.7))
    config = {"input_fields": ["input"]}
    configuration = AgenticAIConfiguration(**config)
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_profanity(metrics=[metric], configuration=configuration)
    def agentic_tool(*args, **kwargs):
        pass
    
evaluate_prompt_safety_risk(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric], compute_real_time: bool | None = True) dict

An evaluation decorator for computing the prompt safety risk metric on an agentic tool.

For more details, see ibm_watsonx_gov.metrics.PromptSafetyRiskMetric

Parameters:
  • func (Optional[Callable], optional) – The tool on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric]) – The list of metrics to compute as part of this evaluator.

Raises:

Exception – If any error occurs during evaluation.

Returns:

The result of the wrapped tool.

Return type:

dict

Examples

  1. Create evaluate_prompt_safety_risk decorator by passing the required metric with its system prompt. By default, the metric uses the “input_text” from the graph state as the input.
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_prompt_safety_risk(metrics=[PromptSafetyRiskMetric(system_prompt="...")])
    def agentic_tool(*args, **kwargs):
        pass
    
  2. Create evaluate_prompt_safety_risk decorator with thresholds and configuration
    metric = PromptSafetyRiskMetric(system_prompt="...", thresholds=MetricThreshold(type="lower_limit", value=0.7))
    config = {"input_fields": ["input"]}
    configuration = AgenticAIConfiguration(**config)
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_prompt_safety_risk(metrics=[metric], configuration=configuration)
    def agentic_tool(*args, **kwargs):
        pass
    
evaluate_readability(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric] = [], compute_real_time: bool | None = True) dict

An evaluation decorator for computing answer readability metrics on an agentic tool. Readability metrics include TextReadingEaseMetric and TextGradeLevelMetric.

For more details, see ibm_watsonx_gov.metrics.TextReadingEaseMetric, ibm_watsonx_gov.metrics.TextGradeLevelMetric

Parameters:
  • func (Optional[Callable], optional) – The tool on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric], optional) – The list of metrics to compute as part of this evaluator. Defaults to MetricGroup.READABILITY.get_metrics().

Raises:

Exception – If any error occurs during evaluation.

Returns:

The result of the wrapped tool.

Return type:

dict

Examples

  1. Basic usage
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_readability
    def agentic_tool(*args, **kwargs):
        pass
    
  2. Usage with different thresholds and methods for some of the metrics in the group
    metric_1 = TextGradeLevelMetric(thresholds=[MetricThreshold(type="lower_limit", value=6)])
    metric_2 = TextReadingEaseMetric(thresholds=[MetricThreshold(type="lower_limit", value=70)])
    config = {"output_fields": ["generated_text"]}
    configuration = AgenticAIConfiguration(**config)
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_readability(metrics=[metric_1, metric_2], configuration=configuration)
    def agentic_tool(*args, **kwargs):
        pass
    
evaluate_reciprocal_rank(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric] = [], compute_real_time: bool | None = True) dict

An evaluation decorator for computing the reciprocal rank metric on an agentic tool. This metric is derived from context relevance values, so the context relevance metric is computed as a prerequisite.

For more details, see ibm_watsonx_gov.metrics.ReciprocalRankMetric

Parameters:
  • func (Optional[Callable], optional) – The tool on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric], optional) – The list of metrics to compute as part of this evaluator. Defaults to [ ReciprocalRankMetric() ].

Raises:

Exception – If any error occurs during evaluation.

Returns:

The result of the wrapped tool.

Return type:

dict

Examples

  1. Basic usage
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_reciprocal_rank
    def agentic_tool(*args, **kwargs):
        pass
    
  2. Usage with different thresholds and methods
    metric_1 = ReciprocalRankMetric(threshold=MetricThreshold(type="lower_limit", value=0.5))
    metric_2 = ContextRelevanceMetric(method="sentence_bert_mini_lm", threshold=MetricThreshold(type="lower_limit", value=0.6))
    
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_reciprocal_rank(metrics=[metric_1, metric_2])
    def agentic_tool(*args, **kwargs):
        pass
    
evaluate_retrieval_precision(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric] = [], compute_real_time: bool | None = True) dict

An evaluation decorator for computing the retrieval precision metric on an agentic tool. This metric is derived from context relevance values, so the context relevance metric is computed as a prerequisite.

For more details, see ibm_watsonx_gov.metrics.RetrievalPrecisionMetric

Parameters:
  • func (Optional[Callable], optional) – The tool on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric], optional) – The list of metrics to compute as part of this evaluator. Defaults to [ RetrievalPrecisionMetric() ].

Raises:

Exception – If any error occurs during evaluation.

Returns:

The result of the wrapped tool.

Return type:

dict

Examples

  1. Basic usage
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_retrieval_precision
    def agentic_tool(*args, **kwargs):
        pass
    
  2. Usage with different thresholds and methods
    metric_1 = RetrievalPrecisionMetric(threshold=MetricThreshold(type="lower_limit", value=0.5))
    metric_2 = ContextRelevanceMetric(method="sentence_bert_mini_lm", threshold=MetricThreshold(type="lower_limit", value=0.6))
    
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_retrieval_precision(metrics=[metric_1, metric_2])
    def agentic_tool(*args, **kwargs):
        pass
    
evaluate_retrieval_quality(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric] = [], compute_real_time: bool | None = True) dict

An evaluation decorator for computing retrieval quality metrics on an agentic tool. Retrieval Quality metrics include Context Relevance, Retrieval Precision, Average Precision, Hit Rate, Reciprocal Rank, and NDCG.

For more details, see ibm_watsonx_gov.metrics.ContextRelevanceMetric, ibm_watsonx_gov.metrics.RetrievalPrecisionMetric, ibm_watsonx_gov.metrics.AveragePrecisionMetric, ibm_watsonx_gov.metrics.ReciprocalRankMetric, ibm_watsonx_gov.metrics.HitRateMetric, ibm_watsonx_gov.metrics.NDCGMetric

Parameters:
  • func (Optional[Callable], optional) – The tool on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric], optional) – The list of metrics to compute as part of this evaluator. Defaults to MetricGroup.RETRIEVAL_QUALITY.get_metrics().

Raises:

Exception – If any error occurs during evaluation.

Returns:

The result of the wrapped tool.

Return type:

dict

Examples

  1. Basic usage
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_retrieval_quality
    def agentic_tool(*args, **kwargs):
        pass
    
  2. Usage with different thresholds and methods for some of the metrics in the group
    metric_1 = NDCGMetric(threshold=MetricThreshold(type="lower_limit", value=0.5))
    metric_2 = ContextRelevanceMetric(method="sentence_bert_mini_lm", threshold=MetricThreshold(type="lower_limit", value=0.6))
    
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_retrieval_quality(metrics=[metric_1, metric_2])
    def agentic_tool(*args, **kwargs):
        pass
    
evaluate_sexual_content(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric] = [], compute_real_time: bool | None = True) dict

An evaluation decorator for computing sexual content on an agentic tool via Granite Guardian.

For more details, see ibm_watsonx_gov.metrics.SexualContentMetric

Parameters:
  • func (Optional[Callable], optional) – The tool on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric], optional) – The list of metrics to compute as part of this evaluator. Defaults to [ SexualContentMetric() ].

Raises:

Exception – If any error occurs during evaluation.

Returns:

The result of the wrapped tool.

Return type:

dict

Examples

  1. Create evaluate_sexual_content decorator with default parameters. By default, the metric uses the “input_text” from the graph state as the input.
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_sexual_content
    def agentic_tool(*args, **kwargs):
        pass
    
  2. Create evaluate_sexual_content decorator with thresholds and configuration
    metric = SexualContentMetric(thresholds=MetricThreshold(type="lower_limit", value=0.7))
    config = {"input_fields": ["input"]}
    configuration = AgenticAIConfiguration(**config)
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_sexual_content(metrics=[metric], configuration=configuration)
    def agentic_tool(*args, **kwargs):
        pass
    
evaluate_social_bias(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric] = [], compute_real_time: bool | None = True) dict

An evaluation decorator for computing social bias on an agentic tool via Granite Guardian.

For more details, see ibm_watsonx_gov.metrics.SocialBiasMetric

Parameters:
  • func (Optional[Callable], optional) – The tool on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric], optional) – The list of metrics to compute as part of this evaluator. Defaults to [ SocialBiasMetric() ].

Raises:

Exception – If any error occurs during evaluation.

Returns:

The result of the wrapped tool.

Return type:

dict

Examples

  1. Create evaluate_social_bias decorator with default parameters. By default, the metric uses the “input_text” from the graph state as the input.
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_social_bias
    def agentic_tool(*args, **kwargs):
        pass
    
  2. Create evaluate_social_bias decorator with thresholds and configuration
    metric = SocialBiasMetric(thresholds=MetricThreshold(type="lower_limit", value=0.7))
    config = {"input_fields": ["input"]}
    configuration = AgenticAIConfiguration(**config)
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_social_bias(metrics=[metric], configuration=configuration)
    def agentic_tool(*args, **kwargs):
        pass
    
evaluate_text_grade_level(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric] = [], compute_real_time: bool | None = True) dict

An evaluation decorator for computing the text grade level metric on an agentic tool.

For more details, see ibm_watsonx_gov.metrics.TextGradeLevelMetric

Parameters:
  • func (Optional[Callable], optional) – The tool on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric], optional) – The list of metrics to compute as part of this evaluator. Defaults to [TextGradeLevelMetric()].

Raises:

Exception – If any error occurs during evaluation.

Returns:

The result of the wrapped tool.

Return type:

dict

Examples

  1. Basic usage
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_text_grade_level
    def agentic_tool(*args, **kwargs):
        pass
    
  2. Create evaluate_text_grade_level decorator with thresholds and configuration
    metric = TextGradeLevelMetric(thresholds=[MetricThreshold(type="lower_limit", value=6)])
    config = {"output_fields": ["generated_text"]}
    configuration = AgenticAIConfiguration(**config)
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_text_grade_level(metrics=[metric], configuration=configuration)
    def agentic_tool(*args, **kwargs):
        pass
    
evaluate_text_reading_ease(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric] = [], compute_real_time: bool | None = True) dict

An evaluation decorator for computing the text reading ease metric on an agentic tool.

For more details, see ibm_watsonx_gov.metrics.TextReadingEaseMetric

Parameters:
  • func (Optional[Callable], optional) – The tool on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric], optional) – The list of metrics to compute as part of this evaluator. Defaults to [TextReadingEaseMetric()].

Raises:

Exception – If any error occurs during evaluation.

Returns:

The result of the wrapped tool.

Return type:

dict

Examples

  1. Basic usage
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_text_reading_ease
    def agentic_tool(*args, **kwargs):
        pass
    
  2. Create evaluate_text_reading_ease decorator with thresholds and configuration
    metric = TextReadingEaseMetric(thresholds=[MetricThreshold(type="lower_limit", value=70)])
    config = {"output_fields": ["generated_text"]}
    configuration = AgenticAIConfiguration(**config)
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_text_reading_ease(metrics=[metric], configuration=configuration)
    def agentic_tool(*args, **kwargs):
        pass
    
evaluate_tool_call_accuracy(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric] = [], compute_real_time: bool | None = True) dict

An evaluation decorator for computing the tool_call_accuracy metric on an agentic tool.

For more details, see ibm_watsonx_gov.metrics.ToolCallAccuracyMetric

Parameters:
  • func (Optional[Callable], optional) – The tool on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric], optional) – The list of metrics to compute as part of this evaluator. Defaults to [ ToolCallAccuracyMetric() ].

Raises:

Exception – If any error occurs during evaluation.

Returns:

The result of the wrapped tool.

Return type:

dict

Examples

  1. Basic usage
    evaluator = AgenticEvaluator()
    tool_call_metric_config={
        "tools":[get_weather, fetch_stock_price], # List of tools available to the agent
    }
    @evaluator.evaluate_tool_call_accuracy(configuration=AgenticAIConfiguration(**tool_call_metric_config))
    def agentic_tool(*args, **kwargs):
        pass
    
  2. Usage with custom tool calls field
    evaluator = AgenticEvaluator()
    tool_call_metric_config={
        "tools":[get_weather, fetch_stock_price], # List of tools available to the agent
        "tool_calls_field": "tool_calls" # Graph state field to store the Agent's response/tool calls
    }
    @evaluator.evaluate_tool_call_accuracy(configuration=AgenticAIConfiguration(**tool_call_metric_config))
    def agentic_tool(*args, **kwargs):
        pass
    
  3. Usage with different thresholds
    metric_1 = ToolCallAccuracyMetric(threshold=MetricThreshold(type="upper_limit", value=0.7))
    metric_2 = ToolCallAccuracyMetric(threshold=MetricThreshold(type="upper_limit", value=0.9))
    evaluator = AgenticEvaluator()
    tool_call_metric_config={
        "tools":[get_weather, fetch_stock_price], # List of tools available to the agent
        "tool_calls_field": "tool_calls" # Graph state field to store the Agent's response/tool calls
    }
    @evaluator.evaluate_tool_call_accuracy(configuration=AgenticAIConfiguration(**tool_call_metric_config),metrics=[metric_1, metric_2])
    def agentic_tool(*args, **kwargs):
        pass
    
  4. Usage with a list of dictionary items as tools
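    # An illustrative sketch: tools can also be provided as dictionary items
    # instead of callables. The name/description/parameters schema below is an
    # assumption for illustration; align it with your own tool definitions.
    tools = [
        {"name": "get_weather", "description": "Get the current weather for a city.",
         "parameters": {"city": {"type": "string"}}},
        {"name": "fetch_stock_price", "description": "Fetch the latest price for a ticker.",
         "parameters": {"ticker": {"type": "string"}}},
    ]
    evaluator = AgenticEvaluator()
    tool_call_metric_config={
        "tools": tools,
        "tool_calls_field": "tool_calls" # Graph state field to store the Agent's response/tool calls
    }
    @evaluator.evaluate_tool_call_accuracy(configuration=AgenticAIConfiguration(**tool_call_metric_config))
    def agentic_tool(*args, **kwargs):
        pass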
evaluate_tool_call_parameter_accuracy(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric] = [], compute_real_time: bool | None = True) dict

An evaluation decorator for computing the tool_call_parameter_accuracy metric on an agentic tool.

For more details, see ibm_watsonx_gov.metrics.ToolCallParameterAccuracyMetric

Parameters:
  • func (Optional[Callable], optional) – The tool on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric], optional) – The list of metrics to compute as part of this evaluator. Defaults to [ ToolCallParameterAccuracyMetric() ].

Raises:

Exception – If any error occurs during evaluation.

Returns:

The result of the wrapped tool.

Return type:

dict

Examples

  1. Basic usage
    evaluator = AgenticEvaluator()
    tool_calls_metric_config={
        "tools":[get_weather, fetch_stock_price], # List of tools available to the agent
    }
    llm_judge = LLMJudge(
        model=WxAIFoundationModel(
            model_id="meta-llama/llama-3-3-70b-instruct",
            project_id=os.getenv("WATSONX_PROJECT_ID"),
        )
    )
    metric_1 = ToolCallParameterAccuracyMetric(llm_judge=llm_judge)
    @evaluator.evaluate_tool_call_parameter_accuracy(configuration=AgenticAIConfiguration(**tool_calls_metric_config), metrics=[metric_1])
    def agentic_tool(*args, **kwargs):
        pass
    
  2. Usage with custom tool calls field
    evaluator = AgenticEvaluator()
    tool_calls_metric_config={
        "tools":[get_weather, fetch_stock_price], # List of tools available to the agent
        "tool_calls_field": "tool_calls" # Graph state field to store the Agent's response/tool calls
    }
    llm_judge = LLMJudge(
        model=WxAIFoundationModel(
            model_id="meta-llama/llama-3-3-70b-instruct",
            project_id=os.getenv("WATSONX_PROJECT_ID"),
        )
    )
    metric_1 = ToolCallParameterAccuracyMetric(llm_judge=llm_judge)
    @evaluator.evaluate_tool_call_parameter_accuracy(configuration=AgenticAIConfiguration(**tool_calls_metric_config), metrics=[metric_1])
    def agentic_tool(*args, **kwargs):
        pass
    
  3. Usage with different thresholds
    llm_judge = LLMJudge(
        model=WxAIFoundationModel(
            model_id="meta-llama/llama-3-3-70b-instruct",
            project_id=os.getenv("WATSONX_PROJECT_ID"),
        )
    )
    metric_1 = ToolCallParameterAccuracyMetric(llm_judge=llm_judge, threshold=MetricThreshold(type="upper_limit", value=0.7))
    evaluator = AgenticEvaluator()
    tool_calls_metric_config={
        "tools":[get_weather, fetch_stock_price], # List of tools available to the agent
        "tool_calls_field": "tool_calls" # Graph state field to store the Agent's response/tool calls
    }
    @evaluator.evaluate_tool_call_parameter_accuracy(configuration=AgenticAIConfiguration(**tool_calls_metric_config),metrics=[metric_1])
    def agentic_tool(*args, **kwargs):
        pass
    
evaluate_tool_call_relevance(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric] = [], compute_real_time: bool | None = True) dict

An evaluation decorator for computing the tool_call_relevance metric on an agentic tool.

For more details, see ibm_watsonx_gov.metrics.ToolCallRelevanceMetric

Parameters:
  • func (Optional[Callable], optional) – The tool on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric], optional) – The list of metrics to compute as part of this evaluator. Defaults to [ ToolCallRelevanceMetric() ].

Raises:

Exception – If any error occurs during evaluation.

Returns:

The result of the wrapped tool.

Return type:

dict

Examples

  1. Basic usage
    evaluator = AgenticEvaluator()
    tool_call_relevance_config={
        "tools":[get_weather, fetch_stock_price], # List of tools available to the agent
    }
    llm_judge = LLMJudge(
        model=WxAIFoundationModel(
            model_id="meta-llama/llama-3-3-70b-instruct",
            project_id=os.getenv("WATSONX_PROJECT_ID"),
        )
    )
    metric_1 = ToolCallRelevanceMetric(llm_judge=llm_judge)
    @evaluator.evaluate_tool_call_relevance(configuration=AgenticAIConfiguration(**tool_call_relevance_config), metrics=[metric_1])
    def agentic_tool(*args, **kwargs):
        pass
    
  2. Usage with custom tool calls field
    evaluator = AgenticEvaluator()
    tool_call_relevance_config={
        "tools":[get_weather, fetch_stock_price], # List of tools available to the agent
        "tool_calls_field": "tool_calls" # Graph state field to store the Agent's response/tool calls
    }
    llm_judge = LLMJudge(
        model=WxAIFoundationModel(
            model_id="meta-llama/llama-3-3-70b-instruct",
            project_id=os.getenv("WATSONX_PROJECT_ID"),
        )
    )
    metric_1 = ToolCallRelevanceMetric(llm_judge=llm_judge)
    @evaluator.evaluate_tool_call_relevance(configuration=AgenticAIConfiguration(**tool_call_relevance_config), metrics=[metric_1])
    def agentic_tool(*args, **kwargs):
        pass
    
  3. Usage with different thresholds
    llm_judge = LLMJudge(
        model=WxAIFoundationModel(
            model_id="meta-llama/llama-3-3-70b-instruct",
            project_id=os.getenv("WATSONX_PROJECT_ID"),
        )
    )
    metric_1 = ToolCallRelevanceMetric(llm_judge=llm_judge, threshold=MetricThreshold(type="upper_limit", value=0.7))
    evaluator = AgenticEvaluator()
    tool_call_relevance_config={
        "tools":[get_weather, fetch_stock_price], # List of tools available to the agent
        "tool_calls_field": "tool_calls" # Graph state field to store the Agent's response/tool calls
    }
    @evaluator.evaluate_tool_call_relevance(configuration=AgenticAIConfiguration(**tool_call_relevance_config),metrics=[metric_1])
    def agentic_tool(*args, **kwargs):
        pass
    
evaluate_tool_call_syntactic_accuracy(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric] = [], compute_real_time: bool | None = True) dict

An evaluation decorator for computing the tool_call_syntactic_accuracy metric on an agentic tool.

For more details, see ibm_watsonx_gov.metrics.ToolCallSyntacticAccuracyMetric

Parameters:
  • func (Optional[Callable], optional) – The tool on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric], optional) – The list of metrics to compute as part of this evaluator. Defaults to [ ToolCallSyntacticAccuracyMetric() ].

Raises:

Exception – If any error occurs during evaluation.

Returns:

The result of the wrapped tool.

Return type:

dict

Examples

  1. Basic usage
    evaluator = AgenticEvaluator()
    tool_call_syntactic_metric_config={
        "tools":[get_weather, fetch_stock_price], # List of tools available to the agent
    }
    @evaluator.evaluate_tool_call_syntactic_accuracy(configuration=AgenticAIConfiguration(**tool_call_syntactic_metric_config))
    def agentic_tool(*args, **kwargs):
        pass
    
  2. Usage with custom tool calls field
    evaluator = AgenticEvaluator()
    tool_call_syntactic_metric_config={
        "tools":[get_weather, fetch_stock_price], # List of tools available to the agent
        "tool_calls_field": "tool_calls" # Graph state field to store the Agent's response/tool calls
    }
    @evaluator.evaluate_tool_call_syntactic_accuracy(configuration=AgenticAIConfiguration(**tool_call_syntactic_metric_config))
    def agentic_tool(*args, **kwargs):
        pass
    
  3. Usage with different thresholds
    metric_1 = ToolCallSyntacticAccuracyMetric(threshold=MetricThreshold(type="upper_limit", value=0.7))
    evaluator = AgenticEvaluator()
    tool_call_syntactic_metric_config={
        "tools":[get_weather, fetch_stock_price], # List of tools available to the agent
        "tool_calls_field": "tool_calls" # Graph state field to store the Agent's response/tool calls
    }
    @evaluator.evaluate_tool_call_syntactic_accuracy(configuration=AgenticAIConfiguration(**tool_call_syntactic_metric_config),metrics=[metric_1])
    def agentic_tool(*args, **kwargs):
        pass
    
evaluate_topic_relevance(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric], compute_real_time: bool | None = True) dict

An evaluation decorator for computing topic relevance on an agentic tool via the off-topic detector.

For more details, see ibm_watsonx_gov.metrics.TopicRelevanceMetric

Parameters:
  • func (Optional[Callable], optional) – The tool on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric]) – The list of metrics to compute as part of this evaluator.

Raises:

Exception – If any error occurs during evaluation.

Returns:

The result of the wrapped tool.

Return type:

dict

Examples

  1. Create evaluate_topic_relevance decorator by passing the required metric with its system prompt. By default, the metric uses the “input_text” from the graph state as the input.
    metric = TopicRelevanceMetric(system_prompt="...")
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_topic_relevance(metrics=[metric])
    def agentic_tool(*args, **kwargs):
        pass
    
  2. Create evaluate_topic_relevance decorator with thresholds and configuration
    metric = TopicRelevanceMetric(system_prompt="...", thresholds=MetricThreshold(type="lower_limit", value=0.7))
    evaluator = AgenticEvaluator()
    config = {"input_fields": ["input"]}
    configuration = AgenticAIConfiguration(**config)
    @evaluator.evaluate_topic_relevance(metrics=[metric], configuration=configuration)
    def agentic_tool(*args, **kwargs):
        pass
    
evaluate_unethical_behavior(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric] = [], compute_real_time: bool | None = True) dict

An evaluation decorator for computing unethical behavior on an agentic tool via Granite Guardian.

For more details, see ibm_watsonx_gov.metrics.UnethicalBehaviorMetric

Parameters:
  • func (Optional[Callable], optional) – The tool on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric], optional) – The list of metrics to compute as part of this evaluator. Defaults to [ UnethicalBehaviorMetric() ].

Raises:

Exception – If any error occurs during evaluation.

Returns:

The result of the wrapped tool.

Return type:

dict

Examples

  1. Create evaluate_unethical_behavior decorator with default parameters. By default, the metric uses the “input_text” from the graph state as the input.
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_unethical_behavior
    def agentic_tool(*args, **kwargs):
        pass
    
  2. Create evaluate_unethical_behavior decorator with thresholds and configuration
    metric = UnethicalBehaviorMetric(thresholds=MetricThreshold(type="lower_limit", value=0.7))
    config = {"input_fields": ["input"]}
    configuration = AgenticAIConfiguration(**config)
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_unethical_behavior(metrics=[metric], configuration=configuration)
    def agentic_tool(*args, **kwargs):
        pass
    
evaluate_unsuccessful_requests(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric] = [], compute_real_time: bool | None = True) dict

An evaluation decorator for computing the unsuccessful requests metric on an agentic tool.

For more details, see ibm_watsonx_gov.metrics.UnsuccessfulRequestsMetric

Parameters:
  • func (Optional[Callable], optional) – The tool on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric], optional) – The list of metrics to compute as part of this evaluator. Defaults to [ UnsuccessfulRequestsMetric() ].

Raises:

Exception – If any error occurs during evaluation.

Returns:

The result of the wrapped tool.

Return type:

dict

Examples

  1. Basic usage
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_unsuccessful_requests
    def agentic_tool(*args, **kwargs):
        pass
    
  2. Usage with different thresholds and methods
    metric_1 = UnsuccessfulRequestsMetric(threshold=MetricThreshold(type="lower_limit", value=0.5))
    
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_unsuccessful_requests(metrics=[metric_1])
    def agentic_tool(*args, **kwargs):
        pass
    
evaluate_violence(func: Callable | None = None, *, configuration: AgenticAIConfiguration | None = None, metrics: list[GenAIMetric] = [], compute_real_time: bool | None = True) dict

An evaluation decorator for computing violence on an agentic tool via Granite Guardian.

For more details, see ibm_watsonx_gov.metrics.ViolenceMetric

Parameters:
  • func (Optional[Callable], optional) – The tool on which the metric is to be computed.

  • configuration (Optional[AgenticAIConfiguration], optional) – The configuration specific to this evaluator. Defaults to None.

  • metrics (list[GenAIMetric], optional) – The list of metrics to compute as part of this evaluator. Defaults to [ ViolenceMetric() ].

Raises:

Exception – If any error occurs during evaluation.

Returns:

The result of the wrapped tool.

Return type:

dict

Examples

  1. Create evaluate_violence decorator with default parameters. By default, the metric uses the “input_text” from the graph state as the input.
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_violence
    def agentic_tool(*args, **kwargs):
        pass
    
  2. Create evaluate_violence decorator with thresholds and configuration
    metric = ViolenceMetric(thresholds=MetricThreshold(type="lower_limit", value=0.7))
    config = {"input_fields": ["input"]}
    configuration = AgenticAIConfiguration(**config)
    evaluator = AgenticEvaluator()
    @evaluator.evaluate_violence(metrics=[metric], configuration=configuration)
    def agentic_tool(*args, **kwargs):
        pass
    
get_metric_result(metric_name: str, node_name: str) AgentMetricResult

Get the AgentMetricResult for the given metric and node name. This is used to get the result of the metric computed during agent execution.

Parameters:
  • metric_name (string) – The metric name

  • node_name (string) – The node name

Returns:

The AgentMetricResult object for the metric.

Return type:

agent_metric_result (AgentMetricResult)
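
Example

  1. Fetch the result of a metric computed during agent execution. A minimal sketch; the metric and node names below are placeholders, so use the names from your own graph.
    metric_result = agentic_evaluator.get_metric_result(metric_name="context_relevance", node_name="Retrieval Node")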

get_nodes() list[Node]

Get the list of nodes used in the agentic application

Returns:

The list of nodes used in the agentic application

Return type:

nodes (list[Node])
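
Example

  1. List the nodes used in the agentic application. A minimal sketch; Node exposes the name attribute shown in the class-level examples.
    nodes = agentic_evaluator.get_nodes()
    print([node.name for node in nodes])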

get_result(run_name: str | None = None) AgenticEvaluationResult

Get the AgenticEvaluationResult for a run. By default, the result for the latest run is returned; specify the run name to get the result for a specific run.

Parameters:

run_name (string, optional) – The evaluation run name. Defaults to None.

Returns:

The AgenticEvaluationResult object for the run.

Return type:

agentic_evaluation_result (AgenticEvaluationResult)
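
Example

  1. Get the result of the latest run, or of a specific run by name. A minimal sketch; "run_1" is the default run name used by start_run.
    result = agentic_evaluator.get_result()
    result_for_run = agentic_evaluator.get_result(run_name="run_1")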

log_custom_metrics(custom_metrics)

Collect the custom metrics provided by the user and append them to the metrics of the current run.

Parameters:

custom_metrics (List[Dict]) – The list of custom metrics to log.
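
Example

  1. Append user-provided metrics to the current run. A minimal sketch; the dictionary keys shown are illustrative assumptions, not a documented schema.
    custom_metrics = [{"name": "my_business_metric", "value": 0.85}]
    agentic_evaluator.log_custom_metrics(custom_metrics)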

model_post_init(context: Any, /) None

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.

start_run(run_request: AIExperimentRunRequest = AIExperimentRunRequest(name='run_1', description='', source_name='', source_url='', custom_tags=[], agent_method_name='')) AIExperimentRun

Start a run to track the metrics computation within an experiment. This method must be called before any metrics are computed.

Parameters:

run_request (AIExperimentRunRequest) – The run request instance containing the name, description, source_name, source_url, and custom_tags.

Returns:

The details of the experiment run, such as the id, name, and description.
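
Example

  1. Start a named run before invoking the agentic application. A minimal sketch; the run name and description are placeholders, and the fields follow the AIExperimentRunRequest signature above.
    run = agentic_evaluator.start_run(AIExperimentRunRequest(name="baseline_run", description="First tracked run"))
    # Invoke the agentic application
    agentic_evaluator.end_run()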

track_experiment(name: str = 'experiment_1', description: str = None, use_existing: bool = True) str

Start tracking an experiment for the metrics evaluation. The experiment will be created if it doesn’t exist. If an existing experiment with the same name is found, it will be reused based on the flag use_existing.

Parameters:
  • name (string) – The name of the experiment.

  • description (str) – The description of the experiment.

  • use_existing (bool) – The flag to specify if the experiment should be reused if an existing experiment with the given name is found.

Returns:

The ID of the AI experiment asset.
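
Example

  1. Track an experiment before starting runs. A minimal sketch; the name and description are placeholders, and an existing experiment with the same name is reused because use_existing defaults to True.
    experiment_id = agentic_evaluator.track_experiment(name="agent_quality_experiment", description="Experiment for agent evaluation runs")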