Tool Call Accuracy Metric

pydantic model ibm_watsonx_gov.metrics.tool_call_accuracy.tool_call_accuracy_metric.ToolCallAccuracyMetric

Bases: GenAIMetric

ToolCallAccuracyMetric checks whether the tool call in the LLM response is syntactically correct and semantically meaningful, given the user’s query and the available tool definitions.

The ToolCallAccuracyMetric is computed using the syntactic method by default; a granite_guardian based method is also available (see the method field below).

Examples

  1. Create ToolCallAccuracyMetric by passing the basic configuration.
    config = GenAIConfiguration(tools=[get_weather, fetch_stock_price])
    evaluator = MetricsEvaluator(configuration=config)
    df = pd.read_csv("")
    
    metrics = [ToolCallAccuracyMetric()]
    result = evaluator.evaluate(data=df, metrics=metrics)
    
  2. Create ToolCallAccuracyMetric with a custom threshold.
    threshold = MetricThreshold(type="upper_limit", value=0.8)
    metric = ToolCallAccuracyMetric(threshold=threshold)
    
  3. Create ToolCallAccuracyMetric by passing custom tool calls field in configuration.
    test_data = {"input_text": "What's the latest on Tesla today?",
                 "tools_used": [{"name": "get_weather", "args": {"location": "Tesla"}, "id": "0724", "type": "tool_call"}]}
    
    config = GenAIConfiguration(tools=[get_weather, fetch_stock_price],
                                tool_calls_field="tools_used")
    evaluator = MetricsEvaluator(configuration=config)
    metrics = [ToolCallAccuracyMetric()]
    result = evaluator.evaluate(data=test_data, metrics=metrics)
    
  4. Create ToolCallAccuracyMetric by passing a list of dictionary items as tools field in configuration.
    available_tools = [{"type": "function",
                        "function": {"name": "f1_name",
                                     "description": "f1_description.",
                                     "parameters": {"parameter1": {"description": "parameter_description",
                                                                   "type": "parameter_type",
                                                                   "default": "default_value"}}}}]
    config = GenAIConfiguration(tools=available_tools,
                                tool_calls_field="tools_used")
    evaluator = MetricsEvaluator(configuration=config)
    df = pd.read_csv("")

    metrics = [ToolCallAccuracyMetric()]
    result = evaluator.evaluate(data=df, metrics=metrics)

JSON schema
{
   "title": "ToolCallAccuracyMetric",
   "description": "ToolCallAccuracyMetric checks whether the tool call in the LLM response is \nsyntactically correct and semantically meaningful, given the user's query and \nthe available tool definitions.\n\nThe ToolCallAccuracyMetric will be computed using granite_guardian.\n\nExamples:\n    1. Create ToolCallAccuracyMetric by passing the basic configuration.\n        .. code-block:: python\n\n            config = GenAIConfiguration(tools = [get_weather,fetch_stock_price])\n            evaluator = MetricsEvaluator(configuration=config)\n            df = pd.read_csv(\"\")\n\n            metrics = [ToolCallAccuracyMetric()]\n            result = evaluator.evaluate(data=df, metrics=metrics)\n\n    2. Create ToolCallAccuracyMetric with a custom threshold.\n        .. code-block:: python\n\n            threshold  = MetricThreshold(type=\"upper_limit\", value=0.8)\n            metric = ToolCallAccuracyMetric(threshold=threshold)\n\n    3. Create ToolCallAccuracyMetric by passing custom tool calls field in configuration.\n        .. code-block:: python\n\n            test_data = {\"input_text\": \"What's the latest on Tesla today?\", \n            \"tools_used\":[{\"name\": \"get_weather\", \"args\": {\"location\": \"Tesla\"}, \"id\": \"0724\", \"type\": \"tool_call\"}]}\n\n            config = GenAIConfiguration(tools = [get_weather,fetch_stock_price],\n                                        tool_calls_field=\"tools_used\")\n            evaluator = MetricsEvaluator(configuration=config)\n            metrics = [ToolCallAccuracyMetric()]\n            result = evaluator.evaluate(data=test_data, metrics=metrics)\n\n    4. Create ToolCallAccuracyMetric by passing a list of dictionary items as tools field in configuration.\n        .. code-block:: python\n            available_tools = [{\"type\":\"function\",\"function\":{\"name\":\"f1_name\",\"description\":\"f1_description.\",\"parameters\":{\"parameter1\":{\"description\":\"parameter_description\",\"type\":\"parameter_type\",\"default\":\"default_value\"}}}}]\n            config = GenAIConfiguration(tools = available_tools,\n                                        tool_calls_field=\"tools_used\")\n            evaluator = MetricsEvaluator(configuration=config)\n            df = pd.read_csv(\"\")\n\n            metrics = [ToolCallAccuracyMetric()]\n            result = evaluator.evaluate(data=df, metrics=metrics)",
   "type": "object",
   "properties": {
      "name": {
         "const": "tool_call_accuracy",
         "default": "tool_call_accuracy",
         "description": "The name of metric.",
         "title": "Metric Name",
         "type": "string"
      },
      "thresholds": {
         "default": [
            {
               "type": "upper_limit",
               "value": 0.7
            }
         ],
         "description": "Value that defines the violation limit for the metric",
         "items": {
            "$ref": "#/$defs/MetricThreshold"
         },
         "title": "Metric threshold",
         "type": "array"
      },
      "tasks": {
         "default": [
            "retrieval_augmented_generation"
         ],
         "description": "The generative task type.",
         "items": {
            "$ref": "#/$defs/TaskType"
         },
         "title": "Task Type",
         "type": "array"
      },
      "group": {
         "$ref": "#/$defs/MetricGroup",
         "default": "tool_call_quality"
      },
      "is_reference_free": {
         "default": true,
         "description": "Decides whether this metric needs a reference for computation",
         "title": "Is Reference Free",
         "type": "boolean"
      },
      "method": {
         "default": "syntactic",
         "description": "The method used to compute the metric.",
         "enum": [
            "syntactic",
            "granite_guardian"
         ],
         "title": "Computation Method",
         "type": "string"
      },
      "metric_dependencies": {
         "default": [],
         "description": "Metrics that needs to be evaluated first",
         "items": {
            "$ref": "#/$defs/GenAIMetric"
         },
         "title": "Metric Dependencies",
         "type": "array"
      }
   },
   "$defs": {
      "GenAIMetric": {
         "description": "Defines the Generative AI metric interface",
         "properties": {
            "name": {
               "description": "The name of the metric",
               "title": "Metric Name",
               "type": "string"
            },
            "thresholds": {
               "default": [],
               "description": "The list of thresholds",
               "items": {
                  "$ref": "#/$defs/MetricThreshold"
               },
               "title": "Thresholds",
               "type": "array"
            },
            "tasks": {
               "description": "The task types this metric is associated with.",
               "items": {
                  "$ref": "#/$defs/TaskType"
               },
               "title": "Tasks",
               "type": "array"
            },
            "group": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/MetricGroup"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The metric group this metric belongs to."
            },
            "is_reference_free": {
               "default": true,
               "description": "Decides whether this metric needs a reference for computation",
               "title": "Is Reference Free",
               "type": "boolean"
            },
            "method": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The method used to compute the metric.",
               "title": "Method"
            },
            "metric_dependencies": {
               "default": [],
               "description": "Metrics that needs to be evaluated first",
               "items": {
                  "$ref": "#/$defs/GenAIMetric"
               },
               "title": "Metric Dependencies",
               "type": "array"
            }
         },
         "required": [
            "name",
            "tasks"
         ],
         "title": "GenAIMetric",
         "type": "object"
      },
      "MetricGroup": {
         "enum": [
            "retrieval_quality",
            "answer_quality",
            "content_safety",
            "performance",
            "usage",
            "tool_call_quality",
            "readability"
         ],
         "title": "MetricGroup",
         "type": "string"
      },
      "MetricThreshold": {
         "description": "The class that defines the threshold for a metric.",
         "properties": {
            "type": {
               "description": "Threshold type. One of 'lower_limit', 'upper_limit'",
               "enum": [
                  "lower_limit",
                  "upper_limit"
               ],
               "title": "Type",
               "type": "string"
            },
            "value": {
               "default": 0,
               "description": "The value of metric threshold",
               "title": "Threshold value",
               "type": "number"
            }
         },
         "required": [
            "type"
         ],
         "title": "MetricThreshold",
         "type": "object"
      },
      "TaskType": {
         "description": "Supported task types for generative AI models",
         "enum": [
            "question_answering",
            "classification",
            "summarization",
            "generation",
            "extraction",
            "retrieval_augmented_generation"
         ],
         "title": "TaskType",
         "type": "string"
      }
   }
}

Fields:
field group: Annotated[MetricGroup, FieldInfo(annotation=NoneType, required=False, default=<MetricGroup.TOOL_CALL_QUALITY: 'tool_call_quality'>, frozen=True)] = MetricGroup.TOOL_CALL_QUALITY
field method: Annotated[Literal['syntactic', 'granite_guardian'], FieldInfo(annotation=NoneType, required=False, default='syntactic', title='Computation Method', description='The method used to compute the metric.')] = 'syntactic'

The method used to compute the metric.
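
For example, to switch from the default syntactic check to the granite_guardian judge, the field can be set at construction time. A minimal sketch; the evaluator and DataFrame df are assumed to be set up as in the examples above:

    # Select the granite_guardian computation method instead of the default syntactic check
    metric = ToolCallAccuracyMetric(method="granite_guardian")
    metrics = [metric]
    result = evaluator.evaluate(data=df, metrics=metrics)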

field name: Annotated[Literal['tool_call_accuracy'], FieldInfo(annotation=NoneType, required=False, default='tool_call_accuracy', title='Metric Name', description='The name of metric.')] = 'tool_call_accuracy'

The name of metric.

field tasks: Annotated[list[TaskType], FieldInfo(annotation=NoneType, required=False, default=[<TaskType.RAG: 'retrieval_augmented_generation'>], title='Task Type', description='The generative task type.')] = [TaskType.RAG]

The generative task type.

field thresholds: Annotated[list[MetricThreshold], FieldInfo(annotation=NoneType, required=False, default=[MetricThreshold(type='upper_limit', value=0.7)], title='Metric threshold', description='Value that defines the violation limit for the metric')] = [MetricThreshold(type='upper_limit', value=0.7)]

Value that defines the violation limit for the metric
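
The field is declared as a list, so a custom violation limit is supplied as one or more MetricThreshold items. A minimal sketch, assuming the thresholds keyword mirrors the field name shown above:

    # Override the default upper_limit of 0.7 with a stricter limit
    thresholds = [MetricThreshold(type="upper_limit", value=0.8)]
    metric = ToolCallAccuracyMetric(thresholds=thresholds)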

evaluate(data: DataFrame | dict, configuration: GenAIConfiguration | AgenticAIConfiguration, **kwargs)

Evaluate the data for ToolCallAccuracyMetric.

Parameters:

    data (pd.DataFrame | dict) – Data to be evaluated

    configuration (GenAIConfiguration | AgenticAIConfiguration) – Metrics configuration

    **kwargs – Additional keyword arguments

Returns:

The computed metrics

Return type:

AggregateMetricResult
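
A minimal sketch of calling the method directly on a metric instance, assuming the same get_weather and fetch_stock_price tool definitions and the test_data record used in the examples above:

    # Evaluate a single record against the tool definitions in the configuration
    config = GenAIConfiguration(tools=[get_weather, fetch_stock_price])
    metric = ToolCallAccuracyMetric()
    result = metric.evaluate(data=test_data, configuration=config)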

async evaluate_async(data: DataFrame, configuration: GenAIConfiguration | AgenticAIConfiguration, **kwargs) → list[AggregateMetricResult]
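
A hedged sketch of driving the asynchronous variant from an asyncio event loop, assuming it accepts the same arguments as evaluate and reusing the metric, config, and DataFrame df from the sketches above:

    import asyncio

    async def run():
        # Await the asynchronous evaluation; returns a list of AggregateMetricResult
        return await metric.evaluate_async(data=df, configuration=config)

    results = asyncio.run(run())
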
model_post_init(context: Any, /) → None

We need to both initialize private attributes and call the user-defined model_post_init method.