Retrieval Precision Metric
- pydantic model ibm_watsonx_gov.metrics.retrieval_precision.retrieval_precision_metric.RetrievalPrecisionMetric
Bases:
GenAIMetric
Defines the Retrieval Precision metric class.
The Retrieval Precision metric measures how many of the retrieved contexts are relevant, out of the total number of contexts retrieved. The Context Relevance metric is computed as a prerequisite for this metric.
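As a rough illustration of the definition above, the sketch below computes a precision-style score from per-context relevance scores. It is not the library's implementation: the function name `retrieval_precision`, the `relevance_cutoff` parameter, and the rule that a context counts as relevant when its score meets the cutoff are all assumptions made for this example.

```python
from typing import Sequence


def retrieval_precision(context_scores: Sequence[float],
                        relevance_cutoff: float = 0.7) -> float:
    """Fraction of retrieved contexts treated as relevant (hypothetical helper).

    Assumption: a context is relevant when its relevance score is at least
    `relevance_cutoff`. The real metric may decide relevance differently.
    """
    if not context_scores:
        # No contexts retrieved: report 0.0 rather than dividing by zero.
        return 0.0
    relevant = sum(1 for score in context_scores if score >= relevance_cutoff)
    return relevant / len(context_scores)


scores = [0.9, 0.4, 0.8, 0.1]
print(retrieval_precision(scores))  # 2 of the 4 contexts meet the cutoff -> 0.5
```

The per-context scores here stand in for whatever the Context Relevance metric produces; only the ratio of relevant to total contexts is the point of the sketch.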
Examples
- Create Retrieval Precision metric with default parameters and compute using metrics evaluator.
metric = RetrievalPrecisionMetric()
result = MetricsEvaluator().evaluate(data={"input_text": "...", "context": "..."},
                                     metrics=[metric])
# A list of contexts can also be passed as shown below
result = MetricsEvaluator().evaluate(data={"input_text": "...", "context": ["...", "..."]},
                                     metrics=[metric])
- Create Retrieval Precision metric with a custom threshold.
# The model field is `thresholds`, a list of MetricThreshold objects
threshold = MetricThreshold(type="lower_limit", value=0.5)
metric = RetrievalPrecisionMetric(thresholds=[threshold])
- Create Retrieval Precision metric with llm_as_judge method.
# Define LLM Judge using watsonx.ai
# To use other frameworks and models as llm_judge, see :module:`ibm_watsonx_gov.entities.foundation_model`
llm_judge = LLMJudge(model=WxAIFoundationModel(
    model_id="ibm/granite-3-3-8b-instruct",
    project_id="<PROJECT_ID>"
))
cr_metric = ContextRelevanceMetric(llm_judge=llm_judge)
ap_metric = RetrievalPrecisionMetric()
result = MetricsEvaluator().evaluate(data={"input_text": "...", "context": ["...", "..."]},
                                     metrics=[cr_metric, ap_metric])
JSON schema
{ "title": "RetrievalPrecisionMetric", "description": "Defines the Retrieval Precision metric class.\n\nThe Retrieval Precision metric measures the quanity of relevant contexts from the total of contexts that are retrieved.\nThe Context Relevance metric is computed as a pre requisite to compute this metric.\n\nExamples:\n 1. Create Retrieval Precision metric with default parameters and compute using metrics evaluator.\n .. code-block:: python\n\n metric = RetrievalPrecisionMetric()\n result = MetricsEvaluator().evaluate(data={\"input_text\": \"...\", \"context\": \"...\"},\n metrics=[metric])\n # A list of contexts can also be passed as shown below\n result = MetricsEvaluator().evaluate(data={\"input_text\": \"...\", \"context\": [\"...\", \"...\"]},\n metrics=[metric])\n\n 2. Create Retrieval Precision metric with a custom threshold.\n .. code-block:: python\n\n threshold = MetricThreshold(type=\"lower_limit\", value=0.5)\n metric = RetrievalPrecisionMetric(method=method, threshold=threshold)\n\n 3. Create Retrieval Precision metric with llm_as_judge method.\n .. 
code-block:: python\n\n # Define LLM Judge using watsonx.ai\n # To use other frameworks and models as llm_judge, see :module:`ibm_watsonx_gov.entities.foundation_model`\n llm_judge = LLMJudge(model=WxAIFoundationModel(\n model_id=\"ibm/granite-3-3-8b-instruct\",\n project_id=\"<PROJECT_ID>\"\n ))\n cr_metric = ContextRelevanceMetric(llm_judge=llm_judge)\n ap_metric = RetrievalPrecisionMetric()\n result = MetricsEvaluator().evaluate(data={\"input_text\": \"...\", \"context\": [\"...\", \"...\"]},\n metrics=[cr_metric, ap_metric])", "type": "object", "properties": { "name": { "const": "retrieval_precision", "default": "retrieval_precision", "description": "The retrieval precision metric name.", "title": "Name", "type": "string" }, "display_name": { "const": "Retrieval Precision", "default": "Retrieval Precision", "description": "The retrieval precision metric display name.", "title": "Display Name", "type": "string" }, "type_": { "default": "ootb", "description": "The type of the metric. Indicates whether the metric is ootb or custom.", "examples": [ "ootb", "custom" ], "title": "Metric type", "type": "string" }, "value_type": { "default": "numeric", "description": "The type of the metric value. 
Indicates whether the metric value is numeric or categorical.", "examples": [ "numeric", "categorical" ], "title": "Metric value type", "type": "string" }, "thresholds": { "default": [ { "type": "lower_limit", "value": 0.7 } ], "description": "The metric thresholds.", "items": { "$ref": "#/$defs/MetricThreshold" }, "title": "Thresholds", "type": "array" }, "tasks": { "default": [ "retrieval_augmented_generation" ], "description": "The list of supported tasks.", "items": { "$ref": "#/$defs/TaskType" }, "title": "Tasks", "type": "array" }, "group": { "$ref": "#/$defs/MetricGroup", "default": "retrieval_quality", "description": "The metric group.", "title": "Group" }, "is_reference_free": { "default": true, "description": "Decides whether this metric needs a reference for computation", "title": "Is Reference Free", "type": "boolean" }, "method": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "description": "The method used to compute the metric.", "title": "Method" }, "metric_dependencies": { "default": [ { "name": "context_relevance", "display_name": "Context Relevance", "type_": "ootb", "value_type": "numeric", "thresholds": [ { "type": "lower_limit", "value": 0.7 } ], "tasks": [ "retrieval_augmented_generation" ], "group": "retrieval_quality", "is_reference_free": true, "method": "token_precision", "metric_dependencies": [], "applies_to": "message", "mapping": null, "llm_judge": null, "compute_per_context": false, "id": "context_relevance_token_precision" } ], "description": "The list of metric dependencies", "items": { "$ref": "#/$defs/GenAIMetric" }, "title": "Metric dependencies", "type": "array" }, "applies_to": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": "message", "description": "The tag to indicate for which the metric is applied to. 
Used for agentic application metric computation.", "examples": [ "message", "conversation", "sub_agent" ], "title": "Applies to" }, "mapping": { "anyOf": [ { "$ref": "#/$defs/Mapping" }, { "type": "null" } ], "default": null, "description": "The data mapping details for the metric which are used to read the values needed to compute the metric.", "examples": { "items": [ { "attribute_name": "traceloop.entity.input", "column_name": null, "json_path": "$.inputs.input_text", "lookup_child_spans": false, "name": "input_text", "span_name": "LangGraph.workflow", "type": "input" }, { "attribute_name": "traceloop.entity.output", "column_name": null, "json_path": "$.outputs.generated_text", "lookup_child_spans": false, "name": "generated_text", "span_name": "LangGraph.workflow", "type": "output" } ], "source": "trace" }, "title": "Mapping" } }, "$defs": { "GenAIMetric": { "description": "Defines the Generative AI metric interface", "properties": { "name": { "description": "The name of the metric.", "examples": [ "answer_relevance", "context_relevance" ], "title": "Metric Name", "type": "string" }, "display_name": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "description": "The display name of the metric.", "examples": [ "Answer Relevance", "Context Relevance" ], "title": "Metric display name" }, "type_": { "default": "ootb", "description": "The type of the metric. Indicates whether the metric is ootb or custom.", "examples": [ "ootb", "custom" ], "title": "Metric type", "type": "string" }, "value_type": { "default": "numeric", "description": "The type of the metric value. 
Indicates whether the metric value is numeric or categorical.", "examples": [ "numeric", "categorical" ], "title": "Metric value type", "type": "string" }, "thresholds": { "default": [], "description": "The list of thresholds", "items": { "$ref": "#/$defs/MetricThreshold" }, "title": "Thresholds", "type": "array" }, "tasks": { "default": [], "description": "The task types this metric is associated with.", "items": { "$ref": "#/$defs/TaskType" }, "title": "Tasks", "type": "array" }, "group": { "anyOf": [ { "$ref": "#/$defs/MetricGroup" }, { "type": "null" } ], "default": null, "description": "The metric group this metric belongs to." }, "is_reference_free": { "default": true, "description": "Decides whether this metric needs a reference for computation", "title": "Is Reference Free", "type": "boolean" }, "method": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "description": "The method used to compute the metric.", "title": "Method" }, "metric_dependencies": { "default": [], "description": "Metrics that needs to be evaluated first", "items": { "$ref": "#/$defs/GenAIMetric" }, "title": "Metric Dependencies", "type": "array" }, "applies_to": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": "message", "description": "The tag to indicate for which the metric is applied to. 
Used for agentic application metric computation.", "examples": [ "message", "conversation", "sub_agent" ], "title": "Applies to" }, "mapping": { "anyOf": [ { "$ref": "#/$defs/Mapping" }, { "type": "null" } ], "default": null, "description": "The data mapping details for the metric which are used to read the values needed to compute the metric.", "examples": { "items": [ { "attribute_name": "traceloop.entity.input", "column_name": null, "json_path": "$.inputs.input_text", "lookup_child_spans": false, "name": "input_text", "span_name": "LangGraph.workflow", "type": "input" }, { "attribute_name": "traceloop.entity.output", "column_name": null, "json_path": "$.outputs.generated_text", "lookup_child_spans": false, "name": "generated_text", "span_name": "LangGraph.workflow", "type": "output" } ], "source": "trace" }, "title": "Mapping" } }, "required": [ "name" ], "title": "GenAIMetric", "type": "object" }, "Mapping": { "description": "Defines the field mapping details to be used for computing a metric.", "properties": { "source": { "default": "trace", "description": "The source type of the data. Use trace if the data should be read from span in trace. Use tabular if the data is passed as a dataframe.", "enum": [ "trace", "tabular" ], "examples": [ "trace", "tabular" ], "title": "Source", "type": "string" }, "items": { "description": "The list of mapping items for the field. 
They are used to read the data from trace or tabular data for computing the metric.", "items": { "$ref": "#/$defs/MappingItem" }, "title": "Mapping Items", "type": "array" } }, "required": [ "items" ], "title": "Mapping", "type": "object" }, "MappingItem": { "description": "The mapping details to be used for reading the values from the data.", "properties": { "name": { "description": "The name of the item.", "examples": [ "input_text", "generated_text", "context", "ground_truth" ], "title": "Name", "type": "string" }, "type": { "description": "The type of the item.", "enum": [ "input", "output", "reference", "context", "tool_call" ], "examples": [ "input" ], "title": "Type", "type": "string" }, "column_name": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "description": "The column name in the tabular data to be used for reading the field value. Applicable for tabular source.", "title": "Column Name" }, "span_name": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "description": "The span name in the trace data to be used for reading the field value. Applicable for trace source.", "title": "Span Name" }, "attribute_name": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "description": "The attribute name in the trace to be used for reading the field value. Applicable for trace source.", "title": "Attribute Name" }, "json_path": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "description": "The json path to be used for reading the field value from the attribute value. Applicable for trace source. If not provided, the span attribute value is read as the field value.", "title": "Json Path" }, "lookup_child_spans": { "anyOf": [ { "type": "boolean" }, { "type": "null" } ], "default": false, "description": "The flag to indicate if all the child spans should be searched for the attribute value. 
Applicable for trace source.", "title": "Look up child spans" } }, "required": [ "name", "type" ], "title": "MappingItem", "type": "object" }, "MetricGroup": { "enum": [ "retrieval_quality", "answer_quality", "content_safety", "performance", "usage", "message_completion", "tool_call_quality", "readability", "custom" ], "title": "MetricGroup", "type": "string" }, "MetricThreshold": { "description": "The class that defines the threshold for a metric.", "properties": { "type": { "description": "Threshold type. One of 'lower_limit', 'upper_limit'", "enum": [ "lower_limit", "upper_limit" ], "title": "Type", "type": "string" }, "value": { "default": 0, "description": "The value of metric threshold", "title": "Threshold value", "type": "number" } }, "required": [ "type" ], "title": "MetricThreshold", "type": "object" }, "TaskType": { "description": "Supported task types for generative AI models", "enum": [ "question_answering", "classification", "summarization", "generation", "extraction", "retrieval_augmented_generation" ], "title": "TaskType", "type": "string" } } }
- Fields:
group (ibm_watsonx_gov.entities.enums.MetricGroup)
metric_dependencies (list[ibm_watsonx_gov.entities.metric.GenAIMetric])
tasks (list[ibm_watsonx_gov.entities.enums.TaskType])
- Validators:
metric_dependencies_validator » metric_dependencies
- field display_name: Annotated[Literal['Retrieval Precision'], FieldInfo(annotation=NoneType, required=False, default='Retrieval Precision', title='Display Name', description='The retrieval precision metric display name.', frozen=True)] = 'Retrieval Precision'
The retrieval precision metric display name.
- Validated by:
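- field group: Annotated[MetricGroup, FieldInfo(annotation=NoneType, required=False, default=<MetricGroup.RETRIEVAL_QUALITY: 'retrieval_quality'>, title='Group', description='The metric group.', frozen=True)] = MetricGroup.RETRIEVAL_QUALITY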
The metric group.
- Validated by:
- field metric_dependencies: Annotated[list[GenAIMetric], FieldInfo(annotation=NoneType, required=False, default=[ContextRelevanceMetric(name='context_relevance', display_name='Context Relevance', type_='ootb', value_type='numeric', thresholds=[MetricThreshold(type='lower_limit', value=0.7)], tasks=[<TaskType.RAG: 'retrieval_augmented_generation'>], group=<MetricGroup.RETRIEVAL_QUALITY: 'retrieval_quality'>, is_reference_free=True, method='token_precision', metric_dependencies=[], applies_to='message', mapping=None, llm_judge=None, compute_per_context=False, id='context_relevance_token_precision')], title='Metric dependencies', description='The list of metric dependencies')] = [ContextRelevanceMetric(name='context_relevance', display_name='Context Relevance', type_='ootb', value_type='numeric', thresholds=[MetricThreshold(type='lower_limit', value=0.7)], tasks=[<TaskType.RAG: 'retrieval_augmented_generation'>], group=<MetricGroup.RETRIEVAL_QUALITY: 'retrieval_quality'>, is_reference_free=True, method='token_precision', metric_dependencies=[], applies_to='message', mapping=None, llm_judge=None, compute_per_context=False, id='context_relevance_token_precision')]
The list of metric dependencies
- Validated by:
metric_dependencies_validator
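Because Context Relevance is listed as a dependency, it has to be evaluated before Retrieval Precision. The following sketch shows one way such an ordering could be derived from a dependency map. The helper `evaluation_order` and the plain-dictionary representation are hypothetical and are not part of `ibm_watsonx_gov`; the evaluator's actual scheduling is not described in this page.

```python
from typing import Dict, List, Set


def evaluation_order(dependencies: Dict[str, List[str]]) -> List[str]:
    """Return metric names so that every metric comes after its dependencies."""
    ordered: List[str] = []
    visiting: Set[str] = set()

    def visit(name: str) -> None:
        if name in ordered:
            return
        if name in visiting:
            raise ValueError(f"circular dependency involving {name!r}")
        visiting.add(name)
        # Place all dependencies first, then the metric itself.
        for dep in dependencies.get(name, []):
            visit(dep)
        visiting.discard(name)
        ordered.append(name)

    for metric_name in dependencies:
        visit(metric_name)
    return ordered


# Metric names taken from the default values documented above.
deps = {"retrieval_precision": ["context_relevance"], "context_relevance": []}
print(evaluation_order(deps))  # ['context_relevance', 'retrieval_precision']
```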
- field name: Annotated[Literal['retrieval_precision'], FieldInfo(annotation=NoneType, required=False, default='retrieval_precision', title='Name', description='The retrieval precision metric name.', frozen=True)] = 'retrieval_precision'
The retrieval precision metric name.
- Validated by:
- field thresholds: Annotated[list[MetricThreshold], FieldInfo(annotation=NoneType, required=False, default=[MetricThreshold(type='lower_limit', value=0.7)], title='Thresholds', description='The metric thresholds.')] = [MetricThreshold(type='lower_limit', value=0.7)]
The metric thresholds.
- Validated by:
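To make the default `lower_limit` threshold of 0.7 concrete, here is a minimal sketch of how a metric value might be checked against a threshold. The `Threshold` dataclass is a local stand-in that mirrors only the `type` and `value` fields of `MetricThreshold`, and the `violates` helper is hypothetical. The comparison semantics (a value below a lower limit, or above an upper limit, counts as a violation) are an assumption rather than documented behaviour.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    """Local stand-in mirroring the `type` and `value` fields of MetricThreshold."""
    type: str           # "lower_limit" or "upper_limit"
    value: float = 0.0  # same default as in the JSON schema


def violates(threshold: Threshold, metric_value: float) -> bool:
    # Assumed semantics: below a lower limit or above an upper limit is a violation.
    if threshold.type == "lower_limit":
        return metric_value < threshold.value
    if threshold.type == "upper_limit":
        return metric_value > threshold.value
    raise ValueError(f"unknown threshold type: {threshold.type!r}")


default = Threshold(type="lower_limit", value=0.7)
print(violates(default, 0.5))   # True: 0.5 is below the lower limit
print(violates(default, 0.85))  # False: 0.85 meets the lower limit
```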
- evaluate(data: DataFrame, configuration: GenAIConfiguration | AgenticAIConfiguration, metrics_result: list[AggregateMetricResult], **kwargs) → AggregateMetricResult
- validator metric_dependencies_validator » metric_dependencies
- model_post_init(context: Any, /) → None
We need to both initialize private attributes and call the user-defined model_post_init method.
- pydantic model ibm_watsonx_gov.metrics.retrieval_precision.retrieval_precision_metric.RetrievalPrecisionResult
Bases:
RecordMetricResult
JSON schema
{ "title": "RetrievalPrecisionResult", "type": "object", "properties": { "name": { "default": "retrieval_precision", "title": "Name", "type": "string" }, "display_name": { "default": "Retrieval Precision", "title": "Display Name", "type": "string" }, "value_type": { "default": "numeric", "description": "The type of the metric value. Indicates whether the metric value is numeric or categorical.", "examples": [ "numeric", "categorical" ], "title": "Metric value type", "type": "string" }, "method": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "description": "The method used to compute this metric result.", "examples": [ "token_recall" ], "title": "Method" }, "provider": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "description": "The provider used to compute this metric result.", "title": "Provider" }, "value": { "anyOf": [ { "type": "number" }, { "type": "string" }, { "type": "boolean" }, { "additionalProperties": { "type": "integer" }, "type": "object" }, { "type": "null" } ], "description": "The metric value.", "title": "Value" }, "label": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "description": "The string equivalent of the metric value. 
This is used for metrics with categorical value type.", "title": "Label" }, "errors": { "anyOf": [ { "items": { "$ref": "#/$defs/Error" }, "type": "array" }, { "type": "null" } ], "default": null, "description": "The list of error messages", "title": "Errors" }, "additional_info": { "anyOf": [ { "type": "object" }, { "type": "null" } ], "default": null, "description": "The additional information about the metric result.", "title": "Additional Info" }, "explanation": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "description": "The explanation about the metric result.", "title": "Explanation" }, "group": { "$ref": "#/$defs/MetricGroup", "default": "retrieval_quality" }, "thresholds": { "default": [], "description": "The metric thresholds", "items": { "$ref": "#/$defs/MetricThreshold" }, "title": "Thresholds", "type": "array" }, "record_id": { "description": "The record identifier.", "examples": [ "record1" ], "title": "Record Id", "type": "string" }, "record_timestamp": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "description": "The record timestamp.", "examples": [ "2025-01-01T00:00:00.000000Z" ], "title": "Record Timestamp" } }, "$defs": { "Error": { "properties": { "code": { "description": "The error code", "title": "Code", "type": "string" }, "message_en": { "description": "The error message in English.", "title": "Message En", "type": "string" }, "parameters": { "default": [], "description": "The list of parameters to construct the message in a different locale.", "items": {}, "title": "Parameters", "type": "array" } }, "required": [ "code", "message_en" ], "title": "Error", "type": "object" }, "MetricGroup": { "enum": [ "retrieval_quality", "answer_quality", "content_safety", "performance", "usage", "message_completion", "tool_call_quality", "readability", "custom" ], "title": "MetricGroup", "type": "string" }, "MetricThreshold": { "description": "The class that defines the threshold for a metric.", 
"properties": { "type": { "description": "Threshold type. One of 'lower_limit', 'upper_limit'", "enum": [ "lower_limit", "upper_limit" ], "title": "Type", "type": "string" }, "value": { "default": 0, "description": "The value of metric threshold", "title": "Threshold value", "type": "number" } }, "required": [ "type" ], "title": "MetricThreshold", "type": "object" } }, "required": [ "value", "record_id" ] }
- Config:
arbitrary_types_allowed: bool = True
use_enum_values: bool = True
- Fields:
- field display_name: str = 'Retrieval Precision'
- field group: MetricGroup = MetricGroup.RETRIEVAL_QUALITY
- field name: str = 'retrieval_precision'
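The RetrievalPrecisionResult schema marks `value` and `record_id` as required and gives defaults for several other fields. The sketch below mimics that shape with plain dictionaries so the required and default fields are easy to see. `build_result`, `DEFAULTS`, and `REQUIRED` are hypothetical names used only for illustration; the real class is a pydantic model and performs fuller validation.

```python
import copy
from typing import Any, Dict

# Defaults and required keys copied from the RetrievalPrecisionResult JSON schema.
DEFAULTS: Dict[str, Any] = {
    "name": "retrieval_precision",
    "display_name": "Retrieval Precision",
    "value_type": "numeric",
    "group": "retrieval_quality",
    "thresholds": [],
}
REQUIRED = ("value", "record_id")


def build_result(record: Dict[str, Any]) -> Dict[str, Any]:
    """Merge a record into the schema defaults after checking required keys."""
    missing = [key for key in REQUIRED if key not in record]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    merged = copy.deepcopy(DEFAULTS)  # avoid sharing the mutable defaults
    merged.update(record)
    return merged


result = build_result({"value": 0.5, "record_id": "record1"})
print(result["name"], result["value"])  # retrieval_precision 0.5
```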