Retrieval Precision Metric¶
- pydantic model ibm_watsonx_gov.metrics.retrieval_precision.retrieval_precision_metric.RetrievalPrecisionMetric¶
Bases:
GenAIMetric
Defines the Retrieval Precision metric class.
The Retrieval Precision metric measures the proportion of relevant contexts among all of the contexts that are retrieved. The Context Relevance metric is computed as a prerequisite for this metric.
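The description above reduces to a simple ratio: the number of retrieved contexts whose relevance score clears a cutoff, divided by the total number of retrieved contexts. The sketch below is illustrative only, not the library's implementation; the function name and the 0.7 cutoff are assumptions (0.7 mirrors the metric's default lower-limit threshold).

```python
def retrieval_precision(relevance_scores, cutoff=0.7):
    """Fraction of retrieved contexts judged relevant.

    relevance_scores: per-context relevance values in [0, 1], such as
    those produced by a context-relevance metric (the prerequisite).
    cutoff: assumed relevance cutoff; illustrative only.
    """
    if not relevance_scores:
        return 0.0
    relevant = sum(1 for score in relevance_scores if score >= cutoff)
    return relevant / len(relevance_scores)

# 0.9 and 0.8 clear the cutoff; 0.4 and 0.2 do not -> 2 / 4
print(retrieval_precision([0.9, 0.8, 0.4, 0.2]))  # 0.5
```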
Examples
- Create a Retrieval Precision metric with default parameters and compute it using the metrics evaluator.

  metric = RetrievalPrecisionMetric()
  result = MetricsEvaluator().evaluate(data={"input_text": "...", "context": "..."},
                                       metrics=[metric])
  # A list of contexts can also be passed as shown below
  result = MetricsEvaluator().evaluate(data={"input_text": "...", "context": ["...", "..."]},
                                       metrics=[metric])
- Create a Retrieval Precision metric with a custom threshold.

  threshold = MetricThreshold(type="lower_limit", value=0.5)
  metric = RetrievalPrecisionMetric(thresholds=[threshold])
- Create a Retrieval Precision metric with the llm_as_judge method.

  # Define the LLM judge using watsonx.ai
  # To use other frameworks and models as llm_judge, see :module:`ibm_watsonx_gov.entities.foundation_model`
  llm_judge = LLMJudge(model=WxAIFoundationModel(
      model_id="google/flan-ul2",
      project_id="<PROJECT_ID>"
  ))
  cr_metric = ContextRelevanceMetric(llm_judge=llm_judge)
  rp_metric = RetrievalPrecisionMetric()
  result = MetricsEvaluator().evaluate(data={"input_text": "...", "context": ["...", "..."]},
                                       metrics=[cr_metric, rp_metric])
- Fields:
  - group (MetricGroup)
  - metric_dependencies (list[GenAIMetric])
  - name (Literal['retrieval_precision'])
  - tasks (list[TaskType])
  - thresholds (list[MetricThreshold])
- Validators:
  - metric_dependencies_validator » metric_dependencies
- field group: MetricGroup = MetricGroup.RETRIEVAL_QUALITY¶
The metric group.
- field metric_dependencies: list[GenAIMetric] = [ContextRelevanceMetric(name='context_relevance', thresholds=[MetricThreshold(type='lower_limit', value=0.7)], tasks=[TaskType.RAG], group=MetricGroup.RETRIEVAL_QUALITY, is_reference_free=True, method='token_precision', metric_dependencies=[], llm_judge=None, compute_per_context=False, id='context_relevance_token_precision')]¶
The list of metric dependencies.
- Validated by:
  - metric_dependencies_validator
- field name: Literal['retrieval_precision'] = 'retrieval_precision'¶
The retrieval precision metric name.
- field tasks: list[TaskType] = [TaskType.RAG]¶
The list of supported tasks.
- field thresholds: list[MetricThreshold] = [MetricThreshold(type='lower_limit', value=0.7)]¶
The metric thresholds.
- evaluate(data: DataFrame, configuration: GenAIConfiguration | AgenticAIConfiguration, metrics_result: list[AggregateMetricResult], **kwargs) → AggregateMetricResult¶
- validator metric_dependencies_validator » metric_dependencies¶
- model_post_init(context: Any, /) → None¶
Initializes private attributes and then calls the user-defined model_post_init method.
- pydantic model ibm_watsonx_gov.metrics.retrieval_precision.retrieval_precision_metric.RetrievalPrecisionResult¶
Bases:
RecordMetricResult
- Config:
arbitrary_types_allowed: bool = True
use_enum_values: bool = True
- Fields:
  - group (MetricGroup)
  - name (str)
- field group: MetricGroup = MetricGroup.RETRIEVAL_QUALITY¶
- field name: str = 'retrieval_precision'¶
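A result record pairs the metric value with its thresholds. As a minimal, hypothetical illustration (this helper is not part of the ibm_watsonx_gov API), checking whether a value breaches a threshold of the shape defined above might look like:

```python
def violates_threshold(value, threshold_type, threshold_value):
    """Return True when the metric value breaches the threshold.

    Mirrors the MetricThreshold fields: a 'lower_limit' is breached by
    values below threshold_value, an 'upper_limit' by values above it.
    Illustrative helper only, not a library function.
    """
    if threshold_type == "lower_limit":
        return value < threshold_value
    if threshold_type == "upper_limit":
        return value > threshold_value
    raise ValueError(f"unknown threshold type: {threshold_type}")

# A retrieval_precision of 0.5 breaches the default lower_limit of 0.7
print(violates_threshold(0.5, "lower_limit", 0.7))  # True
```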