Evaluator API

UnitxtEvaluator

Bases: BaseEvaluator

Wrapper around the unitxt library for evaluating RAG results.

Functions

evaluate_metrics

evaluate_metrics(evaluation_data: list[EvaluationData], metrics: Sequence[str]) -> dict

Perform evaluation on the given instances with chosen metric types.

Parameters:

  • evaluation_data (list[EvaluationData]) –

    Instances that hold data needed for the unitxt algorithms to perform evaluation.

  • metrics (Sequence[str]) –

    Values describing which specific evaluation metrics should be used within the evaluation process.

Returns:

  • dict

    Dictionary of scores given for each EvaluationData.

Source code in ai4rag/evaluator/unitxt_evaluator.py
def evaluate_metrics(
    self,
    evaluation_data: list[EvaluationData],
    metrics: Sequence[str],
) -> dict:
    """
    Perform evaluation on the given instances with chosen metric types.

    Parameters
    ----------
    evaluation_data : list[EvaluationData]
        Instances that hold data needed for the unitxt algorithms to perform evaluation.

    metrics : Sequence[str]
        Values describing which specific evaluation metrics should be used
        within the evaluation process.

    Returns
    -------
    dict
        Dictionary of scores given for each EvaluationData.
    """
    evaluation_primitives = [prim.to_dict() for prim in evaluation_data]
    df = pd.DataFrame(evaluation_primitives)
    unitxt_metrics = self.get_metric_types(metric_types=metrics)
    try:
        scores_df, ci_table = evaluate(df, metric_names=unitxt_metrics, compute_conf_intervals=True)

        returned_ci = self._handle_ci_calculations(ci_table=ci_table)
        question_scores = self._handle_questions_scores(scores_df=scores_df)

        return {"scores": returned_ci, "question_scores": question_scores}

    except Exception as exc:
        raise EvaluationError(exc) from exc
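The data flow above can be sketched without unitxt itself: each `EvaluationData` instance is flattened to a plain dict, the dicts become rows of a DataFrame, and any failure inside the evaluation call is re-raised as `EvaluationError`. The `EvaluationData` fields and `EvaluationError` below are minimal hypothetical stand-ins for the real `ai4rag` types, used only to illustrate the shape of the input:

```python
from dataclasses import dataclass, asdict, field

# Hypothetical stand-ins: the real EvaluationData and EvaluationError
# live in ai4rag and may carry different fields.
@dataclass
class EvaluationData:
    question: str
    answer: str
    contexts: list = field(default_factory=list)

    def to_dict(self) -> dict:
        # Flatten the instance into the primitive dict that
        # evaluate_metrics feeds into a pandas DataFrame.
        return asdict(self)

class EvaluationError(Exception):
    """Raised when the underlying unitxt evaluation fails."""

data = [
    EvaluationData("What is RAG?", "Retrieval-augmented generation.", ["ctx 1"]),
]

# Step 1 of evaluate_metrics: one dict per instance.
evaluation_primitives = [prim.to_dict() for prim in data]
# Step 2 (omitted here): pd.DataFrame(evaluation_primitives) is handed to
# unitxt's evaluate() together with the mapped metric names.
print(evaluation_primitives[0]["question"])
```

Note that the wrapper catches any exception from `evaluate` and re-raises it as `EvaluationError`, so callers only need to handle one exception type.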

get_metric_types classmethod

get_metric_types(metric_types: Sequence[str]) -> list[str]

Perform mapping of general metric names to the specific metric names in the unitxt library.

Parameters:

  • metric_types (Sequence[str]) –

    Metrics defined in the MetricType class.

Returns:

  • list[str]

    Specific versions of the metrics that can be used within unitxt evaluation process.

Source code in ai4rag/evaluator/unitxt_evaluator.py
@classmethod
def get_metric_types(cls, metric_types: Sequence[str]) -> list[str]:
    """
    Perform mapping of general metric names to the specific metric names
    in the unitxt library.

    Parameters
    ----------
    metric_types : Sequence[str]
        Metrics defined in the MetricType class.

    Returns
    -------
    list[str]
        Specific versions of the metrics that can be used within
        unitxt evaluation process.
    """
    mapping = [cls.METRIC_TYPE_MAP.get(metric) for metric in metric_types]
    return [metric for metric in mapping if metric is not None]
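The mapping logic can be shown in isolation. The `METRIC_TYPE_MAP` entries below are hypothetical; the real map and its unitxt metric names are defined on `UnitxtEvaluator`:

```python
# Hypothetical mapping from general metric names to unitxt metric names.
METRIC_TYPE_MAP = {
    "faithfulness": "metrics.rag.faithfulness",
    "answer_correctness": "metrics.rag.answer_correctness",
}

def get_metric_types(metric_types):
    # Unknown names map to None and are silently dropped,
    # mirroring the classmethod's behaviour.
    mapping = [METRIC_TYPE_MAP.get(metric) for metric in metric_types]
    return [metric for metric in mapping if metric is not None]

result = get_metric_types(["faithfulness", "not_a_metric"])
```

Unrecognized names are filtered out rather than raising, so passing an unsupported metric simply yields a shorter list.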

decode_unitxt_metric classmethod

decode_unitxt_metric(unitxt_metrics: list[str]) -> list[str]

Decode metrics from the unitxt names to general names.

Parameters:

  • unitxt_metrics (list[str]) –

    Encoded unitxt metrics.

Returns:

  • list[str]

    Corresponding decoded metric names.

Source code in ai4rag/evaluator/unitxt_evaluator.py
@classmethod
def decode_unitxt_metric(cls, unitxt_metrics: list[str]) -> list[str]:
    """
    Decode metrics from the unitxt names to general names.

    Parameters
    ----------
    unitxt_metrics : list[str]
        Encoded unitxt metrics.

    Returns
    -------
    list[str]
        Corresponding decoded metric names.
    """

    reversed_mapping = {v: k for k, v in cls.METRIC_TYPE_MAP.items()}
    decoded = [reversed_mapping[metric] for metric in unitxt_metrics]

    return decoded
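The decode step inverts the same map. A minimal sketch, again with a hypothetical `METRIC_TYPE_MAP`:

```python
# Hypothetical mapping; the real one is defined on UnitxtEvaluator.
METRIC_TYPE_MAP = {
    "faithfulness": "metrics.rag.faithfulness",
}

def decode_unitxt_metric(unitxt_metrics):
    # Build the inverse map and look each unitxt name back up.
    reversed_mapping = {v: k for k, v in METRIC_TYPE_MAP.items()}
    return [reversed_mapping[metric] for metric in unitxt_metrics]

decoded = decode_unitxt_metric(["metrics.rag.faithfulness"])
```

Unlike `get_metric_types`, this method indexes the inverse map directly (no `.get`), so a unitxt name that is not in `METRIC_TYPE_MAP` raises `KeyError` instead of being dropped.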