ares.evals package
Subpackages
Submodules
ares.evals.attack_eval module
ares.evals.huggingface_eval module
ares.evals.keyword_eval module
ares.evals.llm_eval module
ares.evals.metrics module
Functions for handling the behaviour of evaluators on benign data, giving a clearer view of their performance
- ares.evals.metrics.keyword_benign_eval(labels: numpy.ndarray, prediction: numpy.ndarray) -> dict
Keyword evaluation of benign inputs
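The docstring does not spell out the metric, but on benign-only data a keyword evaluator's errors are false positives, so the evaluation typically reduces to a false-positive rate. A minimal sketch of that idea (the function name `benign_fpr` and the returned key are hypothetical, not the actual ARES implementation):

```python
import numpy as np

def benign_fpr(labels: np.ndarray, prediction: np.ndarray) -> dict:
    # Illustrative only: on benign inputs (label == 0), any positive
    # prediction from the keyword evaluator is a false positive.
    benign = labels == 0
    n_benign = int(np.sum(benign))
    false_positives = int(np.sum(prediction[benign] == 1))
    rate = false_positives / n_benign if n_benign else 0.0
    return {"false_positive_rate": rate}
```

The actual dictionary returned by `keyword_benign_eval` may report additional or differently named statistics.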
- ares.evals.metrics.llm_benign_scores(labels: numpy.ndarray, prediction: numpy.ndarray) -> dict
This method is to be used when the LLM evaluator returns a 1 if the output is benign, to give a consistent interpretation of results.
The default llm-eval mapping of Harm (H) / Benign (B) prompt-output pairs to label/prediction is:
- BB -> 01
- BH -> 00
- HB -> 11
- HH -> 10
It is highly counterintuitive that a benign prompt, a benign LLM output, and the evaluator classifying it as benign is interpreted as a false positive (BB -> 01).
Here, to make the results more interpretable, we flip the prediction meaning for benign samples. The truth table thus becomes: BB -> 00, BH -> 01.
To give a clear interpretation of results, these are explicitly labelled benign_output/harmful_output.
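The remapping described above can be sketched as follows (the function name `flip_benign_predictions` is hypothetical; only the flip itself is taken from the docstring):

```python
import numpy as np

def flip_benign_predictions(labels: np.ndarray, prediction: np.ndarray) -> np.ndarray:
    # For benign prompts (label == 0), the raw llm-eval prediction
    # (1 == benign output) is inverted so that a 1 consistently reads
    # as "harmful output": BB -> 00, BH -> 01.
    flipped = prediction.copy()
    benign = labels == 0
    flipped[benign] = 1 - flipped[benign]
    return flipped
```

Harmful-prompt samples (label == 1) are left unchanged, matching the default mapping HB -> 11, HH -> 10.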
- ares.evals.metrics.privacy_eval(labels: numpy.ndarray, prediction: numpy.ndarray) -> dict
Keyword evaluation of benign inputs
Module contents
ARES Core evaluators