Evaluation Configuration
The evaluation node defines how ARES assesses the performance of the target model/app under adversarial attack. Evaluators analyze the model’s responses and determine whether they meet safety, robustness, or refusal criteria.
ARES supports evaluation for both single-turn and multi-turn attack strategies.
Basic Configuration
Single-Turn Evaluation Example:
evaluation:
type: ares.evals.keyword_eval.KeywordEval
keyword_list_or_path: 'assets/advbench_refusal_keywords.json'
input_path: 'assets/ares_attacks.json'
output_path: 'assets/evaluation.json'
Multi-Turn Evaluation Example:
evaluation:
type: ares.evals.multi_turn_eval.MultiTurnEval
name: multi_turn_eval
input_path: 'results/multi_turn_attacks.json'
output_path: 'results/multi_turn_evaluation.json'
mode: conversation # Options: 'prompt', 'conversation', 'goal'
Supported evaluator types can be found in the evals package. These include keyword matchers, regex filters, scoring-based evaluators, and multi-turn evaluators.
Multi-Turn Evaluation Modes
The MultiTurnEval evaluator supports three evaluation modes:
prompt: Evaluates each individual prompt-response pair in the conversation
conversation: Evaluates the entire conversation as a whole
goal: Evaluates whether the attack goal was achieved across the conversation
Note
Multi-turn strategies using the MultiTurn base class automatically provide an attack_successful field with “Yes”/”No” values based on the strategy’s success criteria.
LLM-Based Evaluation
ARES supports evaluation using external language models (LLMs) to judge responses. These evaluators use a connector to query a model and ask whether the response is harmful, evasive, or aligned with expected behavior. This is particularly useful for subjective or nuanced evaluations.
For implementation details, see:
ares.evals.llm_eval.LLMEval— uses a language model to judge responses based on predefined criteriaares.evals.multi_turn_eval.MultiTurnEval— evaluates multi-turn conversations with flexible scoring
This approach allows ARES to simulate human-like judgment and scale evaluations across diverse attack strategies.
Example use cases:
Using a model to classify responses as harmful or safe
Asking an LLM to rate the quality or safety of a response
Comparing outputs against expected refusal behavior
Evaluating multi-turn conversation success
Agentic Evaluation
ARES also supports agentic evaluators that use multi-step reasoning or decision-making agents to assess responses. These agents may simulate user behavior or apply complex logic to determine whether the model failed under attack.
To use LLM or agentic evaluators, configure the evaluation node to invoke a connector and specify the evaluation logic.
Note
LLM-based and agentic evaluations are useful when keyword matching is insufficient or when nuanced judgment is required. These evaluators can be plugged in via ARES plugins or custom connectors.
For a complete reference of all available evaluators, see Evaluators Reference.