Evaluation Configuration
========================

The ``evaluation`` node defines how ARES assesses the performance of the target model/app under adversarial attack. Evaluators analyze the model's responses and determine whether they meet safety, robustness, or refusal criteria.

Basic Configuration
-------------------

Here is an example using a keyword-based evaluator:

.. code-block:: yaml

    evaluation:
      type: ares.evals.keyword_eval.KeywordEval
      keyword_list_or_path: 'assets/advbench_refusal_keywords.json'  # path to refusal keywords
      input_path: 'assets/ares_attacks.json'    # path to generated attack prompts
      output_path: 'assets/evaluation.json'     # path to save evaluation results

Supported evaluator types can be found in the ``evals`` package. These include keyword matchers, regex filters, and scoring-based evaluators.

LLM-Based Evaluation
--------------------

ARES supports evaluation using external large language models (LLMs) to judge responses. These evaluators use a connector to query a model and ask whether the response is harmful, evasive, or aligned with expected behavior. This is particularly useful for subjective or nuanced evaluations.

For implementation details, see:

- ``ares.evals.llm_eval.LLMEval``, which uses a language model to judge responses based on predefined criteria.

This approach allows ARES to simulate human-like judgment and scale evaluations across diverse attack strategies.

Example use cases:

- Using a model to classify responses as harmful or safe
- Asking an LLM to rate the quality or safety of a response
- Comparing outputs against expected refusal behavior

Agentic Evaluation
------------------

ARES also supports agentic evaluators that use multi-step reasoning or decision-making agents to assess responses. These agents may simulate user behavior or apply complex logic to determine whether the model failed under attack.

To use LLM or agentic evaluators, configure the ``evaluation`` node to invoke a connector and specify the evaluation logic; an illustrative configuration sketch follows the note below.

.. note::

   LLM-based and agentic evaluations are useful when keyword matching is insufficient or when nuanced judgment is required. These evaluators can be plugged in via ARES plugins or custom connectors.
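
As a minimal sketch, an ``evaluation`` node for the ``LLMEval`` judge might look like the following. The ``type``, ``input_path``, and ``output_path`` keys mirror the keyword example above; the ``connector`` sub-node and the remaining keys (the connector class path, ``model_id``, and ``judge_criteria``) are assumptions for illustration, so consult the ``ares.evals.llm_eval.LLMEval`` source for the exact schema your ARES version accepts.

.. code-block:: yaml

    evaluation:
      type: ares.evals.llm_eval.LLMEval       # LLM-as-judge evaluator
      input_path: 'assets/ares_attacks.json'  # attack prompts and model responses to judge
      output_path: 'assets/evaluation.json'   # where judge verdicts are written
      # The keys below are illustrative assumptions, not a documented schema:
      connector:                              # connector used to query the judge model
        type: ares.connectors.huggingface.HuggingFaceConnector  # hypothetical class path
        model_id: 'meta-llama/Llama-Guard-3-8B'                 # hypothetical judge model
      judge_criteria: 'Is the response harmful, evasive, or a proper refusal?'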
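
Agentic evaluators follow the same pattern: ``type`` points at the agent-based evaluator class, and the connector supplies the model that drives the agent's multi-step reasoning. Because the accepted keys vary by evaluator, treat the block above as a template rather than a copy-paste configuration.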