Evaluators Reference

This section provides a comprehensive reference for all available evaluators in ARES. Evaluators analyze model responses to determine whether they meet safety, robustness, or refusal criteria.

Overview

ARES supports multiple types of evaluators:

  • Keyword-based: Pattern matching against refusal keywords

  • LLM-based: Model-as-a-judge evaluation using language models

  • Detector-based: Specialized detectors (e.g., Garak encoding detectors)

  • OWASP-specific: Evaluators tailored for specific OWASP vulnerabilities
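
Whatever the type, each evaluator in this reference is configured as a named entry under the evaluation key with three common fields: type (the fully qualified evaluator class), name, and output_path for the results file, plus evaluator-specific fields such as a connector or a keyword list. A generic sketch, with placeholder values inferred from the examples in the sections that follow:

evaluation:
  my_evaluator:
    # Fully qualified class of the evaluator to instantiate
    type: ares.evals.llm_eval.LLMEval
    # Name for this evaluator entry (see the examples below)
    name: my_evaluator
    # Where the evaluation results are written
    output_path: results/my_evaluation.json
    # Evaluator-specific fields (e.g. a connector block) go here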

Keyword Evaluator

Type: ares.evals.keyword_eval.KeywordEval

Evaluates responses by checking for the presence of refusal keywords or patterns.

Configuration:

evaluation:
  keyword:
    type: ares.evals.keyword_eval.KeywordEval
    name: keyword
    keyword_list_or_path: assets/advbench_refusal_keywords.json
    output_path: results/keyword_evaluation.json

Use Cases:

  • Quick refusal detection

  • Low-resource evaluation

  • Baseline safety checks
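
The keyword_list_or_path field above points to a JSON file of refusal phrases (here, the AdvBench refusal keywords). Judging by the parameter name, an inline list may also be accepted; the following is a hypothetical sketch of that form and should be verified against your installed ARES version:

evaluation:
  keyword:
    type: ares.evals.keyword_eval.KeywordEval
    name: keyword
    # Hypothetical inline alternative to a file path; confirm that your
    # ARES version accepts a list here. The phrases are illustrative only.
    keyword_list_or_path:
      - "I'm sorry"
      - "I cannot"
      - "As an AI"
    output_path: results/keyword_evaluation.json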

LLM-Based Evaluators

HarmBench Evaluator

Type: ares.evals.llm_eval.LLMEval

Uses a language model to judge whether responses are harmful based on HarmBench criteria.

Configuration:

evaluation:
  harmbench_eval:
    type: ares.evals.llm_eval.LLMEval
    name: harmbench_eval
    output_path: results/harmbench_evaluation.json
    connector:
      harmbench-eval-llama:
        prompt_path: assets/harmbench_prompt.yaml

Features:

  • Model-as-a-judge approach

  • Nuanced safety assessment

  • Configurable evaluation prompts

HuggingFace Evaluator

Type: ares.evals.huggingface_eval.HuggingFaceEval

Similar to the LLM evaluator but optimized for HuggingFace models.

Configuration:

evaluation:
  huggingface_eval:
    type: ares.evals.huggingface_eval.HuggingFaceEval
    name: harmbench_hf_eval
    output_path: results/harmbench_hf_evaluation.json
    connector:
      harmbench-eval-llama:
        prompt_path: assets/harmbench_prompt.yaml

GPT Evaluator

Type: ares.evals.llm_eval.LLMEval

Uses GPT-OSS-120B hosted on watsonx, accessed through the LiteLLM connector, as the judge model.

Configuration:

evaluation:
  gpt_eval:
    type: ares.evals.llm_eval.LLMEval
    name: gpt_eval
    output_path: results/gpt_evaluation.json
    connector:
      litellm_connector:
        name: gpt-oss-120b-on-watsonx
        endpoint-type: watsonx
        model: openai/gpt-oss-120b
        reasoning_effort: low
        prompt_path: assets/gptoss_prompt.yaml

Garak Detectors

Garak detectors are specialized evaluators from the NVIDIA Garak framework for detecting encoded attacks.

DecodeMatch Detector

Type: ares_garak.evals.encoding.Detector

Performs exact match detection on decoded content.

Configuration:

evaluation:
  garak_decoding_match:
    type: ares_garak.evals.encoding.Detector
    name: decode_match
    detector: detectors.encoding.DecodeMatch
    output_path: results/garak_detector_match_evaluation.json

DecodeApprox Detector

Type: ares_garak.evals.encoding.Detector

Performs approximate match detection on decoded content.

Configuration:

evaluation:
  garak_decoding_approx:
    type: ares_garak.evals.encoding.Detector
    name: decode_approx
    detector: detectors.encoding.DecodeApprox
    output_path: results/garak_detector_approx_evaluation.json

Use Cases:

  • Evaluating encoding attack success

  • Detecting obfuscated harmful content

  • Testing decoder robustness

Crescendo Evaluator

Type: ares_pyrit.evals.crescendo_eval.CrescendoNativeEval

Specialized evaluator for Crescendo multi-turn attacks.

Configuration:

evaluation:
  crescendo_eval:
    type: ares_pyrit.evals.crescendo_eval.CrescendoNativeEval
    name: crescendo_eval
    output_path: results/crescendo_evaluation.json
    connector:
      litellm_connector:
        name: gpt-oss-120b-on-watsonx
        endpoint-type: watsonx
        model: openai/gpt-oss-120b
        reasoning_effort: low
        prompt_path: assets/gptoss_prompt.yaml

Features:

  • Multi-turn conversation analysis

  • Progressive jailbreak detection

  • Context-aware evaluation

OWASP-Specific Evaluators

ARES provides specialized evaluators for each OWASP LLM vulnerability category.

LLM01 Evaluator (Prompt Injection)

Type: ares.evals.huggingface_eval.HuggingFaceEval

Configuration:

evaluation:
  llm01_eval:
    type: ares.evals.huggingface_eval.HuggingFaceEval
    name: llm01_harmbench_eval
    output_path: results/owasp_llm_01_evaluation.json
    connector:
      harmbench-eval-llama:
        prompt_path: assets/harmbench_prompt.yaml

LLM02 Evaluator (Sensitive Information Disclosure)

Type: ares.evals.llm_eval.LLMEval

Configuration:

evaluation:
  llm02_eval:
    type: ares.evals.llm_eval.LLMEval
    name: llm_eval
    output_path: results/owasp_llm_02_evaluation.json
    connector:
      harmbench-eval-llama:
        prompt_path: assets/eval-llm-02-2025.yaml

LLM04 Evaluator (Data and Model Poisoning)

Type: ares.evals.llm_eval.LLMEval

Configuration:

evaluation:
  llm04_eval:
    type: ares.evals.llm_eval.LLMEval
    name: llm_eval
    output_path: results/owasp_llm_04_evaluation.json
    connector:
      harmbench-eval-llama:
        prompt_path: assets/eval-llm-04-2025.yaml

LLM05 Evaluator (Improper Output Handling)

Type: ares.evals.llm_eval.LLMEval

Configuration:

evaluation:
  llm05_eval:
    type: ares.evals.llm_eval.LLMEval
    name: llm_eval
    output_path: results/owasp_llm_05_evaluation.json
    connector:
      harmbench-eval-llama:
        prompt_path: assets/eval-llm-05-2025.yaml

LLM06 Evaluator (Excessive Agency)

Type: ares.evals.llm_eval.LLMEval

Configuration:

evaluation:
  llm06_eval:
    type: ares.evals.llm_eval.LLMEval
    name: llm_eval
    output_path: results/owasp_llm_06_evaluation.json
    connector:
      harmbench-eval-llama:
        prompt_path: assets/eval-llm-06-2025.yaml

LLM07 Evaluator (System Prompt Leakage)

Type: ares.evals.llm_eval.LLMEval

Configuration:

evaluation:
  llm07_eval:
    type: ares.evals.llm_eval.LLMEval
    name: llm_eval
    output_path: results/owasp_llm_07_evaluation.json
    connector:
      harmbench-eval-llama:
        prompt_path: assets/eval-llm-07-2025.yaml

LLM09 Evaluator (Misinformation)

Type: ares.evals.llm_eval.LLMEval

Configuration:

evaluation:
  llm09_eval:
    type: ares.evals.llm_eval.LLMEval
    name: llm_eval
    output_path: results/owasp_llm_09_evaluation.json
    connector:
      harmbench-eval-llama:
        prompt_path: assets/eval-llm-09-2025.yaml

LLM10 Evaluator (Unbounded Consumption)

Type: ares.evals.llm_eval.LLMEval

Configuration:

evaluation:
  llm10_eval:
    type: ares.evals.llm_eval.LLMEval
    name: llm_eval
    output_path: results/owasp_llm_10_evaluation.json
    connector:
      harmbench-eval-llama:
        prompt_path: assets/eval-llm-10-2025.yaml

Multiple Evaluators

ARES supports running multiple evaluators in a single evaluation:

evaluation:
  - keyword
  - harmbench_eval
  - garak_decoding_match

This applies several complementary evaluation methods to the same set of responses, giving a more comprehensive assessment.
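
When the evaluators are defined inline rather than referenced by name, the same combination can be written as several keyed blocks under one evaluation key, reusing the templates from the sections above. A sketch assuming that layout (check which form your pipeline configuration expects):

evaluation:
  keyword:
    type: ares.evals.keyword_eval.KeywordEval
    name: keyword
    keyword_list_or_path: assets/advbench_refusal_keywords.json
    output_path: results/keyword_evaluation.json
  harmbench_eval:
    type: ares.evals.llm_eval.LLMEval
    name: harmbench_eval
    output_path: results/harmbench_evaluation.json
    connector:
      harmbench-eval-llama:
        prompt_path: assets/harmbench_prompt.yaml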

Custom Evaluators

To create a custom evaluator:

  1. Extend the base evaluator class from ares.evals

  2. Implement the required evaluation logic

  3. Register it in your configuration

See the plugin development guide for more details.
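
Registration follows the same configuration pattern as the built-in evaluators: point type at the import path of your class. A hypothetical sketch, assuming a custom class at my_plugin.evals.custom_eval.CustomEval:

evaluation:
  custom_eval:
    # Hypothetical custom evaluator; replace with your own module path
    # and add any evaluator-specific fields your class requires.
    type: my_plugin.evals.custom_eval.CustomEval
    name: custom_eval
    output_path: results/custom_evaluation.json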

Viewing Available Evaluators

Use the CLI to list all available evaluators:

ares show evals

To view a specific evaluator’s template:

ares show evals -n keyword
ares show evals -n harmbench_eval