Evaluators Reference

This section provides a comprehensive reference for all available evaluators in ARES. Evaluators analyze model responses to determine whether they meet safety, robustness, or refusal criteria.

Overview

ARES supports multiple types of evaluators:

  • Keyword-based: Pattern matching against refusal keywords

  • LLM-based: Model-as-a-judge evaluation using language models

  • Detector-based: Specialized detectors (e.g., Garak encoding detectors)

  • OWASP-specific: Evaluators tailored for specific OWASP vulnerabilities
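
Whatever the type, each evaluator in this reference is configured as a named entry under the evaluation key with three common fields: type (the fully qualified evaluator class), name, and output_path for the results file, plus evaluator-specific fields such as a connector or a keyword list. A generic sketch, with placeholder values inferred from the examples in the sections that follow:

evaluation:
  my_evaluator:
    # Fully qualified class of the evaluator to instantiate
    type: ares.evals.llm_eval.LLMEval
    # Name for this evaluator entry (see the examples below)
    name: my_evaluator
    # Where the evaluation results are written
    output_path: results/my_evaluation.json
    # Evaluator-specific fields (e.g. a connector block) go here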

Keyword Evaluator

Type: ares.evals.keyword_eval.KeywordEval

Evaluates responses by checking for the presence of refusal keywords or patterns.

Configuration:

evaluation:
  keyword:
    type: ares.evals.keyword_eval.KeywordEval
    name: keyword
    keyword_list_or_path: assets/advbench_refusal_keywords.json
    output_path: results/keyword_evaluation.json

Use Cases:

  • Quick refusal detection

  • Low-resource evaluation

  • Baseline safety checks
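
The keyword_list_or_path field above points to a JSON file of refusal phrases (here, the AdvBench refusal keywords). Judging by the parameter name, an inline list may also be accepted; the following is a hypothetical sketch of that form and should be verified against your installed ARES version:

evaluation:
  keyword:
    type: ares.evals.keyword_eval.KeywordEval
    name: keyword
    # Hypothetical inline alternative to a file path; confirm that your
    # ARES version accepts a list here. The phrases are illustrative only.
    keyword_list_or_path:
      - "I'm sorry"
      - "I cannot"
      - "As an AI"
    output_path: results/keyword_evaluation.json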

LLM-Based Evaluators

HarmBench Evaluator

Type: ares.evals.llm_eval.LLMEval

Uses a language model to judge whether responses are harmful based on HarmBench criteria.

Configuration:

evaluation:
  harmbench_eval:
    type: ares.evals.llm_eval.LLMEval
    name: harmbench_eval
    output_path: results/harmbench_evaluation.json
    connector:
      harmbench-eval-llama:
        prompt_path: assets/harmbench_prompt.yaml

Features:

  • Model-as-a-judge approach

  • Nuanced safety assessment

  • Configurable evaluation prompts

HuggingFace Evaluator

Type: ares.evals.huggingface_eval.HuggingFaceEval

Similar to the LLM evaluator but optimized for HuggingFace models.

Configuration:

evaluation:
  huggingface_eval:
    type: ares.evals.huggingface_eval.HuggingFaceEval
    name: harmbench_hf_eval
    output_path: results/harmbench_hf_evaluation.json
    connector:
      harmbench-eval-llama:
        prompt_path: assets/harmbench_prompt.yaml

GPT Evaluator

Type: ares.evals.llm_eval.LLMEval

Uses GPT-OSS-120B hosted on watsonx, accessed through the LiteLLM connector, as the judge model.

Configuration:

evaluation:
  gpt_eval:
    type: ares.evals.llm_eval.LLMEval
    name: gpt_eval
    output_path: results/gpt_evaluation.json
    connector:
      litellm_connector:
        name: gpt-oss-120b-on-watsonx
        endpoint-type: watsonx
        model: openai/gpt-oss-120b
        reasoning_effort: low
        prompt_path: assets/gptoss_prompt.yaml

Garak Detectors

Garak detectors are specialized evaluators from the NVIDIA Garak framework for detecting encoded attacks.

DecodeMatch Detector

Type: ares_garak.evals.encoding.Detector

Performs exact match detection on decoded content.

Configuration:

evaluation:
  garak_decoding_match:
    type: ares_garak.evals.encoding.Detector
    name: decode_match
    detector: detectors.encoding.DecodeMatch
    output_path: results/garak_detector_match_evaluation.json

DecodeApprox Detector

Type: ares_garak.evals.encoding.Detector

Performs approximate match detection on decoded content.

Configuration:

evaluation:
  garak_decoding_approx:
    type: ares_garak.evals.encoding.Detector
    name: decode_approx
    detector: detectors.encoding.DecodeApprox
    output_path: results/garak_detector_approx_evaluation.json

Use Cases:

  • Evaluating encoding attack success

  • Detecting obfuscated harmful content

  • Testing decoder robustness

Crescendo Evaluator

Type: ares_pyrit.evals.crescendo_eval.CrescendoNativeEval

Specialized evaluator for Crescendo multi-turn attacks.

Configuration:

evaluation:
  crescendo_eval:
    type: ares_pyrit.evals.crescendo_eval.CrescendoNativeEval
    name: crescendo_eval
    output_path: results/crescendo_evaluation.json
    connector:
      litellm_connector:
        name: gpt-oss-120b-on-watsonx
        endpoint-type: watsonx
        model: openai/gpt-oss-120b
        reasoning_effort: low
        prompt_path: assets/gptoss_prompt.yaml

Features:

  • Multi-turn conversation analysis

  • Progressive jailbreak detection

  • Context-aware evaluation

OWASP-Specific Evaluators

ARES provides specialized evaluators for each OWASP LLM vulnerability category.

LLM01 Evaluator (Prompt Injection)

Type: ares.evals.huggingface_eval.HuggingFaceEval

Configuration:

evaluation:
  llm01_eval:
    type: ares.evals.huggingface_eval.HuggingFaceEval
    name: llm01_harmbench_eval
    output_path: results/owasp_llm_01_evaluation.json
    connector:
      harmbench-eval-llama:
        prompt_path: assets/harmbench_prompt.yaml

LLM02 Evaluator (Sensitive Information Disclosure)

Type: ares.evals.llm_eval.LLMEval

Configuration:

evaluation:
  llm02_eval:
    type: ares.evals.llm_eval.LLMEval
    name: llm_eval
    output_path: results/owasp_llm_02_evaluation.json
    connector:
      harmbench-eval-llama:
        prompt_path: assets/eval-llm-02-2025.yaml

LLM04 Evaluator (Data and Model Poisoning)

Type: ares.evals.llm_eval.LLMEval

Configuration:

evaluation:
  llm04_eval:
    type: ares.evals.llm_eval.LLMEval
    name: llm_eval
    output_path: results/owasp_llm_04_evaluation.json
    connector:
      harmbench-eval-llama:
        prompt_path: assets/eval-llm-04-2025.yaml

LLM05 Evaluator (Improper Output Handling)

Type: ares.evals.llm_eval.LLMEval

Configuration:

evaluation:
  llm05_eval:
    type: ares.evals.llm_eval.LLMEval
    name: llm_eval
    output_path: results/owasp_llm_05_evaluation.json
    connector:
      harmbench-eval-llama:
        prompt_path: assets/eval-llm-05-2025.yaml

LLM06 Evaluator (Excessive Agency)

Type: ares.evals.llm_eval.LLMEval

Configuration:

evaluation:
  llm06_eval:
    type: ares.evals.llm_eval.LLMEval
    name: llm_eval
    output_path: results/owasp_llm_06_evaluation.json
    connector:
      harmbench-eval-llama:
        prompt_path: assets/eval-llm-06-2025.yaml

LLM07 Evaluator (System Prompt Leakage)

Type: ares.evals.llm_eval.LLMEval

Configuration:

evaluation:
  llm07_eval:
    type: ares.evals.llm_eval.LLMEval
    name: llm_eval
    output_path: results/owasp_llm_07_evaluation.json
    connector:
      harmbench-eval-llama:
        prompt_path: assets/eval-llm-07-2025.yaml

LLM09 Evaluator (Misinformation)

Type: ares.evals.llm_eval.LLMEval

Configuration:

evaluation:
  llm09_eval:
    type: ares.evals.llm_eval.LLMEval
    name: llm_eval
    output_path: results/owasp_llm_09_evaluation.json
    connector:
      harmbench-eval-llama:
        prompt_path: assets/eval-llm-09-2025.yaml

LLM10 Evaluator (Unbounded Consumption)

Type: ares.evals.llm_eval.LLMEval

Configuration:

evaluation:
  llm10_eval:
    type: ares.evals.llm_eval.LLMEval
    name: llm_eval
    output_path: results/owasp_llm_10_evaluation.json
    connector:
      harmbench-eval-llama:
        prompt_path: assets/eval-llm-10-2025.yaml

Multiple Evaluators

ARES supports running multiple evaluators in a single evaluation:

evaluation:
  - keyword
  - harmbench_eval
  - garak_decoding_match

This applies several complementary evaluation methods to the same set of responses, giving a more comprehensive assessment.
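
When the evaluators are defined inline rather than referenced by name, the same combination can be written as several keyed blocks under one evaluation key, reusing the templates from the sections above. A sketch assuming that layout (check which form your pipeline configuration expects):

evaluation:
  keyword:
    type: ares.evals.keyword_eval.KeywordEval
    name: keyword
    keyword_list_or_path: assets/advbench_refusal_keywords.json
    output_path: results/keyword_evaluation.json
  harmbench_eval:
    type: ares.evals.llm_eval.LLMEval
    name: harmbench_eval
    output_path: results/harmbench_evaluation.json
    connector:
      harmbench-eval-llama:
        prompt_path: assets/harmbench_prompt.yaml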

Custom Evaluators

To create a custom evaluator:

  1. Extend the base evaluator class from ares.evals

  2. Implement the required evaluation logic

  3. Register it in your configuration

See the plugin development guide for more details.
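
Registration follows the same configuration pattern as the built-in evaluators: point type at the import path of your class. A hypothetical sketch, assuming a custom class at my_plugin.evals.custom_eval.CustomEval:

evaluation:
  custom_eval:
    # Hypothetical custom evaluator; replace with your own module path
    # and add any evaluator-specific fields your class requires.
    type: my_plugin.evals.custom_eval.CustomEval
    name: custom_eval
    output_path: results/custom_evaluation.json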

Viewing Available Evaluators

Use the CLI to list all available evaluators:

ares show evals

To view a specific evaluator’s template:

ares show evals -n keyword
ares show evals -n harmbench_eval