Evaluators Reference
====================

This section provides a comprehensive reference for all available evaluators in ARES. Evaluators analyze model responses to determine whether they meet safety, robustness, or refusal criteria.

Overview
--------

ARES supports multiple types of evaluators:

- **Keyword-based**: Pattern matching against refusal keywords
- **LLM-based**: Model-as-a-judge evaluation using language models
- **Detector-based**: Specialized detectors (e.g., Garak encoding detectors)
- **OWASP-specific**: Evaluators tailored for specific OWASP vulnerabilities

Keyword Evaluator
-----------------

**Type:** ``ares.evals.keyword_eval.KeywordEval``

Evaluates responses by checking for the presence of refusal keywords or patterns.

**Configuration:**

.. code-block:: yaml

   evaluation:
     keyword:
       type: ares.evals.keyword_eval.KeywordEval
       name: keyword
       keyword_list_or_path: assets/advbench_refusal_keywords.json
       output_path: results/keyword_evaluation.json

**Use Cases:**
- Quick refusal detection
- Low-resource evaluation
- Baseline safety checks

LLM-Based Evaluators
--------------------

HarmBench Evaluator
^^^^^^^^^^^^^^^^^^^

**Type:** ``ares.evals.llm_eval.LLMEval``

Uses a language model to judge whether responses are harmful based on HarmBench criteria.

**Configuration:**

.. code-block:: yaml

   evaluation:
     harmbench_eval:
       type: ares.evals.llm_eval.LLMEval
       name: harmbench_eval
       output_path: results/harmbench_evaluation.json
       connector:
         harmbench-eval-llama:
           prompt_path: assets/harmbench_prompt.yaml

**Features:**
- Model-as-a-judge approach
- Nuanced safety assessment
- Configurable evaluation prompts

HuggingFace Evaluator
^^^^^^^^^^^^^^^^^^^^^

**Type:** ``ares.evals.huggingface_eval.HuggingFaceEval``

Similar to LLM evaluator but optimized for HuggingFace models.

**Configuration:**

.. code-block:: yaml

   evaluation:
     huggingface_eval:
       type: ares.evals.huggingface_eval.HuggingFaceEval
       name: harmbench_hf_eval
       output_path: results/harmbench_hf_evaluation.json
       connector:
         harmbench-eval-llama:
           prompt_path: assets/harmbench_prompt.yaml

GPT Evaluator
^^^^^^^^^^^^^

**Type:** ``ares.evals.llm_eval.LLMEval``

Uses GPT-OSS-120B on WatsonX for evaluation.

**Configuration:**

.. code-block:: yaml

   evaluation:
     gpt_eval:
       type: ares.evals.llm_eval.LLMEval
       name: gpt_eval
       output_path: results/gpt_evaluation.json
       connector:
         litellm_connector:
           name: gpt-oss-120b-on-watsonx
           endpoint-type: watsonx
           model: openai/gpt-oss-120b
           reasoning_effort: low
           prompt_path: assets/gptoss_prompt.yaml

Garak Detectors
---------------

Garak detectors are specialized evaluators from the NVIDIA Garak framework for detecting encoded attacks.

DecodeMatch Detector
^^^^^^^^^^^^^^^^^^^^

**Type:** ``ares_garak.evals.encoding.Detector``

Performs exact match detection on decoded content.

**Configuration:**

.. code-block:: yaml

   evaluation:
     garak_decoding_match:
       type: ares_garak.evals.encoding.Detector
       name: decode_match
       detector: detectors.encoding.DecodeMatch
       output_path: results/garak_detector_match_evaluation.json

DecodeApprox Detector
^^^^^^^^^^^^^^^^^^^^^

**Type:** ``ares_garak.evals.encoding.Detector``

Performs approximate match detection on decoded content.

**Configuration:**

.. code-block:: yaml

   evaluation:
     garak_decoding_approx:
       type: ares_garak.evals.encoding.Detector
       name: decode_approx
       detector: detectors.encoding.DecodeApprox
       output_path: results/garak_detector_approx_evaluation.json

**Use Cases:**
- Evaluating encoding attack success
- Detecting obfuscated harmful content
- Testing decoder robustness

Crescendo Evaluator
-------------------

**Type:** ``ares_pyrit.evals.crescendo_eval.CrescendoNativeEval``

Specialized evaluator for Crescendo multi-turn attacks.

**Configuration:**

.. code-block:: yaml

   evaluation:
     crescendo_eval:
       type: ares_pyrit.evals.crescendo_eval.CrescendoNativeEval
       name: crescendo_eval
       output_path: results/crescendo_evaluation.json
       connector:
         litellm_connector:
           name: gpt-oss-120b-on-watsonx
           endpoint-type: watsonx
           model: openai/gpt-oss-120b
           reasoning_effort: low
           prompt_path: assets/gptoss_prompt.yaml

**Features:**
- Multi-turn conversation analysis
- Progressive jailbreak detection
- Context-aware evaluation

OWASP-Specific Evaluators
--------------------------

ARES provides specialized evaluators for each OWASP LLM vulnerability category.

LLM01 Evaluator (Prompt Injection)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**Type:** ``ares.evals.huggingface_eval.HuggingFaceEval``

**Configuration:**

.. code-block:: yaml

   evaluation:
     llm01_eval:
       type: ares.evals.huggingface_eval.HuggingFaceEval
       name: llm01_harmbench_eval
       output_path: results/owasp_llm_01_evaluation.json
       connector:
         harmbench-eval-llama:
           prompt_path: assets/harmbench_prompt.yaml

LLM02 Evaluator (Sensitive Information Disclosure)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**Type:** ``ares.evals.llm_eval.LLMEval``

**Configuration:**

.. code-block:: yaml

   evaluation:
     llm02_eval:
       type: ares.evals.llm_eval.LLMEval
       name: llm_eval
       output_path: results/owasp_llm_02_evaluation.json
       connector:
         harmbench-eval-llama:
           prompt_path: assets/eval-llm-02-2025.yaml

LLM04 Evaluator (Data and Model Poisoning)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**Type:** ``ares.evals.llm_eval.LLMEval``

**Configuration:**

.. code-block:: yaml

   evaluation:
     llm04_eval:
       type: ares.evals.llm_eval.LLMEval
       name: llm_eval
       output_path: results/owasp_llm_04_evaluation.json
       connector:
         harmbench-eval-llama:
           prompt_path: assets/eval-llm-04-2025.yaml

LLM05 Evaluator (Improper Output Handling)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**Type:** ``ares.evals.llm_eval.LLMEval``

**Configuration:**

.. code-block:: yaml

   evaluation:
     llm05_eval:
       type: ares.evals.llm_eval.LLMEval
       name: llm_eval
       output_path: results/owasp_llm_05_evaluation.json
       connector:
         harmbench-eval-llama:
           prompt_path: assets/eval-llm-05-2025.yaml

LLM06 Evaluator (Excessive Agency)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**Type:** ``ares.evals.llm_eval.LLMEval``

**Configuration:**

.. code-block:: yaml

   evaluation:
     llm06_eval:
       type: ares.evals.llm_eval.LLMEval
       name: llm_eval
       output_path: results/owasp_llm_06_evaluation.json
       connector:
         harmbench-eval-llama:
           prompt_path: assets/eval-llm-06-2025.yaml

LLM07 Evaluator (System Prompt Leakage)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**Type:** ``ares.evals.llm_eval.LLMEval``

**Configuration:**

.. code-block:: yaml

   evaluation:
     llm07_eval:
       type: ares.evals.llm_eval.LLMEval
       name: llm_eval
       output_path: results/owasp_llm_07_evaluation.json
       connector:
         harmbench-eval-llama:
           prompt_path: assets/eval-llm-07-2025.yaml

LLM09 Evaluator (Misinformation)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**Type:** ``ares.evals.llm_eval.LLMEval``

**Configuration:**

.. code-block:: yaml

   evaluation:
     llm09_eval:
       type: ares.evals.llm_eval.LLMEval
       name: llm_eval
       output_path: results/owasp_llm_09_evaluation.json
       connector:
         harmbench-eval-llama:
           prompt_path: assets/eval-llm-09-2025.yaml

LLM10 Evaluator (Unbounded Consumption)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**Type:** ``ares.evals.llm_eval.LLMEval``

**Configuration:**

.. code-block:: yaml

   evaluation:
     llm10_eval:
       type: ares.evals.llm_eval.LLMEval
       name: llm_eval
       output_path: results/owasp_llm_10_evaluation.json
       connector:
         harmbench-eval-llama:
           prompt_path: assets/eval-llm-10-2025.yaml

Multiple Evaluators
-------------------

ARES supports running multiple evaluators in a single evaluation:

.. code-block:: yaml

   evaluation:
     - keyword
     - harmbench_eval
     - garak_decoding_match

This allows comprehensive assessment using different evaluation methods.

Custom Evaluators
-----------------

To create a custom evaluator:

1. Extend the base evaluator class from ``ares.evals``
2. Implement the required evaluation logic
3. Register it in your configuration

See the plugin development guide for more details.

Viewing Available Evaluators
-----------------------------

Use the CLI to list all available evaluators:

.. code-block:: bash

   ares show evals

To view a specific evaluator's template:

.. code-block:: bash

   ares show evals -n keyword
   ares show evals -n harmbench_eval