Evaluators Reference ==================== This section provides a comprehensive reference for all available evaluators in ARES. Evaluators analyze model responses to determine whether they meet safety, robustness, or refusal criteria. Overview -------- ARES supports multiple types of evaluators: - **Keyword-based**: Pattern matching against refusal keywords - **LLM-based**: Model-as-a-judge evaluation using language models - **Detector-based**: Specialized detectors (e.g., Garak encoding detectors) - **OWASP-specific**: Evaluators tailored for specific OWASP vulnerabilities Keyword Evaluator ----------------- **Type:** ``ares.evals.keyword_eval.KeywordEval`` Evaluates responses by checking for the presence of refusal keywords or patterns. **Configuration:** .. code-block:: yaml evaluation: keyword: type: ares.evals.keyword_eval.KeywordEval name: keyword keyword_list_or_path: assets/advbench_refusal_keywords.json output_path: results/keyword_evaluation.json **Use Cases:** - Quick refusal detection - Low-resource evaluation - Baseline safety checks LLM-Based Evaluators -------------------- HarmBench Evaluator ^^^^^^^^^^^^^^^^^^^ **Type:** ``ares.evals.llm_eval.LLMEval`` Uses a language model to judge whether responses are harmful based on HarmBench criteria. **Configuration:** .. code-block:: yaml evaluation: harmbench_eval: type: ares.evals.llm_eval.LLMEval name: harmbench_eval output_path: results/harmbench_evaluation.json connector: harmbench-eval-llama: prompt_path: assets/harmbench_prompt.yaml **Features:** - Model-as-a-judge approach - Nuanced safety assessment - Configurable evaluation prompts HuggingFace Evaluator ^^^^^^^^^^^^^^^^^^^^^ **Type:** ``ares.evals.huggingface_eval.HuggingFaceEval`` Similar to LLM evaluator but optimized for HuggingFace models. **Configuration:** .. code-block:: yaml evaluation: huggingface_eval: type: ares.evals.huggingface_eval.HuggingFaceEval name: harmbench_hf_eval output_path: results/harmbench_hf_evaluation.json connector: harmbench-eval-llama: prompt_path: assets/harmbench_prompt.yaml GPT Evaluator ^^^^^^^^^^^^^ **Type:** ``ares.evals.llm_eval.LLMEval`` Uses GPT-OSS-120B on WatsonX for evaluation. **Configuration:** .. code-block:: yaml evaluation: gpt_eval: type: ares.evals.llm_eval.LLMEval name: gpt_eval output_path: results/gpt_evaluation.json connector: litellm_connector: name: gpt-oss-120b-on-watsonx endpoint-type: watsonx model: openai/gpt-oss-120b reasoning_effort: low prompt_path: assets/gptoss_prompt.yaml Garak Detectors --------------- Garak detectors are specialized evaluators from the NVIDIA Garak framework for detecting encoded attacks. DecodeMatch Detector ^^^^^^^^^^^^^^^^^^^^ **Type:** ``ares_garak.evals.encoding.Detector`` Performs exact match detection on decoded content. **Configuration:** .. code-block:: yaml evaluation: garak_decoding_match: type: ares_garak.evals.encoding.Detector name: decode_match detector: detectors.encoding.DecodeMatch output_path: results/garak_detector_match_evaluation.json DecodeApprox Detector ^^^^^^^^^^^^^^^^^^^^^ **Type:** ``ares_garak.evals.encoding.Detector`` Performs approximate match detection on decoded content. **Configuration:** .. code-block:: yaml evaluation: garak_decoding_approx: type: ares_garak.evals.encoding.Detector name: decode_approx detector: detectors.encoding.DecodeApprox output_path: results/garak_detector_approx_evaluation.json **Use Cases:** - Evaluating encoding attack success - Detecting obfuscated harmful content - Testing decoder robustness Crescendo Evaluator ------------------- **Type:** ``ares_pyrit.evals.crescendo_eval.CrescendoNativeEval`` Specialized evaluator for Crescendo multi-turn attacks. **Configuration:** .. code-block:: yaml evaluation: crescendo_eval: type: ares_pyrit.evals.crescendo_eval.CrescendoNativeEval name: crescendo_eval output_path: results/crescendo_evaluation.json connector: litellm_connector: name: gpt-oss-120b-on-watsonx endpoint-type: watsonx model: openai/gpt-oss-120b reasoning_effort: low prompt_path: assets/gptoss_prompt.yaml **Features:** - Multi-turn conversation analysis - Progressive jailbreak detection - Context-aware evaluation OWASP-Specific Evaluators -------------------------- ARES provides specialized evaluators for each OWASP LLM vulnerability category. LLM01 Evaluator (Prompt Injection) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ **Type:** ``ares.evals.huggingface_eval.HuggingFaceEval`` **Configuration:** .. code-block:: yaml evaluation: llm01_eval: type: ares.evals.huggingface_eval.HuggingFaceEval name: llm01_harmbench_eval output_path: results/owasp_llm_01_evaluation.json connector: harmbench-eval-llama: prompt_path: assets/harmbench_prompt.yaml LLM02 Evaluator (Sensitive Information Disclosure) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ **Type:** ``ares.evals.llm_eval.LLMEval`` **Configuration:** .. code-block:: yaml evaluation: llm02_eval: type: ares.evals.llm_eval.LLMEval name: llm_eval output_path: results/owasp_llm_02_evaluation.json connector: harmbench-eval-llama: prompt_path: assets/eval-llm-02-2025.yaml LLM04 Evaluator (Data and Model Poisoning) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ **Type:** ``ares.evals.llm_eval.LLMEval`` **Configuration:** .. code-block:: yaml evaluation: llm04_eval: type: ares.evals.llm_eval.LLMEval name: llm_eval output_path: results/owasp_llm_04_evaluation.json connector: harmbench-eval-llama: prompt_path: assets/eval-llm-04-2025.yaml LLM05 Evaluator (Improper Output Handling) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ **Type:** ``ares.evals.llm_eval.LLMEval`` **Configuration:** .. code-block:: yaml evaluation: llm05_eval: type: ares.evals.llm_eval.LLMEval name: llm_eval output_path: results/owasp_llm_05_evaluation.json connector: harmbench-eval-llama: prompt_path: assets/eval-llm-05-2025.yaml LLM06 Evaluator (Excessive Agency) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ **Type:** ``ares.evals.llm_eval.LLMEval`` **Configuration:** .. code-block:: yaml evaluation: llm06_eval: type: ares.evals.llm_eval.LLMEval name: llm_eval output_path: results/owasp_llm_06_evaluation.json connector: harmbench-eval-llama: prompt_path: assets/eval-llm-06-2025.yaml LLM07 Evaluator (System Prompt Leakage) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ **Type:** ``ares.evals.llm_eval.LLMEval`` **Configuration:** .. code-block:: yaml evaluation: llm07_eval: type: ares.evals.llm_eval.LLMEval name: llm_eval output_path: results/owasp_llm_07_evaluation.json connector: harmbench-eval-llama: prompt_path: assets/eval-llm-07-2025.yaml LLM09 Evaluator (Misinformation) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ **Type:** ``ares.evals.llm_eval.LLMEval`` **Configuration:** .. code-block:: yaml evaluation: llm09_eval: type: ares.evals.llm_eval.LLMEval name: llm_eval output_path: results/owasp_llm_09_evaluation.json connector: harmbench-eval-llama: prompt_path: assets/eval-llm-09-2025.yaml LLM10 Evaluator (Unbounded Consumption) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ **Type:** ``ares.evals.llm_eval.LLMEval`` **Configuration:** .. code-block:: yaml evaluation: llm10_eval: type: ares.evals.llm_eval.LLMEval name: llm_eval output_path: results/owasp_llm_10_evaluation.json connector: harmbench-eval-llama: prompt_path: assets/eval-llm-10-2025.yaml Multiple Evaluators ------------------- ARES supports running multiple evaluators in a single evaluation: .. code-block:: yaml evaluation: - keyword - harmbench_eval - garak_decoding_match This allows comprehensive assessment using different evaluation methods. Custom Evaluators ----------------- To create a custom evaluator: 1. Extend the base evaluator class from ``ares.evals`` 2. Implement the required evaluation logic 3. Register it in your configuration See the plugin development guide for more details. Viewing Available Evaluators ----------------------------- Use the CLI to list all available evaluators: .. code-block:: bash ares show evals To view a specific evaluator's template: .. code-block:: bash ares show evals -n keyword ares show evals -n harmbench_eval