Evaluators Reference
This section provides a comprehensive reference for all available evaluators in ARES. Evaluators analyze model responses to determine whether they meet safety, robustness, or refusal criteria.
Overview
ARES supports multiple types of evaluators:
Keyword-based: Pattern matching against refusal keywords
LLM-based: Model-as-a-judge evaluation using language models
Detector-based: Specialized detectors (e.g., Garak encoding detectors)
OWASP-specific: Evaluators tailored for specific OWASP vulnerabilities
Keyword Evaluator
Type: ares.evals.keyword_eval.KeywordEval
Evaluates responses by checking for the presence of refusal keywords or patterns.
Configuration:
evaluation:
keyword:
type: ares.evals.keyword_eval.KeywordEval
name: keyword
keyword_list_or_path: assets/advbench_refusal_keywords.json
output_path: results/keyword_evaluation.json
Use Cases: - Quick refusal detection - Low-resource evaluation - Baseline safety checks
LLM-Based Evaluators
HarmBench Evaluator
Type: ares.evals.llm_eval.LLMEval
Uses a language model to judge whether responses are harmful based on HarmBench criteria.
Configuration:
evaluation:
harmbench_eval:
type: ares.evals.llm_eval.LLMEval
name: harmbench_eval
output_path: results/harmbench_evaluation.json
connector:
harmbench-eval-llama:
prompt_path: assets/harmbench_prompt.yaml
Features: - Model-as-a-judge approach - Nuanced safety assessment - Configurable evaluation prompts
HuggingFace Evaluator
Type: ares.evals.huggingface_eval.HuggingFaceEval
Similar to LLM evaluator but optimized for HuggingFace models.
Configuration:
evaluation:
huggingface_eval:
type: ares.evals.huggingface_eval.HuggingFaceEval
name: harmbench_hf_eval
output_path: results/harmbench_hf_evaluation.json
connector:
harmbench-eval-llama:
prompt_path: assets/harmbench_prompt.yaml
GPT Evaluator
Type: ares.evals.llm_eval.LLMEval
Uses GPT-OSS-120B on WatsonX for evaluation.
Configuration:
evaluation:
gpt_eval:
type: ares.evals.llm_eval.LLMEval
name: gpt_eval
output_path: results/gpt_evaluation.json
connector:
litellm_connector:
name: gpt-oss-120b-on-watsonx
endpoint-type: watsonx
model: openai/gpt-oss-120b
reasoning_effort: low
prompt_path: assets/gptoss_prompt.yaml
Garak Detectors
Garak detectors are specialized evaluators from the NVIDIA Garak framework for detecting encoded attacks.
DecodeMatch Detector
Type: ares_garak.evals.encoding.Detector
Performs exact match detection on decoded content.
Configuration:
evaluation:
garak_decoding_match:
type: ares_garak.evals.encoding.Detector
name: decode_match
detector: detectors.encoding.DecodeMatch
output_path: results/garak_detector_match_evaluation.json
DecodeApprox Detector
Type: ares_garak.evals.encoding.Detector
Performs approximate match detection on decoded content.
Configuration:
evaluation:
garak_decoding_approx:
type: ares_garak.evals.encoding.Detector
name: decode_approx
detector: detectors.encoding.DecodeApprox
output_path: results/garak_detector_approx_evaluation.json
Use Cases: - Evaluating encoding attack success - Detecting obfuscated harmful content - Testing decoder robustness
Crescendo Evaluator
Type: ares_pyrit.evals.crescendo_eval.CrescendoNativeEval
Specialized evaluator for Crescendo multi-turn attacks.
Configuration:
evaluation:
crescendo_eval:
type: ares_pyrit.evals.crescendo_eval.CrescendoNativeEval
name: crescendo_eval
output_path: results/crescendo_evaluation.json
connector:
litellm_connector:
name: gpt-oss-120b-on-watsonx
endpoint-type: watsonx
model: openai/gpt-oss-120b
reasoning_effort: low
prompt_path: assets/gptoss_prompt.yaml
Features: - Multi-turn conversation analysis - Progressive jailbreak detection - Context-aware evaluation
OWASP-Specific Evaluators
ARES provides specialized evaluators for each OWASP LLM vulnerability category.
LLM01 Evaluator (Prompt Injection)
Type: ares.evals.huggingface_eval.HuggingFaceEval
Configuration:
evaluation:
llm01_eval:
type: ares.evals.huggingface_eval.HuggingFaceEval
name: llm01_harmbench_eval
output_path: results/owasp_llm_01_evaluation.json
connector:
harmbench-eval-llama:
prompt_path: assets/harmbench_prompt.yaml
LLM02 Evaluator (Sensitive Information Disclosure)
Type: ares.evals.llm_eval.LLMEval
Configuration:
evaluation:
llm02_eval:
type: ares.evals.llm_eval.LLMEval
name: llm_eval
output_path: results/owasp_llm_02_evaluation.json
connector:
harmbench-eval-llama:
prompt_path: assets/eval-llm-02-2025.yaml
LLM04 Evaluator (Data and Model Poisoning)
Type: ares.evals.llm_eval.LLMEval
Configuration:
evaluation:
llm04_eval:
type: ares.evals.llm_eval.LLMEval
name: llm_eval
output_path: results/owasp_llm_04_evaluation.json
connector:
harmbench-eval-llama:
prompt_path: assets/eval-llm-04-2025.yaml
LLM05 Evaluator (Improper Output Handling)
Type: ares.evals.llm_eval.LLMEval
Configuration:
evaluation:
llm05_eval:
type: ares.evals.llm_eval.LLMEval
name: llm_eval
output_path: results/owasp_llm_05_evaluation.json
connector:
harmbench-eval-llama:
prompt_path: assets/eval-llm-05-2025.yaml
LLM06 Evaluator (Excessive Agency)
Type: ares.evals.llm_eval.LLMEval
Configuration:
evaluation:
llm06_eval:
type: ares.evals.llm_eval.LLMEval
name: llm_eval
output_path: results/owasp_llm_06_evaluation.json
connector:
harmbench-eval-llama:
prompt_path: assets/eval-llm-06-2025.yaml
LLM07 Evaluator (System Prompt Leakage)
Type: ares.evals.llm_eval.LLMEval
Configuration:
evaluation:
llm07_eval:
type: ares.evals.llm_eval.LLMEval
name: llm_eval
output_path: results/owasp_llm_07_evaluation.json
connector:
harmbench-eval-llama:
prompt_path: assets/eval-llm-07-2025.yaml
LLM09 Evaluator (Misinformation)
Type: ares.evals.llm_eval.LLMEval
Configuration:
evaluation:
llm09_eval:
type: ares.evals.llm_eval.LLMEval
name: llm_eval
output_path: results/owasp_llm_09_evaluation.json
connector:
harmbench-eval-llama:
prompt_path: assets/eval-llm-09-2025.yaml
LLM10 Evaluator (Unbounded Consumption)
Type: ares.evals.llm_eval.LLMEval
Configuration:
evaluation:
llm10_eval:
type: ares.evals.llm_eval.LLMEval
name: llm_eval
output_path: results/owasp_llm_10_evaluation.json
connector:
harmbench-eval-llama:
prompt_path: assets/eval-llm-10-2025.yaml
Multiple Evaluators
ARES supports running multiple evaluators in a single evaluation:
evaluation:
- keyword
- harmbench_eval
- garak_decoding_match
This allows comprehensive assessment using different evaluation methods.
Custom Evaluators
To create a custom evaluator:
Extend the base evaluator class from
ares.evalsImplement the required evaluation logic
Register it in your configuration
See the plugin development guide for more details.
Viewing Available Evaluators
Use the CLI to list all available evaluators:
ares show evals
To view a specific evaluator’s template:
ares show evals -n keyword
ares show evals -n harmbench_eval