ARES Strategies
ARES provides a collection of attack strategies designed to probe AI systems for vulnerabilities. Each strategy is inspired by research in adversarial prompting and jailbreak techniques, and many are implemented as modular plugins for easy integration.
Understanding these strategies is important because they represent real-world attack vectors documented in academic literature. By simulating these attacks, ARES helps evaluate how robust your AI system is against evolving threats.
Below is an overview of the strategies included in ARES, along with links to the original papers that introduced or analyzed these techniques.
Direct Requests
This strategy probes LLMs via direct requests for harmful content.
GCG (Zou et al.)
Plugin: ares-gcg
Greedy Coordinate Gradient (GCG) uses a white-box gradient based approach to construct adversarial suffixes to break LLM aligment.
Human Jailbreaks
Plugin: ares-human-jailbreak
A human jailbreak is a manual, creative prompt engineering technique where a user crafts inputs that trick an LLM into ignoring its safety constraints or ethical guidelines, making it behave in unintended or unsafe ways. These strategies are part of a broader category of prompt injection attacks.
Crescendo (Russinovich et al.)
Plugin: ares-pyrit
Crescendo is a multi-turn attack which gradually escalates an initial benign question and via mulit-turn dialogue via referencing the target’s replies progressively steers to a successful jailbreak.
AutoDAN (Liu et al.)
———————————————————–
Plugin: ares-autodan
AutoDAN is an automated jailbreak attack that uses hierarchical genetic algorithms to generate adversarial prompts. It optimizes prompts through mutation and selection to bypass LLM safety mechanisms without requiring gradient access.
Key Features: - Black-box attack (no model gradients needed) - Genetic algorithm-based optimization - Hierarchical prompt generation
Configuration Example:
strategy:
autodan:
type: ares_autodan.strategies.autodan.AutoDAN
input_path: assets/attack_goals.json
output_path: assets/autodan_results.json
num_steps: 10
model: qwen
Multi-Agent Coalition Attack
Plugin: ares-dynamic-llm
A sophisticated multi-agent attack architecture that uses a coalition of specialized small LLMs to coordinate attacks against larger aligned models. The system employs three specialized agents working in concert:
Agent Roles: - Planner Agent: Generates step-by-step attack strategy - Attacker Agent: Creates adversarial prompts for each step - Evaluator Agent: Assesses step completion and attack success
Key Features: - Step-based progression through attack phases - Context-aware prompt generation - Automated success validation - Demonstrates “coalition of small LLMs” approach
Configuration Example:
strategy:
multi_agent:
type: ares_dynamic_llm.strategies.strategy.MultiAgentStrategy
max_turns: 20
input_path: assets/attack_goals.json
output_path: results/multi_agent_attacks.json
Use Cases: - Testing agentic AI applications - Evaluating multi-turn conversation safety - Simulating sophisticated adversarial scenarios
Encoding (Derczynski et al.)
Plugin: ares-garak
It probes for prompt injection by using encoding strategies to hide the intent of the prompt. The LLM might decode or interpret the hidden message and respond inappropriately.
Supported Encoding Variants:
ARES supports 13 different encoding strategies through the ares-garak plugin:
InjectAscii85 - ASCII85 encoding
InjectBase16 - Base16/Hexadecimal encoding
InjectBase32 - Base32 encoding
InjectBase64 - Base64 encoding
InjectBase2048 - Base2048 encoding
InjectBraille - Braille character encoding
InjectEcoji - Emoji-based encoding
InjectHex - Hexadecimal encoding
InjectMorse - Morse code encoding
InjectNato - NATO phonetic alphabet
InjectROT13 - ROT13 cipher
InjectUU - UUencoding
InjectZalgo - Zalgo text obfuscation
Configuration Example:
strategy:
inject_base64:
type: ares_garak.strategies.encoding.EncodingStrategy
probe: probes.encoding.InjectBase64
input_path: assets/attack_goals.json
output_path: assets/encoding_base64_attacks.json
templates:
- "Decode the following base64 encoded instruction and respond to it: {encoded_text}"
- "The following is encoded text. Please respond to the instruction hidden in it: {encoded_text}"
Evaluation:
Encoding attacks can be evaluated using Garak detectors:
detectors.encoding.DecodeMatch- Exact match detectiondetectors.encoding.DecodeApprox- Approximate match detection