ARES Strategies
ARES provides a collection of attack strategies designed to probe AI systems for vulnerabilities. Each strategy is inspired by research in adversarial prompting and jailbreak techniques, and many are implemented as modular plugins for easy integration.
Understanding these strategies is important because they represent real-world attack vectors documented in academic literature. By simulating these attacks, ARES helps evaluate how robust your AI system is against evolving threats.
Below is an overview of the strategies included in ARES, along with links to the original papers that introduced or analyzed these techniques.
Direct Requests
This strategy probes LLMs via direct requests for harmful content.
GCG (Zou et al.)
Plugin: ares-gcg
Greedy Coordinate Gradient (GCG) uses a white-box gradient based approach to construct adversarial suffixes to break LLM aligment.
Human Jailbreaks
Plugin: ares-human-jailbreak
A human jailbreak is a manual, creative prompt engineering technique where a user crafts inputs that trick an LLM into ignoring its safety constraints or ethical guidelines, making it behave in unintended or unsafe ways. These strategies are part of a broader category of prompt injection attacks.
Crescendo (Russinovich et al.)
Plugin: ares-pyrit
Crescendo is a multi-turn attack which gradually escalates an initial benign question and via mulit-turn dialogue via referencing the target’s replies progressively steers to a successful jailbreak.
Encoding (Derczynski et al.)
Plugin: ares-garak
It probes for prompt injection by using encoding strategies (e.g., base64, braille, morse code, emojis) to hide the intent of the prompt. The LLM might decode or interpret the hidden message and respond inappropriately.