ARES Strategies

ARES provides a collection of attack strategies designed to probe AI systems for vulnerabilities. Each strategy is inspired by research in adversarial prompting and jailbreak techniques, and many are implemented as modular plugins for easy integration.

Understanding these strategies is important because they represent real-world attack vectors documented in academic literature. By simulating these attacks, ARES helps evaluate how robust your AI system is against evolving threats.

Below is an overview of the strategies included in ARES, along with links to the original papers that introduced or analyzed these techniques.

Direct Requests

This strategy probes LLMs via direct requests for harmful content.

GCG (Zou et al.)

Plugin: ares-gcg

Greedy Coordinate Gradient (GCG) uses a white-box gradient based approach to construct adversarial suffixes to break LLM aligment.

Human Jailbreaks

Plugin: ares-human-jailbreak

A human jailbreak is a manual, creative prompt engineering technique where a user crafts inputs that trick an LLM into ignoring its safety constraints or ethical guidelines, making it behave in unintended or unsafe ways. These strategies are part of a broader category of prompt injection attacks.

Crescendo (Russinovich et al.)

Plugin: ares-pyrit

Crescendo is a multi-turn attack which gradually escalates an initial benign question and via mulit-turn dialogue via referencing the target’s replies progressively steers to a successful jailbreak.

Encoding (Derczynski et al.)

Plugin: ares-garak

It probes for prompt injection by using encoding strategies (e.g., base64, braille, morse code, emojis) to hide the intent of the prompt. The LLM might decode or interpret the hidden message and respond inappropriately.