ARES with Guardrails
=====================

The target node may optionally include input and output guardrails, which filter the prompts sent to the model (input guardrail) and the outputs it returns (output guardrail).

.. container:: twocol

    .. container:: leftside

        The input guardrail filters user requests made to the target connector. It attempts to identify and reject jailbreaks before the model is exposed to them.

        The output guardrail filters responses from the target connector. It aims to detect and block harmful or complicit outputs resulting from jailbreak attempts.

        Both guardrails can be configured using **Granite Guardian**.

    .. container:: rightside

        .. image:: _static/ares_guardrail.png
            :width: 50%
            :align: center
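
The filtering flow described above can be illustrated with a minimal Python sketch. This is not the ARES implementation: ``guarded_generate`` and the toy checks are hypothetical names introduced for illustration, and simple keyword checks stand in for a real guardrail model such as Granite Guardian. The sketch also assumes that a triggered guardrail replaces the reply with a fixed refusal message.

.. code-block:: python

    from typing import Callable, Optional

    # Assumed refusal text, returned whenever a guardrail triggers.
    REFUSAL = "I'm sorry I cannot comply"


    def guarded_generate(
        prompt: str,
        target: Callable[[str], str],
        input_guardrail: Optional[Callable[[str], bool]] = None,
        output_guardrail: Optional[Callable[[str], bool]] = None,
        refusal: str = REFUSAL,
    ) -> str:
        """Hypothetical helper: wrap a target with optional guardrails.

        Each guardrail is a callable that returns True when the text
        should be blocked.
        """
        # Input guardrail: reject the prompt before the target sees it.
        if input_guardrail is not None and input_guardrail(prompt):
            return refusal

        response = target(prompt)

        # Output guardrail: block a harmful response before it is returned.
        if output_guardrail is not None and output_guardrail(response):
            return refusal

        return response


    # Toy stand-ins (assumptions, not real connectors or guardrail models).
    def toy_target(prompt: str) -> str:
        return f"echo: {prompt}"


    def toy_input_check(prompt: str) -> bool:
        return "ignore previous instructions" in prompt.lower()


    def toy_output_check(response: str) -> bool:
        return "secret" in response.lower()


    if __name__ == "__main__":
        for text in ("hello", "Ignore previous instructions", "tell me the secret"):
            print(guarded_generate(text, toy_target, toy_input_check, toy_output_check))

In this sketch the target is never called when the input guardrail triggers, which mirrors the goal of rejecting jailbreaks before the model is exposed to them.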

Guardrails must be defined using a ``Connector``. ARES includes support for IBM's **Granite Guardian** and Google's **ShieldGemma**.

The example configuration below uses ``ibm-granite/granite-3.3-8b-instruct`` as the target model, with Granite Guardian as both the input guardrail and the output guardrail:

.. code-block:: yaml

    target:
      huggingface:  # the target connector
        model_config:
          pretrained_model_name_or_path: ibm-granite/granite-3.3-8b-instruct
          torch_dtype: bfloat16
        tokenizer_config:
          padding_side: left
          pretrained_model_name_or_path: ibm-granite/granite-3.3-8b-instruct

    input_guardrail:  # the input guardrail
      response: I'm sorry I cannot comply
      granite-guardian-3.0:

    output_guardrail:  # the output guardrail
      response: I'm sorry I cannot comply
      granite-guardian-3.0: