CELL for Natural Language Generation¶
This notebook illustrates how to use the CELL algorithms for generating contrastive explanations. Two algorithms are demonstrated: CELL (an intelligent search algorithm subject to a budget on model calls) and mCELL (a myopic algorithm that is more expensive when explaining the responses to longer prompts).
The first set of examples demonstrates CELL and mCELL when prompting a relatively small LLM (flan-t5-large). This is followed by an example demonstrating CELL on a larger instruction-tuned LLM (granite-3.3-8b-instruct), which also shows how a user can incorporate an LLM's chat template. All of these examples use HFModel, a wrapper class for Huggingface models.
Import Standard Packages¶
import numpy as np
import torch
import random
from transformers import T5Tokenizer, T5ForConditionalGeneration
Import icx classes¶
from icx360.algorithms.cell.CELL import CELL # this imports a budgeted version of CELL
from icx360.utils.model_wrappers import HFModel
from icx360.utils.general_utils import select_device, fix_seed
# Fix seed for experimentation
seed = 12345
fix_seed(seed)
Load model, use icx wrapper class, and create explainer object¶
# Note: the device is set automatically according to your system. You can override it here if you choose.
device = select_device()
model_name = "google/flan-t5-large"
model = T5ForConditionalGeneration.from_pretrained(model_name, device_map=device)
tokenizer = T5Tokenizer.from_pretrained(model_name)
model_expl = HFModel(model, tokenizer) # icx wrapped model
num_return_sequences = 10 # number of sequences returned when doing generation for mask infilling
infiller = 't5' # infilling model used to replace mask tokens with generated text ('t5' or 'bart')
scalarizer = 'preference' # scalarizer used to determine if a contrast is found (must be one of ['preference', 'nli', 'contradiction', 'bleu'])
# if no device is passed to CELL, it will be set automatically according to your system
explainer = CELL(model_expl, num_return_sequences=num_return_sequences, infiller=infiller, scalarizer=scalarizer, device=device)
You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
Fix parameters for budgeted CELL explanations¶
split_k = 2
epsilon_contrastive = 0.25 # amount of change in response to deem a contrastive explanation
radius = 3 # radius for sampling near a previously modified token
budget = 50 # maximum number of queries allowed from infilling model
Feed an input prompt to the explainer and generate contrastive explanation¶
input_text ="What are the most popular activities for children for elementary school age?"
result = explainer.explain_instance(input_text, radius=radius, budget=budget, split_k=split_k, epsilon_contrastive=epsilon_contrastive)
Starting Contrastive Explanations for Large Language Models
Running outer iteration 1
Stopping because contrastive threshold has been passed
10 model calls made.

Contrastive Explanation Solution
Scalarizer: preference
Input prompt: What are the most popular activities for children for elementary school age?
Input response: play dough
Contrastive prompt: are the most important foods for children for elementary school age?
Contrastive response: a balanced diet
Modifications made:
popular activities->important foods
What are->are
Preference decreased.
Input prompt is the user prompt for which one wants to explain the response of the LLM.
Input response is the response of the LLM to the input prompt.
Contrastive prompt is the new prompt after masking and infilling certain words.
Contrastive response is the response of the LLM to the contrastive prompt.
The above example shows that if a user's inquiry is instead the contrastive prompt (obtained by applying the listed modifications to the input prompt), the new response, termed the contrastive response, would be given. The change in preferability of the contrastive response relative to the original input response (with regard to the original input prompt) is also reported.
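If you want to inspect the returned explanation programmatically rather than relying on the printed summary, a minimal sketch is below. The exact structure of the object returned by explain_instance is an assumption here (it may be a dict or a custom result class); consult the icx360 documentation for the actual fields.
# Minimal sketch for inspecting the explanation object returned above.
# NOTE: the structure of `result` is an assumption (hypothetical), not
# a documented icx360 API; check the library docs for the real fields.
if isinstance(result, dict):
    for key, value in result.items():  # e.g., contrastive prompt/response entries
        print(f"{key}: {value}")
else:
    print(result)  # fall back to the object's own string representation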
Example using myopic CELL (mCELL)¶
from icx360.algorithms.cell.mCELL import mCELL # this imports a myopic version of CELL
# if no device is passed to mCELL, it will be set automatically according to your system
explainer = mCELL(model_expl, num_return_sequences=num_return_sequences, infiller=infiller, scalarizer=scalarizer, device=device)
fix_seed(seed)
input_text ="What are the most popular activities for children for elementary school age?"
result = explainer.explain_instance(input_text, split_k=split_k, epsilon_contrastive=epsilon_contrastive)
Starting (myopic) Contrastive Explanations for Large Language Models
Running iteration 1
Stopping because contrastive threshold has been passed
6 model calls made.

Contrastive Explanation Solution
Scalarizer: preference
Input prompt: What are the most popular activities for children for elementary school age?
Input response: play dough
Contrastive prompt: What are the most popular activities online for elementary school age?
Contrastive response: wikihow
Modifications made:
for children->online
Preference decreased.
The above example shows how to use the myopic CELL algorithm, which may produce different explanations because it uses a different search strategy.
Example using instruction fine-tuned LLM with a chat template (granite-3.3-8b-instruct)¶
Import standard packages¶
from transformers import AutoTokenizer, AutoModelForCausalLM
model_name = "ibm-granite/granite-3.3-8b-instruct"
model = AutoModelForCausalLM.from_pretrained(model_name, device_map=device)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model_expl = HFModel(model, tokenizer) # icx wrapped model
num_return_sequences = 10 # number of sequences returned when doing generation for mask infilling
infiller = 't5' # infilling model used to replace mask tokens with generated text ('t5' or 'bart')
scalarizer = 'preference' # scalarizer used to determine if a contrast is found (must be one of ['preference', 'nli', 'contradiction', 'bleu'])
explainer = CELL(model_expl, num_return_sequences=num_return_sequences, infiller=infiller, scalarizer=scalarizer, scalarizer_model_path='stanfordnlp/SteamSHP-flan-t5-xl', device=device)
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████| 4/4 [01:20<00:00, 20.18s/it]
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████| 2/2 [01:02<00:00, 31.06s/it]
The difference now is that we must pass additional parameters to the explainer that are required when using an instruction fine-tuned model. This example also shows how a user can select the scalarizer model: the preference scalarizer defaults to stanfordnlp/SteamSHP-flan-t5-large, which is four times smaller than the xl model, but a user can override the default, as done above, by passing the scalarizer_model_path parameter. In the model_params object below,
chat_template indicates that the user wants to use the LLM's known chat template
system_prompt is the instruction to be followed by the LLM
pad_token_id is an LLM-specific parameter for padding purposes
model_params = {}
model_params["chat_template"] = True
model_params["system_prompt"] = "Please respond to the following statement or question very briefly in less than 10 words."
model_params["pad_token_id"] = tokenizer.eos_token_id
Fix parameters for budgeted CELL explanations¶
split_k = 2
epsilon_contrastive = 0.25 # amount of change in response to deem a contrastive explanation
radius = 3 # radius for sampling near a previously modified token
budget = 20 # maximum number of queries allowed from infilling model
Feed an input prompt to the explainer and generate contrastive explanation¶
fix_seed(seed)
input_text ="What are the most popular activities for children for elementary school age?"
result = explainer.explain_instance(input_text, radius=radius, budget=budget, split_k=split_k, epsilon_contrastive=epsilon_contrastive, model_params=model_params)
Starting Contrastive Explanations for Large Language Models
Running outer iteration 1
Stopping because contrastive threshold has been passed
6 model calls made.

Contrastive Explanation Solution
Scalarizer: preference
Input prompt: What are the most popular activities for children for elementary school age?
Input response: Playing, reading, drawing, sports, and educational games.
Contrastive prompt: are the most popular books for children for elementary school age?
Contrastive response: "Harry Potter," "Captain Underpants," "Diary of a Wimp
Modifications made:
What are->are
popular activities->popular books
Preference increased.
Note that both the input and contrastive responses are much more detailed here due to the different LLM being prompted. As in the examples above, the explanation modifies the prompt to make a different inquiry, resulting in a contrastive response whose preferability changes relative to the initial input response. Note also that the model being explained is now significantly better than the google/flan-t5-large used in the earlier examples and is thus more likely to give appropriate responses.
Note on using VLLM¶
In order to use a model served through VLLM, a user would create a model object (e.g., using the OpenAI API) and wrap it with the VLLMModel wrapper (from icx360.utils.model_wrappers import VLLMModel) rather than the HFModel wrapper used above. Explanations can then be created by passing this VLLMModel object in place of the HFModel object to the CELL (or mCELL) constructor as in the examples above.
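A hypothetical sketch of this setup is below. The VLLMModel constructor arguments and the local server URL are assumptions for illustration; consult the icx360 documentation for the actual signature.
# Hypothetical sketch: wrapping a VLLM-served model for CELL.
# NOTE: the VLLMModel constructor arguments and the server URL below are
# assumptions for illustration; check the icx360 docs for the real API.
from openai import OpenAI
from icx360.utils.model_wrappers import VLLMModel
from icx360.algorithms.cell.CELL import CELL

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # local VLLM server (assumed)
vllm_model_expl = VLLMModel(client, "ibm-granite/granite-3.3-8b-instruct")  # hypothetical signature
explainer = CELL(vllm_model_expl, num_return_sequences=10, infiller="t5", scalarizer="preference")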