CELL for Natural Language Generation¶
This notebook illustrates how to use the CELL algorithms for generating contrastive explanations. Two algorithms are demonstrated: CELL (an intelligent search algorithm subject to a budget on model calls) and mCELL (a myopic algorithm that is more expensive when explaining the responses to longer prompts).
The first set of examples demonstrates CELL and mCELL when prompting a relatively small LLM (flan-t5-large). This is followed by an example demonstrating CELL on a larger instruction-tuned LLM (granite-3.3-8b-instruct), which also shows how a user can incorporate an LLM's chat template. All of these examples use HFModel, a wrapper class for Huggingface models.
Import Standard Packages¶
import numpy as np
import torch
import random
from transformers import T5Tokenizer, T5ForConditionalGeneration
Import icx classes¶
from icx360.algorithms.cell.CELL import CELL # this imports a budgeted version of CELL
from icx360.utils.model_wrappers import HFModel
from icx360.utils.general_utils import select_device, fix_seed
# Fix seed for experimentation
seed = 12345
fix_seed(seed)
Load model, use icx wrapper class, and create explainer object¶
# Note: the device is set automatically according to your system. You can override it here if you choose.
device = select_device()
model_name = "google/flan-t5-large"
model = T5ForConditionalGeneration.from_pretrained(model_name, device_map=device)
tokenizer = T5Tokenizer.from_pretrained(model_name)
model_expl = HFModel(model, tokenizer) # icx wrapped model
num_return_sequences = 10 # number of sequences returned when doing generation for mask infilling
infiller = 't5' # infilling model used to replace mask tokens with generated text ('t5' or 'bart')
scalarizer = 'preference' # scalarizer used to determine if a contrast is found (must be one of ['preference', 'nli', 'contradiction', 'bleu'])
# if no device is passed to CELL, it will be set automatically according to your system
explainer = CELL(model_expl, num_return_sequences=num_return_sequences, infiller=infiller, scalarizer=scalarizer, device=device)
You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
Fix parameters for budgeted CELL explanations¶
split_k = 2
epsilon_contrastive = 0.25 # amount of change in response to deem a contrastive explanation
radius = 3 # radius for sampling near a previously modified token
budget = 50 # maximum number of queries allowed from infilling model
Feed an input prompt to the explainer and generate contrastive explanation¶
input_text ="What are the most popular activities for children for elementary school age?"
result = explainer.explain_instance(input_text, radius=radius, budget=budget, split_k=split_k, epsilon_contrastive=epsilon_contrastive)
Starting Contrastive Explanations for Large Language Models
Running outer iteration 1
Stopping because contrastive threshold has been passed
10 model calls made.

Contrastive Explanation Solution
Scalarizer: preference
Input prompt: What are the most popular activities for children for elementary school age?
Input response: play dough
Contrastive prompt: are the most important foods for children for elementary school age?
Contrastive response: a balanced diet
Modifications made:
popular activities->important foods
What are->are
Preference decreased.
Input prompt is the user prompt for which one wants to explain the response of the LLM.
Input response is the response of the LLM to the input prompt.
Contrastive prompt is the new prompt after masking and infilling certain words.
Contrastive response is the response of the LLM to the contrastive prompt.
The above example shows that if a user's inquiry is instead the contrastive prompt (obtained by applying the listed modifications to the input prompt), the new response, termed the contrastive response, would be given. The change in preferability of the contrastive response relative to the original input response (with regard to the original input prompt) is also reported.
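If you want to inspect the returned explanation programmatically rather than relying on the printed summary, a minimal sketch is below. The exact structure of the object returned by explain_instance is an assumption here (it may be a dict or a custom result class); consult the icx360 documentation for the actual fields.
# Minimal sketch for inspecting the explanation object returned above.
# NOTE: the structure of `result` is an assumption (hypothetical), not
# a documented icx360 API; check the library docs for the real fields.
if isinstance(result, dict):
    for key, value in result.items():  # e.g., contrastive prompt/response entries
        print(f"{key}: {value}")
else:
    print(result)  # fall back to the object's own string representation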
Example using myopic CELL (mCELL)¶
from icx360.algorithms.cell.mCELL import mCELL # this imports a myopic version of CELL
# if no device is passed to mCELL, it will be set automatically according to your system
explainer = mCELL(model_expl, num_return_sequences=num_return_sequences, infiller=infiller, scalarizer=scalarizer, device=device)
fix_seed(seed)
input_text ="What are the most popular activities for children for elementary school age?"
result = explainer.explain_instance(input_text, split_k=split_k, epsilon_contrastive=epsilon_contrastive)
Starting (myopic) Contrastive Explanations for Large Language Models
Running iteration 1
Stopping because contrastive threshold has been passed
6 model calls made.

Contrastive Explanation Solution
Scalarizer: preference
Input prompt: What are the most popular activities for children for elementary school age?
Input response: play dough
Contrastive prompt: What are the most popular activities online for elementary school age?
Contrastive response: wikihow
Modifications made:
for children->online
Preference decreased.
The above example shows how to use the myopic CELL algorithm, which may produce different explanations because it uses a different search strategy.
Example using instruction fine-tuned LLM with a chat template (granite-3.3-8b-instruct)¶
Import standard packages¶
from transformers import AutoTokenizer, AutoModelForCausalLM
model_name = "ibm-granite/granite-3.3-8b-instruct"
model = AutoModelForCausalLM.from_pretrained(model_name, device_map=device)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model_expl = HFModel(model, tokenizer) # icx wrapped model
num_return_sequences = 10 # number of sequences returned when doing generation for mask infilling
infiller = 't5' # infilling model used to replace mask tokens with generated text ('t5' or 'bart')
scalarizer = 'preference' # scalarizer used to determine if a contrast is found (must be one of ['preference', 'nli', 'contradiction', 'bleu'])
explainer = CELL(model_expl, num_return_sequences=num_return_sequences, infiller=infiller, scalarizer=scalarizer, scalarizer_model_path='stanfordnlp/SteamSHP-flan-t5-xl', device=device)
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████| 4/4 [01:20<00:00, 20.18s/it]
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████| 2/2 [01:02<00:00, 31.06s/it]
The difference now is that we must pass additional parameters to the explainer that are required when using an instruction fine-tuned model. This example also shows how a user can select the scalarizer model: the preference scalarizer defaults to stanfordnlp/SteamSHP-flan-t5-large, which is four times smaller than the xl model, but a user can override the default, as done above, by passing the scalarizer_model_path parameter. In the model_params object below,
chat_template indicates that the user wants to use the LLM's known chat template
system_prompt is the instruction to be followed by the LLM
pad_token_id is an LLM-specific parameter for padding purposes
model_params = {}
model_params["chat_template"] = True
model_params["system_prompt"] = "Please respond to the following statement or question very briefly in less than 10 words."
model_params["pad_token_id"] = tokenizer.eos_token_id
Fix parameters for budgeted CELL explanations¶
split_k = 2
epsilon_contrastive = 0.25 # amount of change in response to deem a contrastive explanation
radius = 3 # radius for sampling near a previously modified token
budget = 20 # maximum number of queries allowed from infilling model
Feed an input prompt to the explainer and generate contrastive explanation¶
fix_seed(seed)
input_text ="What are the most popular activities for children for elementary school age?"
result = explainer.explain_instance(input_text, radius=radius, budget=budget, split_k=split_k, epsilon_contrastive=epsilon_contrastive, model_params=model_params)
Starting Contrastive Explanations for Large Language Models
Running outer iteration 1
Stopping because contrastive threshold has been passed
6 model calls made.

Contrastive Explanation Solution
Scalarizer: preference
Input prompt: What are the most popular activities for children for elementary school age?
Input response: Playing, reading, drawing, sports, and educational games.
Contrastive prompt: are the most popular books for children for elementary school age?
Contrastive response: "Harry Potter," "Captain Underpants," "Diary of a Wimp
Modifications made:
What are->are
popular activities->popular books
Preference increased.
Note that both the input and contrastive responses are much more detailed here due to the different LLM being prompted. As in the examples above, the explanation modifies the prompt to make a different inquiry, resulting in a contrastive response whose preferability changes relative to the initial input response. Note also that the model being explained is now significantly better than the google/flan-t5-large used in the earlier examples and is thus more likely to give appropriate responses.
Note on using VLLM¶
In order to use a model served through VLLM, a user would create a model object (e.g., using the OpenAI API) and wrap it with the VLLMModel wrapper (from icx360.utils.model_wrappers import VLLMModel) rather than the HFModel wrapper used above. Explanations can then be created by passing this VLLMModel object in place of the HFModel object to the CELL (or mCELL) constructor as in the examples above.
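A hypothetical sketch of this setup is below. The VLLMModel constructor arguments and the local server URL are assumptions for illustration; consult the icx360 documentation for the actual signature.
# Hypothetical sketch: wrapping a VLLM-served model for CELL.
# NOTE: the VLLMModel constructor arguments and the server URL below are
# assumptions for illustration; check the icx360 docs for the real API.
from openai import OpenAI
from icx360.utils.model_wrappers import VLLMModel
from icx360.algorithms.cell.CELL import CELL

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # local VLLM server (assumed)
vllm_model_expl = VLLMModel(client, "ibm-granite/granite-3.3-8b-instruct")  # hypothetical signature
explainer = CELL(vllm_model_expl, num_return_sequences=10, infiller="t5", scalarizer="preference")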