MExGen for Summarization
This notebook walks through an example of using MExGen (Multi-Level Explanations for Generative Language Models) to explain an LLM's summarization of a document.
After setting things up in Section 1, we will obtain explanations in the form of sentence-level attributions to the input document in Section 2, followed by mixed phrase- and sentence-level attributions in Section 3. We will then evaluate the fidelity of these explanations to the LLM in Section 4.
1. Setup
Import packages
Standard packages
from datasets import load_dataset # for XSum dataset
import matplotlib.pyplot as plt # for plotting perturbation curves
import numpy as np
from openai import OpenAI # for VLLM summarization model
import pandas as pd # only for displaying DataFrames
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BartForConditionalGeneration, BartTokenizerFast # for HuggingFace summarization models
ICX360 classes
from icx360.algorithms.mexgen import CLIME, LSHAP # explainers
from icx360.metrics import PerturbCurveEvaluator # fidelity evaluation
from icx360.utils.general_utils import select_device # set device automatically
from icx360.utils.model_wrappers import HFModel, VLLMModel # model wrappers
device = select_device()
device
device(type='cuda')
Load model to explain
Here you can choose from the following models:
"distilbart": A small summarization model from HuggingFace
"granite-hf": A larger model from HuggingFace
"vllm": A model served using VLLM. This is a "bring your own model" option, for which you will have to supply the parameters below (model_name, base_url, api_key, and any others).
model_type = "distilbart"
# model_type = "granite-hf"
# model_type = "vllm"
if model_type == "distilbart":
    model_name = "sshleifer/distilbart-xsum-12-6"
    model = BartForConditionalGeneration.from_pretrained(model_name).to(device)
    tokenizer = BartTokenizerFast.from_pretrained(model_name, add_prefix_space=True)
elif model_type == "granite-hf":
    model_name = "ibm-granite/granite-3.3-2b-instruct"
    model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16).to(device)
    tokenizer = AutoTokenizer.from_pretrained(model_name, add_prefix_space=True)
elif model_type == "vllm":
    # IF YOU HAVE A VLLM MODEL, UNCOMMENT AND REPLACE THE FOLLOWING LINES WITH YOUR MODEL'S PARAMETERS
    # base_url = "https://YOUR/MODEL/URL"
    # api_key = YOUR_API_KEY
    # openai_kwargs = {}
    model = OpenAI(api_key=api_key, base_url=base_url, **openai_kwargs)
    # Corresponding HuggingFace tokenizer for applying chat template
    # model_name = "YOUR/MODEL-NAME"
    # tokenizer_kwargs = {}
    tokenizer = AutoTokenizer.from_pretrained(model_name, **tokenizer_kwargs)
else:
    raise ValueError("Unknown model type")
We then wrap the model with a common API (HFModel or VLLMModel) that the explainer will use.
if model_type in ("distilbart", "granite-hf"):
    wrapped_model = HFModel(model, tokenizer)
elif model_type == "vllm":
    wrapped_model = VLLMModel(model, model_name, tokenizer)
Load input
Load the Extreme Summarization (XSum) dataset
#dataset = load_dataset('xsum', split='train', trust_remote_code=True)
dataset = load_dataset('xsum', split='test', trust_remote_code=True)
For this example, we will find a news article about the clothing retailer Inditex. This can be modified to load a different article.
for document in dataset["document"]:
    if "The world's biggest clothing retailer" in document:
        break
print(document)
The world's biggest clothing retailer posted net earnings of €1.26bn (£1.1bn) in the six months to 31 July - up 8% on the same period last year. Sales jumped from €9.4bn to €10.5bn, an increase of 11%. The group's clothes can now be bought online in around 40 countries, it said. Inditex operates eight brands in 90 countries including Pull&Bear, Massimo Dutti and Bershka. How Zara's founder became the richest man in the world - for two days Chairman and chief executive Pablo Isla emphasised the firm's investment in technology, saying the firm had expanded its online stores to 11 new countries in the period. It also launched mobile phone payment in all its Spanish stores, with the objective of "extending the service to other countries". This will encompass online apps for all of its brands and a specific app for the whole group called InWallet. Mr Isla said: "Both our online and bricks-and-mortar stores are seamlessly connected, driven by platforms such as mobile payment, and other technological initiatives that we will continue to develop." Tom Gadsby, an analyst at Liberum, said the firm's "online drive" was important. "I expect over the years they may find they don't have to open as many stores to maintain their strong growth rate as the online channel will become increasingly important," he said. "And while Zara is available in many of the territories in which they operate [online], most of their other brands aren't readily available outside Europe online. "So there is a big opportunity there for them to expand online into new territories." The company also said it had benefited from steady economic growth in Spain, where Inditex gets about a fifth of its sales. That country's clothing market grew at an average of 3% in the three-months to the end of July, according to the Spanish statistics agency. All of the group's brands increased their international presence during the period, with 83 new stores opened in 38 countries. 
In a call with analysts, it said it would open 6-8% of new store space over course of the year. The firm's strong performance sets it apart from European rivals H&M and Next, which have blamed unseasonal weather for below-forecast results this year.
Generate model response
As a check on our setup, we will have the model generate its summary of the input document, via the wrapped_model object created above.
First we specify parameters for model generation, as a dictionary model_params. These parameters include max_new_tokens/max_tokens, whether to use the model's chat template, and any instruction provided as a system prompt (the DistilBART model does not need an instruction to summarize).
model_params = {}
if model_type == "vllm":
    model_params["max_tokens"] = 100
    model_params["seed"] = 20250430
else:
    model_params["max_new_tokens"] = 100
if model_type in ("granite-hf", "vllm"):
    model_params["chat_template"] = True
    model_params["system_prompt"] = "Summarize the following article in one sentence. Do not preface the summary with anything."
model_params
{'max_new_tokens': 100}
Now we generate the summary:
output_orig = wrapped_model.generate(document, **model_params)
print(output_orig)
['Inditex, the owner of Zara and Massimo Dutti, has reported a sharp rise in half-year profits as it continues to expand its online presence.']
2. Sentence-Level Explanation
Instantiate explainer
Here you can choose between two attribution algorithms used by MExGen, C-LIME and L-SHAP. These are more efficient variants of LIME and SHAP respectively. In either case, the explanation takes the form of importance scores assigned to parts of the input document, and these scores are computed by calling the summarization model on perturbed versions of the input.
# explainer_alg = "clime"
explainer_alg = "lshap"
if explainer_alg == "clime":
    explainer_class = CLIME
elif explainer_alg == "lshap":
    explainer_class = LSHAP
The primary parameter for the explainer is the "scalarizer", which quantifies how different the output summaries for perturbed inputs are from the output summary for the original input. For this we will use "text-only" scalarizers (scalarizer="text"), which compute different similarity scores between the original summary and the perturbed summaries, thus providing different views of what constitutes "similarity". Small language models are used to provide these similarity scores. Specifically, we use an NLI model to compute both NLI scores and BERTScores, and a summarization model (same as the DistilBART model above) to compute "SUMM" scores and BARTScores.
model_nli_name = "microsoft/deberta-v2-xxlarge-mnli"
model_summ_name = "sshleifer/distilbart-xsum-12-6"
explainer = explainer_class(wrapped_model, scalarizer="text",
                            model_nli=model_nli_name, model_bert=model_nli_name,
                            model_summ=model_summ_name, model_bart=model_summ_name, device=device)
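To make the role of a scalarizer concrete, here is a minimal sketch of a text-only similarity score. It is a toy word-overlap (Jaccard) measure, not one of the NLI, BERTScore, SUMM, or BARTScore metrics used by the explainer, and the function name and example summaries are made up for illustration.

```python
def jaccard_similarity(reference: str, candidate: str) -> float:
    """Toy text-only scalarizer: word-set overlap between two summaries."""
    ref_words = set(reference.lower().split())
    cand_words = set(candidate.lower().split())
    if not ref_words and not cand_words:
        return 1.0  # two empty texts are treated as identical
    return len(ref_words & cand_words) / len(ref_words | cand_words)


# Hypothetical original summary and summaries of three perturbed inputs
original = "inditex reported a sharp rise in profits"
perturbed = [
    "inditex reported a sharp rise in profits",  # unchanged output
    "inditex reported a rise in sales",          # partially changed output
    "weather hurt rival retailers",              # completely different output
]

# One scalar per perturbed summary: 1.0 means identical, 0.0 means no overlap
scores = [jaccard_similarity(original, p) for p in perturbed]
print(scores)
```

Whatever the metric, the pattern is the same: each perturbed summary is mapped to a single number that the attribution algorithm can then work with.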
Call explainer
We call the explainer's explain_instance method on the input document, with the model generation parameters in model_params and default settings otherwise. This will segment the document into sentences and attribute an importance score to each sentence.
output_dict_sent = explainer.explain_instance(document, model_params=model_params)
toma_generate batch size = 132
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
['Inditex, the owner of Zara and Massimo Dutti, has reported a sharp rise in half-year profits, helped by a surge in sales.'] ['Inditex, the owner of Zara and Massimo Dutti, has reported a sharp rise in half-year profits, helped by a surge in sales.', 'Inditex, the owner of Zara and Massimo Dutti, has reported a sharp rise in half-year profits.', 'Inditex, the owner of Zara and Massimo Dutti, has reported a sharp rise in half-year profits.', 'Inditex, the owner of chains including Zara, Massimo Dutti and Pull&Bear, has reported a sharp rise in half-year profits.', 'Inditex, the owner of chains including Zara and Pull&Bear, has reported a sharp rise in half-year profits as it continues to expand its online presence.'] NLI scalarizer ref->gen toma_call batch size = 132 NLI scalarizer gen->ref toma_call batch size = 132 summ scalarizer
Passing a tuple of `past_key_values` is deprecated and will be removed in Transformers v4.58.0. You should pass an instance of `EncoderDecoderCache` instead, e.g. `past_key_values=EncoderDecoderCache.from_legacy_cache(past_key_values)`.
toma_get_probs batch size = 132
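For intuition about what perturbation-based attribution computes, the following sketch scores each unit by removing it and measuring how much a toy model's output changes. This is plain leave-one-out attribution, which is simpler than C-LIME or L-SHAP, and toy_model, overlap, and the example units are hypothetical stand-ins rather than anything from ICX360.

```python
KEYWORDS = ("profits", "sales", "online")


def toy_model(text: str) -> str:
    """Hypothetical stand-in for the summarizer: echoes known keywords found in the text."""
    return " ".join(k for k in KEYWORDS if k in text.lower())


def overlap(reference: str, candidate: str) -> float:
    """Fraction of the reference's words that also appear in the candidate."""
    ref, cand = set(reference.split()), set(candidate.split())
    return len(ref & cand) / len(ref) if ref else 1.0


def leave_one_out(units):
    """Score each unit by how much the output changes when that unit is removed."""
    original = toy_model(" ".join(units))
    scores = []
    for i in range(len(units)):
        perturbed = " ".join(u for j, u in enumerate(units) if j != i)
        scores.append(1.0 - overlap(original, toy_model(perturbed)))
    return scores


units = [
    "Profits rose 8%.",
    "Sales jumped 11%.",
    "The weather was mild.",
    "Online stores expanded.",
]
scores = leave_one_out(units)
for unit, score in zip(units, scores):
    print(f"{score:.3f}  {unit}")
```

The sentence about the weather gets a score of zero because removing it leaves the toy output unchanged, while each of the other three sentences removes one keyword from the output.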
Look at output
The explainer returns a dictionary. The "output_orig" item shows the output summary for the original document.
output_dict_sent["output_orig"].output_text
['Inditex, the owner of Zara and Massimo Dutti, has reported a sharp rise in half-year profits, helped by a surge in sales.']
The "attributions" item is itself a dictionary, containing the sentences ("units") that the document has been split into, the corresponding "unit_types", and the importance scores for the sentences, one score for each similarity metric included in the scalarizer (NLI, BERTScore, etc.). These are displayed below as a pandas DataFrame, where we also normalize each column of scores by the maximum score.
attrib_scores_df = pd.DataFrame(output_dict_sent["attributions"]).set_index("units")
score_labels = explainer.scalarized_model.sim_scores
attrib_scores_df = attrib_scores_df[["unit_types"] + score_labels]
attrib_scores_df[score_labels] /= attrib_scores_df[score_labels].max(axis=0)
attrib_scores_df
units | unit_types | nli_logit | bert | st | summ | bart |
---|---|---|---|---|---|---|
The world's biggest clothing retailer posted net earnings of €1.26bn (£1.1bn) in the six months to 31 July - up 8% on the same period last year. | s | 0.683280 | 1.000000 | 0.643647 | 1.000000 | 1.000000 |
\nSales jumped from €9.4bn to €10.5bn, an increase of 11%. | s | 0.729922 | 0.395132 | 0.669389 | 0.225816 | 0.260140 |
\nThe group's clothes can now be bought online in around 40 countries, it said. | s | 0.390864 | 0.042293 | -0.167252 | 0.047666 | 0.048265 |
\nInditex operates eight brands in 90 countries including Pull&Bear, Massimo Dutti and Bershka. | s | 0.600581 | 0.297317 | 0.265977 | 0.457829 | 0.377207 |
\nHow Zara's founder became the richest man in the world - for two days\nChairman and chief executive Pablo Isla emphasised the firm's investment in technology, saying the firm had expanded its online stores to 11 new countries in the period. | s | 0.707665 | 0.703973 | 1.000000 | 0.472878 | 0.584450 |
\nIt also launched mobile phone payment in all its Spanish stores, with the objective of "extending the service to other countries". | s | 0.645685 | 0.268591 | 0.255037 | 0.186310 | 0.205321 |
\nThis will encompass online apps for all of its brands and a specific app for the whole group called InWallet. | s | 0.548298 | 0.159034 | 0.105412 | 0.048554 | 0.039156 |
\nMr Isla said: "Both our online and bricks-and-mortar stores are seamlessly connected, driven by platforms such as mobile payment, and other technological initiatives that we will continue to develop." | s | 0.556377 | 0.211338 | 0.150247 | 0.104636 | 0.101121 |
\nTom Gadsby, an analyst at Liberum, said the firm's "online drive" was important. | s | 0.676632 | 0.300964 | 0.224815 | 0.336209 | 0.342782 |
\n"I expect over the years they may find they don't have to open as many stores to maintain their strong growth rate as the online channel will become increasingly important," he said. | s | 1.000000 | 0.329794 | 0.332405 | 0.303154 | 0.287829 |
\n"And while Zara is available in many of the territories in which they operate [online], most of their other brands aren't readily available outside Europe online. | s | 0.218182 | 0.039034 | 0.007694 | -0.009965 | -0.013622 |
\n"So there is a big opportunity there for them to expand online into new territories." | s | 0.188577 | 0.044537 | 0.007384 | -0.024480 | -0.018063 |
\nThe company also said it had benefited from steady economic growth in Spain, where Inditex gets about a fifth of its sales. | s | 0.760705 | 0.187366 | -0.026861 | 0.152611 | 0.146245 |
\nThat country's clothing market grew at an average of 3% in the three-months to the end of July, according to the Spanish statistics agency. | s | 0.923588 | 0.446365 | 0.601362 | 0.508205 | 0.462474 |
\nAll of the group's brands increased their international presence during the period, with 83 new stores opened in 38 countries. | s | 0.488930 | 0.038490 | -0.192519 | 0.031038 | 0.039291 |
\nIn a call with analysts, it said it would open 6-8% of new store space over course of the year. | s | 0.890768 | 0.493915 | 0.632328 | 0.366810 | 0.398132 |
\n | n | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
The firm's strong performance sets it apart from European rivals H&M and Next, which have blamed unseasonal weather for below-forecast results this year. | s | 0.361864 | -0.083896 | -0.156779 | -0.048950 | -0.049222 |
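As a small illustration of the column-wise normalization used for the table above, the sketch below divides each column of a toy DataFrame by its own maximum. The numbers and row labels are invented for the example.

```python
import pandas as pd

# Made-up raw scores for three units under two similarity metrics
toy = pd.DataFrame(
    {"nli_logit": [2.0, 4.0, -1.0], "bert": [0.5, 0.1, 0.25]},
    index=["sentence A", "sentence B", "sentence C"],
)

# Divide every column by its own maximum, so the top unit in each column scores 1
normalized = toy / toy.max(axis=0)
print(normalized)
```

After normalization the largest score in every column is 1, negative scores stay negative, and values are comparable across rows within a column but not necessarily across columns.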
While the importance scores should align roughly with our human intuition (for example, sentences mentioning increases in earnings, sales, and online presence are important), we will defer to Section 4 the evaluation of how faithful they are to the summarization LLM.
3. Mixed Phrase- and Sentence-Level Explanation
We will now consider the multi-level aspect of MExGen by obtaining mixed phrase- and sentence-level attributions to the input document.
Set up parameters
For this illustration, we will segment the 2 most important sentences (as determined in the previous section) into phrases. (This number can be changed.) We will also measure importance by the sum of scores across the similarity metrics (a single similarity metric could be used too).
num_top_sent = 2
score_label = "sum"
The parameters for explain_instance() will be as follows:
units and unit_types: Take existing sentence-level units and unit types from output_dict_sent["attributions"]
ind_segment: We create a Boolean array that has value True in positions corresponding to the top 2 sentences in terms of the sum of scores, and False otherwise. This will tell the explainer to segment only these 2 sentences.
segment_type = "ph" for segmentation into phrases
model_params as before
units = output_dict_sent["attributions"]["units"]
unit_types = output_dict_sent["attributions"]["unit_types"]
segment_type = "ph"
if score_label == "sum":
    scores = attrib_scores_df[score_labels].sum(axis=1).values
else:
    scores = attrib_scores_df[score_label].values
ind_segment = np.zeros_like(scores, dtype=bool)
ind_segment[np.argsort(scores)[-num_top_sent:]] = True
ind_segment
array([ True, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False])
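The top-k selection above relies on np.argsort returning indices in ascending order of score, so the last k indices are the k highest-scoring units. A standalone toy version, with made-up scores, looks like this:

```python
import numpy as np

scores = np.array([0.9, 0.1, 0.4, 1.3, -0.2])  # made-up importance scores
num_top = 2

order = np.argsort(scores)                  # indices from lowest to highest score
mask = np.zeros_like(scores, dtype=bool)    # start with all False
mask[order[-num_top:]] = True               # mark the num_top highest-scoring positions
print(mask)
```

Here positions 0 and 3 hold the two largest scores (0.9 and 1.3), so only those entries of the mask are True.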
Call explainer
Now we call explain_instance() with the above parameters:
output_dict_mixed = explainer.explain_instance(units, unit_types, ind_segment=ind_segment, segment_type=segment_type, model_params=model_params)
became advcl How Zara's founder became expanded ccomp had expanded its online stores toma_generate batch size = 258 ['Inditex, the owner of Zara and Massimo Dutti, has reported a sharp rise in half-year profits.'] ['Inditex, the owner of Zara and Massimo Dutti, has reported a sharp rise in half-year profits.', 'Inditex, the owner of chains including Zara, Pull&Bear and Massimo Dutti, has reported a sharp rise in profits.', 'Inditex, the owner of chains including Zara, Massimo Dutti and Pull&Bear, has reported a sharp rise in profits.', 'Inditex, the owner of chains including Zara, Pull&Bear and Massimo Dutti, has reported a sharp rise in half-year profits.', 'Inditex, the owner of chains including Zara, Massimo Dutti and Pull&Bear, has reported a sharp rise in profits.'] NLI scalarizer ref->gen toma_call batch size = 258 NLI scalarizer gen->ref toma_call batch size = 258 summ scalarizer toma_get_probs batch size = 258
Look at output
Output summary for the original document:
output_dict_mixed["output_orig"].output_text
['Inditex, the owner of Zara and Massimo Dutti, has reported a sharp rise in half-year profits.']
Mixed phrase- and sentence-level importance scores using each similarity metric, again normalized by the maximum score:
attrib_scores_df = pd.DataFrame(output_dict_mixed["attributions"]).set_index("units")
attrib_scores_df = attrib_scores_df[["unit_types"] + score_labels]
attrib_scores_df[score_labels] /= attrib_scores_df[score_labels].max(axis=0)
attrib_scores_df
units | unit_types | nli_logit | bert | st | summ | bart |
---|---|---|---|---|---|---|
The world's biggest clothing retailer | nsubj | 0.761780 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
posted | ROOT | -0.019273 | 0.013167 | 0.066859 | -0.067879 | -0.029003 |
net earnings of €1.26bn (£1.1bn) | dobj | 0.753891 | 0.553393 | 0.565972 | 0.302390 | 0.312603 |
in the six months to 31 July | prep | 0.355874 | 0.643719 | 0.249408 | 0.394622 | 0.349698 |
- | n | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
up 8% on the same period last year | advmod | 0.413663 | 0.335615 | 0.193663 | 0.002911 | 0.023664 |
. | n | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
\nSales jumped from €9.4bn to €10.5bn, an increase of 11%. | s | 0.798400 | 0.397421 | 0.532677 | 0.133202 | 0.118465 |
\nThe group's clothes can now be bought online in around 40 countries, it said. | s | -0.145032 | -0.080781 | -0.008423 | 0.045120 | 0.048113 |
\nInditex operates eight brands in 90 countries including Pull&Bear, Massimo Dutti and Bershka. | s | 0.697173 | 0.721930 | 0.890838 | 0.770730 | 0.698585 |
\n | n | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
How Zara's founder became | non-leaf | 1.000000 | 0.656790 | 0.699175 | 0.463225 | 0.462987 |
the richest man in the world | attr | 0.498924 | 0.202898 | 0.242346 | 0.015518 | 0.008883 |
- | n | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
for two days | prep | -0.026446 | -0.024018 | -0.038698 | -0.026014 | -0.025854 |
\n | n | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
Chairman and chief executive Pablo Isla | nsubj | 0.559903 | 0.245624 | 0.083386 | -0.035465 | -0.035741 |
emphasised | ROOT | 0.732028 | 0.316333 | 0.097631 | -0.086952 | -0.076342 |
the firm's investment in technology | dobj | 0.571772 | 0.279794 | 0.081857 | -0.061290 | -0.055816 |
, | n | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
saying | non-leaf | -0.000191 | 0.056358 | 0.110401 | 0.077796 | 0.074199 |
the firm | nsubj | -0.024670 | 0.022419 | 0.045460 | 0.059150 | 0.060242 |
had expanded its online stores | non-leaf | 0.175321 | 0.109267 | 0.125397 | 0.050718 | 0.049953 |
to 11 new countries | prep | 0.036182 | 0.020531 | 0.010739 | 0.021521 | 0.018555 |
in the period | prep | 0.029960 | 0.028356 | 0.009137 | 0.015272 | 0.011095 |
. | n | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
\nIt also launched mobile phone payment in all its Spanish stores, with the objective of "extending the service to other countries". | s | 0.830614 | 0.437893 | 0.425579 | 0.277856 | 0.275284 |
\nThis will encompass online apps for all of its brands and a specific app for the whole group called InWallet. | s | 0.358946 | 0.159930 | 0.133401 | 0.036306 | 0.037782 |
\nMr Isla said: "Both our online and bricks-and-mortar stores are seamlessly connected, driven by platforms such as mobile payment, and other technological initiatives that we will continue to develop." | s | 0.198519 | 0.126102 | 0.094169 | 0.038909 | 0.043991 |
\nTom Gadsby, an analyst at Liberum, said the firm's "online drive" was important. | s | 0.198426 | 0.121622 | 0.117407 | 0.115574 | 0.122302 |
\n"I expect over the years they may find they don't have to open as many stores to maintain their strong growth rate as the online channel will become increasingly important," he said. | s | 0.586592 | 0.306874 | 0.277640 | 0.129380 | 0.119601 |
\n"And while Zara is available in many of the territories in which they operate [online], most of their other brands aren't readily available outside Europe online. | s | 0.731454 | 0.491775 | 0.374773 | 0.300718 | 0.266648 |
\n"So there is a big opportunity there for them to expand online into new territories." | s | -0.120767 | -0.060138 | -0.037352 | -0.055202 | -0.061474 |
\nThe company also said it had benefited from steady economic growth in Spain, where Inditex gets about a fifth of its sales. | s | 0.587978 | 0.278297 | 0.253464 | 0.189680 | 0.189629 |
\nThat country's clothing market grew at an average of 3% in the three-months to the end of July, according to the Spanish statistics agency. | s | 0.774326 | 0.406946 | 0.414373 | 0.272611 | 0.258739 |
\nAll of the group's brands increased their international presence during the period, with 83 new stores opened in 38 countries. | s | -0.933324 | -0.441782 | -0.337152 | -0.193818 | -0.188448 |
\nIn a call with analysts, it said it would open 6-8% of new store space over course of the year. | s | 0.405407 | 0.114183 | 0.045864 | -0.099814 | -0.102668 |
\n | n | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
The firm's strong performance sets it apart from European rivals H&M and Next, which have blamed unseasonal weather for below-forecast results this year. | s | 0.573354 | 0.203896 | 0.106801 | 0.053895 | 0.050738 |
4. Evaluate fidelity of attributions to explained model
We now evaluate the fidelity of both the sentence-level and mixed-level explanations to the behavior of the summarization model. We do this by computing perturbation curves. Given a set of attribution scores, the perturbation curve measures how much the output summary changes as we remove more and more units from the input document, in decreasing order of importance according to the scores.
Instantiate perturbation curve evaluator
We instantiate a PerturbCurveEvaluator to compute perturbation curves. Similar to the explainer, PerturbCurveEvaluator requires a scalarizer to quantify how much the output summary changes from the original summary as more input units are removed. Here we use a different scalarizer than those used in the explainer, namely the "prob" scalarizer, which computes the probability of generating the original summary conditioned on perturbed inputs.
evaluator = PerturbCurveEvaluator(wrapped_model, scalarizer="prob")
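The idea of a perturbation curve can be sketched in a few lines, independent of ICX360. In the toy example below, units are removed in decreasing order of their attribution score and a hypothetical output_change function reports how far the output has moved after each removal. The true_effect values are invented ground truth for the toy "model", not anything measured from a real LLM.

```python
import numpy as np


def perturbation_curve(units, scores, output_change):
    """Remove units in decreasing order of score; record the output change after each removal."""
    order = np.argsort(scores)[::-1]          # most important unit first
    removed = set()
    curve = [output_change(units, removed)]   # starting point: nothing removed
    for idx in order:
        removed.add(int(idx))
        curve.append(output_change(units, removed))
    return curve


# Hypothetical ground truth: how much each unit really matters to the toy "model"
true_effect = [3.0, 0.5, 1.5]


def toy_output_change(units, removed):
    """Toy scalarizer: the output change is the summed true effect of the removed units."""
    return sum(true_effect[i] for i in removed)


units = ["u0", "u1", "u2"]
good = perturbation_curve(units, [0.9, 0.1, 0.5], toy_output_change)  # ranks units correctly
bad = perturbation_curve(units, [0.1, 0.9, 0.5], toy_output_change)   # ranks the weakest unit first
print(good)
print(bad)
```

Both curves end at the same value once everything is removed, but the curve for the well-ranked scores rises faster, which is the behavior a faithful explanation should show.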
Evaluate perturbation curves
We call the eval_perturb_curve method to compute perturbation curves for both sentence-level and mixed-level attribution scores and for all scores obtained with the different similarity metrics in the explanation scalarizer (NLI score, BERTScore, etc.). Parameters for eval_perturb_curve are as follows:
output_dict_sent or output_dict_mixed: The dictionary returned by the explainer
score_label: The score label corresponding to each similarity metric
token_frac=True: This setting allows comparison between different kinds of units (sentences vs. mixed) because it takes into account the number of tokens in each unit. The token count is treated as the length of the unit and is also used in ranking units.
model_params: The same model generation parameters as before
perturb_curve = {"sent": {}, "mixed": {}}
for score_label in score_labels:
    perturb_curve["sent"][score_label] = evaluator.eval_perturb_curve(output_dict_sent, score_label, token_frac=True, model_params=model_params)
    perturb_curve["mixed"][score_label] = evaluator.eval_perturb_curve(output_dict_mixed, score_label, token_frac=True, model_params=model_params)
toma_get_probs batch size = 11 toma_get_probs batch size = 18 toma_get_probs batch size = 10 toma_get_probs batch size = 20 toma_get_probs batch size = 10 toma_get_probs batch size = 21 toma_get_probs batch size = 9 toma_get_probs batch size = 18 toma_get_probs batch size = 10 toma_get_probs batch size = 18
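To see why a token-based x-axis makes sentence-level and mixed-level curves comparable, consider the sketch below. It computes the cumulative fraction of tokens removed for two different segmentations of the same text. This is only an illustration of the idea: the helper name is hypothetical, whitespace-separated words stand in for model tokens, and we do not claim this is how eval_perturb_curve computes its fractions internally.

```python
import numpy as np


def removed_token_fractions(units, order):
    """Cumulative fraction of tokens removed as units are dropped in the given order."""
    # Assumption: whitespace-separated words stand in for model tokens.
    lengths = np.array([len(u.split()) for u in units])
    cumulative = np.cumsum(lengths[list(order)])
    return np.concatenate([[0.0], cumulative / lengths.sum()])


sentences = ["a b c d", "e f", "g h i j"]     # 10 tokens in 3 units
phrases = ["a b", "c d", "e f", "g h i j"]    # the same 10 tokens in 4 units

frac_sent = removed_token_fractions(sentences, [0, 2, 1])
frac_phr = removed_token_fractions(phrases, [0, 3, 1, 2])
print(frac_sent)
print(frac_phr)
```

Counting removed units would put the two segmentations on different scales (3 steps versus 4), whereas both token-fraction axes run from 0 to 1 over the same underlying text.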
Plot perturbation curves
The perturbation curves are plotted below as a function of the fraction of tokens removed from the input. The y-axis is the decrease in the log probability of generating the original summary, computed by the scalarizer of PerturbCurveEvaluator.
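The y-axis quantity can be illustrated with a toy curve. In the sketch below, the column name "log_prob" and the numbers are assumptions made for the example; the actual dictionary returned by eval_perturb_curve may use different keys. Subtracting the whole DataFrame from its row at fraction 0 gives the decrease in log probability relative to the unperturbed input.

```python
import pandas as pd

# Hypothetical curve: log probability of the original summary at each fraction removed
curve = {"frac": [0.0, 0.5, 1.0], "log_prob": [-5.0, -7.0, -12.0]}
df = pd.DataFrame(curve).set_index("frac")

# Row at frac = 0 minus every row: how far the log probability has dropped
decrease = df.loc[0.0] - df
print(decrease)
```

The result starts at 0 by construction and grows as the original summary becomes less likely under the perturbed input.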
# Sentence-level perturbation curves
line = {}
for score_label in score_labels:
    df = pd.DataFrame(perturb_curve["sent"][score_label]).set_index("frac")
    line[score_label], = plt.plot(df.loc[0] - df)
# Mixed-level perturbation curves
for score_label in score_labels:
    df = pd.DataFrame(perturb_curve["mixed"][score_label]).set_index("frac")
    plt.plot(df.loc[0] - df, color=line[score_label].get_color(), linestyle="--")
plt.xlabel("fraction of tokens perturbed")
plt.ylabel("decrease in log prob of original output")
plt.legend(score_labels)
<matplotlib.legend.Legend at 0x14abe9e1c850>
In general, we are looking for perturbation curves to increase as more tokens are removed from the input. A higher perturbation curve is better because it indicates that the units identified by the explanation as important actually do have a larger effect on the LLM's output, and hence the explanation is more faithful to the LLM. Some observations for specific LLMs (your results may vary):
DistilBART: For this model, mixed-level attribution scores (dashed curves) are generally more effective in identifying units whose removal causes larger drops in the model's log probability.
Granite-3.3-2B-Instruct: Sentence-level attribution scores (solid curves) perform about as well as mixed-level scores for this model, and in some cases are better.
General observations: Based on this single example, there is no clear separation or ordering among the 5 similarity metrics. "summ" and "bart" tend to be most similar to each other, and "bert" and "st" may be similar to each other as well.
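Since a higher perturbation curve indicates a more faithful explanation, one optional way to compare curves numerically is the area under each curve. This summary is our own addition rather than part of the notebook's evaluation, and the curves below are made-up numbers; the sketch uses the trapezoidal rule written out by hand.

```python
import numpy as np


def area_under_curve(frac, decrease):
    """Trapezoidal area under a perturbation curve (larger means a higher curve overall)."""
    frac = np.asarray(frac, dtype=float)
    decrease = np.asarray(decrease, dtype=float)
    widths = np.diff(frac)                            # spacing between x-axis points
    heights = (decrease[1:] + decrease[:-1]) / 2.0    # mean height of each trapezoid
    return float(np.sum(widths * heights))


frac = [0.0, 0.5, 1.0]
auc_a = area_under_curve(frac, [0.0, 4.0, 5.0])  # curve that rises early
auc_b = area_under_curve(frac, [0.0, 1.0, 5.0])  # curve that rises late
print(auc_a, auc_b)
```

Two curves that end at the same value can still have quite different areas, so this number rewards explanations whose top-ranked units cause the largest early drops.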