MExGen for Summarization

This notebook walks through an example of using MExGen (Multi-Level Explanations for Generative Language Models) to explain an LLM's summarization of a document.

After setting things up in Section 1, we will obtain explanations in the form of sentence-level attributions to the input document in Section 2, followed by mixed phrase- and sentence-level attributions in Section 3. We will then evaluate the fidelity of these explanations to the LLM in Section 4.

1. Setup

Import packages

Standard packages

In [ ]:

from datasets import load_dataset    # for XSum dataset
import matplotlib.pyplot as plt    # for plotting perturbation curves
import numpy as np
from openai import OpenAI    # for VLLM summarization model
import pandas as pd    # only for displaying DataFrames
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BartForConditionalGeneration, BartTokenizerFast    # for HuggingFace summarization models

ICX360 classes

In [2]:

from icx360.algorithms.mexgen import CLIME, LSHAP    # explainers
from icx360.metrics import PerturbCurveEvaluator    # fidelity evaluation
from icx360.utils.general_utils import select_device    # set device automatically
from icx360.utils.model_wrappers import HFModel, VLLMModel    # model wrappers

In [3]:

device = select_device()
device

Out[3]:

device(type='cuda')

Load model to explain

Here you can choose from the following models:

"distilbart": A small summarization model from HuggingFace
"granite-hf": A larger model from HuggingFace
"vllm": A model served using vLLM. This is a "bring your own model" option, for which you must supply the parameters below (model_name, base_url, api_key, and any others).

In [4]:

model_type = "distilbart"
# model_type = "granite-hf"
# model_type = "vllm"

In [5]:

if model_type == "distilbart":
    model_name = "sshleifer/distilbart-xsum-12-6"
    model = BartForConditionalGeneration.from_pretrained(model_name).to(device)
    tokenizer = BartTokenizerFast.from_pretrained(model_name, add_prefix_space=True)

elif model_type == "granite-hf":
    model_name = "ibm-granite/granite-3.3-2b-instruct"
    model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16).to(device)
    tokenizer = AutoTokenizer.from_pretrained(model_name, add_prefix_space=True)

elif model_type == "vllm":
    # IF YOU HAVE A VLLM MODEL, UNCOMMENT AND REPLACE THE FOLLOWING LINES WITH YOUR MODEL'S PARAMETERS
    # base_url = "https://YOUR/MODEL/URL"
    # api_key = YOUR_API_KEY
    # openai_kwargs = {}
    model = OpenAI(api_key=api_key, base_url=base_url, **openai_kwargs)
    # Corresponding HuggingFace tokenizer for applying chat template
    # model_name = "YOUR/MODEL-NAME"
    # tokenizer_kwargs = {}
    tokenizer = AutoTokenizer.from_pretrained(model_name, **tokenizer_kwargs)

else:
    raise ValueError("Unknown model type")

We then wrap the model with a common API (HFModel or VLLMModel) that the explainer will use.

In [6]:

if model_type in ("distilbart", "granite-hf"):
    wrapped_model = HFModel(model, tokenizer)
elif model_type == "vllm":
    wrapped_model = VLLMModel(model, model_name, tokenizer)

Load input

Load the Extreme Summarization (XSum) dataset

In [7]:

#dataset = load_dataset('xsum', split='train', trust_remote_code=True)
dataset = load_dataset('xsum', split='test', trust_remote_code=True)

For this example, we will find a news article about the clothing retailer Inditex. This can be modified to load a different article.

In [8]:

for document in dataset["document"]:
    if "The world's biggest clothing retailer" in document:
        break
print(document)

The world's biggest clothing retailer posted net earnings of €1.26bn (£1.1bn) in the six months to 31 July - up 8% on the same period last year.
Sales jumped from €9.4bn to €10.5bn, an increase of 11%.
The group's clothes can now be bought online in around 40 countries, it said.
Inditex operates eight brands in 90 countries including Pull&Bear, Massimo Dutti and Bershka.
How Zara's founder became the richest man in the world - for two days
Chairman and chief executive Pablo Isla emphasised the firm's investment in technology, saying the firm had expanded its online stores to 11 new countries in the period.
It also launched mobile phone payment in all its Spanish stores, with the objective of "extending the service to other countries".
This will encompass online apps for all of its brands and a specific app for the whole group called InWallet.
Mr Isla said: "Both our online and bricks-and-mortar stores are seamlessly connected, driven by platforms such as mobile payment, and other technological initiatives that we will continue to develop."
Tom Gadsby, an analyst at Liberum, said the firm's "online drive" was important.
"I expect over the years they may find they don't have to open as many stores to maintain their strong growth rate as the online channel will become increasingly important," he said.
"And while Zara is available in many of the territories in which they operate [online], most of their other brands aren't readily available outside Europe online.
"So there is a big opportunity there for them to expand online into new territories."
The company also said it had benefited from steady economic growth in Spain, where Inditex gets about a fifth of its sales.
That country's clothing market grew at an average of 3% in the three-months to the end of July, according to the Spanish statistics agency.
All of the group's brands increased their international presence during the period, with 83 new stores opened in 38 countries.
In a call with analysts, it said it would open 6-8% of new store space over course of the year.
The firm's strong performance sets it apart from European rivals H&M and Next, which have blamed unseasonal weather for below-forecast results this year.
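
Note that the search loop leaves `document` bound to the last article in the dataset if no article contains the search phrase, so a typo in the phrase would go unnoticed. A minimal sketch of a stricter lookup is given here. The helper name `find_document` and the two-item `toy_documents` list are illustrative stand-ins (not part of the notebook or of ICX360) for iterating over `dataset["document"]`.

```python
def find_document(documents, needle):
    """Return the first document containing `needle`, or raise if none does."""
    match = next((doc for doc in documents if needle in doc), None)
    if match is None:
        raise ValueError(f"No document contains {needle!r}")
    return match


# Stand-in for dataset["document"]; the real notebook iterates over XSum articles.
toy_documents = [
    "Shares in the airline fell after a profit warning.",
    "The world's biggest clothing retailer posted net earnings of 1.26bn euros.",
]

found = find_document(toy_documents, "The world's biggest clothing retailer")
print(found)
```

With the real dataset, the same helper could be called as `find_document(dataset["document"], "...")`, which fails loudly instead of silently returning an unrelated article.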

Generate model response

As a check on our setup, we will have the model generate its summary of the input document, via the wrapped_model object created above.

First we specify parameters for model generation, as a dictionary model_params. These parameters include max_new_tokens/max_tokens, whether to use the model's chat template, and any instruction provided as a system prompt (the DistilBART model does not need an instruction to summarize).

In [9]:

model_params = {}
if model_type == "vllm":
    model_params["max_tokens"] = 100
    model_params["seed"] = 20250430
else:
    model_params["max_new_tokens"] = 100
    
if model_type in ("granite-hf", "vllm"):
    model_params["chat_template"] = True
    model_params["system_prompt"] = "Summarize the following article in one sentence. Do not preface the summary with anything."

model_params

Out[9]:

{'max_new_tokens': 100}

Now we generate the summary:

In [10]:

output_orig = wrapped_model.generate(document, **model_params)
print(output_orig)

['Inditex, the owner of Zara and Massimo Dutti, has reported a sharp rise in half-year profits as it continues to expand its online presence.']

2. Sentence-Level Explanation

Instantiate explainer

Here you can choose between two attribution algorithms used by MExGen, C-LIME and L-SHAP, which are more efficient variants of LIME and SHAP, respectively. In either case, the explanation takes the form of importance scores assigned to parts of the input document, and these scores are computed by calling the summarization model on perturbed versions of the input.
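
To make the idea of perturbation-based attribution concrete, here is a minimal sketch of the simplest such scheme, leave-one-out scoring. This is neither C-LIME nor L-SHAP, and it does not use ICX360. The function `keyword_count_model` is a hypothetical stand-in that plays the role of the model and the scalarizer together by mapping a list of sentences directly to a number.

```python
def keyword_count_model(sentences):
    """Hypothetical stand-in for model + scalarizer: counts keyword occurrences."""
    keywords = ("earnings", "sales", "online")
    text = " ".join(sentences).lower()
    return sum(text.count(word) for word in keywords)


def leave_one_out_scores(sentences, model):
    """Score each sentence by how much the model output drops when it is removed."""
    baseline = model(sentences)
    scores = []
    for i in range(len(sentences)):
        perturbed = sentences[:i] + sentences[i + 1:]  # input without sentence i
        scores.append(baseline - model(perturbed))
    return scores


toy_sentences = [
    "The retailer posted net earnings of 1.26bn euros.",
    "Sales jumped 11%.",
    "The weather was unseasonal.",
]
loo_scores = leave_one_out_scores(toy_sentences, keyword_count_model)
print(loo_scores)
```

Leave-one-out needs one model call per unit. LIME- and SHAP-style methods instead perturb several units at once and fit or average over those perturbations, which is where the choice of algorithm matters.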

In [11]:

# explainer_alg = "clime"
explainer_alg = "lshap"

if explainer_alg == "clime":
    explainer_class = CLIME
elif explainer_alg == "lshap":
    explainer_class = LSHAP

The primary parameter for the explainer is the "scalarizer", which quantifies how much the output summaries for perturbed inputs differ from the output summary for the original input. For this we will use "text-only" scalarizers (scalarizer="text"), which compute several similarity scores between the original summary and the perturbed summaries, thus providing different views of what constitutes "similarity". Small language models are used to provide these similarity scores. Specifically, we use an NLI model to compute both NLI scores and BERTScores, and a summarization model (the same DistilBART model as above) to compute "SUMM" scores and BARTScores.
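
As a rough illustration of what a text-only scalarizer does, the sketch below maps each perturbed summary to a single similarity score against the original summary. It uses plain token overlap (Jaccard similarity), a deliberately crude stand-in for the NLI, BERTScore, SUMM, and BARTScore metrics named above. The function name and the example summaries are illustrative and not taken from ICX360.

```python
def jaccard_similarity(reference, candidate):
    """Token-overlap similarity in [0, 1] between two strings."""
    ref_tokens = set(reference.lower().split())
    cand_tokens = set(candidate.lower().split())
    if not ref_tokens and not cand_tokens:
        return 1.0  # two empty strings are treated as identical
    return len(ref_tokens & cand_tokens) / len(ref_tokens | cand_tokens)


original_summary = "Inditex has reported a sharp rise in half-year profits."
perturbed_summaries = [
    "Inditex has reported a sharp rise in half-year profits.",
    "Inditex has reported a sharp rise in profits.",
    "Unseasonal weather hit European retailers.",
]
similarities = [jaccard_similarity(original_summary, s) for s in perturbed_summaries]
print(similarities)
```

A perturbation that leaves the summary unchanged scores 1.0, and one that changes it completely scores 0.0. Model-based metrics follow the same pattern but judge meaning rather than surface tokens.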

In [12]:

model_nli_name = "microsoft/deberta-v2-xxlarge-mnli"
model_summ_name = "sshleifer/distilbart-xsum-12-6"

explainer = explainer_class(wrapped_model, scalarizer="text", 
                            model_nli=model_nli_name, model_bert=model_nli_name,
                            model_summ=model_summ_name, model_bart=model_summ_name, device=device)

Call explainer

We call the explainer's explain_instance method on the input document, with the model generation parameters in model_params and default settings otherwise. This will segment the document into sentences and attribute an importance score to each sentence.

In [13]:

output_dict_sent = explainer.explain_instance(document, model_params=model_params)

toma_generate batch size = 132

Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.

['Inditex, the owner of Zara and Massimo Dutti, has reported a sharp rise in half-year profits, helped by a surge in sales.']
['Inditex, the owner of Zara and Massimo Dutti, has reported a sharp rise in half-year profits, helped by a surge in sales.', 'Inditex, the owner of Zara and Massimo Dutti, has reported a sharp rise in half-year profits.', 'Inditex, the owner of Zara and Massimo Dutti, has reported a sharp rise in half-year profits.', 'Inditex, the owner of chains including Zara, Massimo Dutti and Pull&Bear, has reported a sharp rise in half-year profits.', 'Inditex, the owner of chains including Zara and Pull&Bear, has reported a sharp rise in half-year profits as it continues to expand its online presence.']
NLI scalarizer ref->gen
toma_call batch size = 132
NLI scalarizer gen->ref
toma_call batch size = 132
summ scalarizer

Passing a tuple of `past_key_values` is deprecated and will be removed in Transformers v4.58.0. You should pass an instance of `EncoderDecoderCache` instead, e.g. `past_key_values=EncoderDecoderCache.from_legacy_cache(past_key_values)`.

toma_get_probs batch size = 132

Look at output

The explainer returns a dictionary. The "output_orig" item shows the output summary for the original document.

In [14]:

output_dict_sent["output_orig"].output_text

Out[14]:

['Inditex, the owner of Zara and Massimo Dutti, has reported a sharp rise in half-year profits, helped by a surge in sales.']

The "attributions" item is itself a dictionary, containing the sentences ("units") that the document has been split into, the corresponding "unit_types", and the importance scores for the sentences, one score for each similarity metric included in the scalarizer (NLI, BERTScore, etc.). These are displayed below as a pandas DataFrame, where we also normalize each column of scores by the maximum score.

In [15]:

attrib_scores_df = pd.DataFrame(output_dict_sent["attributions"]).set_index("units")

score_labels = explainer.scalarized_model.sim_scores
attrib_scores_df = attrib_scores_df[["unit_types"] + score_labels]
attrib_scores_df[score_labels] /= attrib_scores_df[score_labels].max(axis=0)
attrib_scores_df

Out[15]:

	unit_types	nli_logit	bert	st	summ	bart
units
The world's biggest clothing retailer posted net earnings of €1.26bn (£1.1bn) in the six months to 31 July - up 8% on the same period last year.	s	0.683280	1.000000	0.643647	1.000000	1.000000
\nSales jumped from €9.4bn to €10.5bn, an increase of 11%.	s	0.729922	0.395132	0.669389	0.225816	0.260140
\nThe group's clothes can now be bought online in around 40 countries, it said.	s	0.390864	0.042293	-0.167252	0.047666	0.048265
\nInditex operates eight brands in 90 countries including Pull&Bear, Massimo Dutti and Bershka.	s	0.600581	0.297317	0.265977	0.457829	0.377207
\nHow Zara's founder became the richest man in the world - for two days\nChairman and chief executive Pablo Isla emphasised the firm's investment in technology, saying the firm had expanded its online stores to 11 new countries in the period.	s	0.707665	0.703973	1.000000	0.472878	0.584450
\nIt also launched mobile phone payment in all its Spanish stores, with the objective of "extending the service to other countries".	s	0.645685	0.268591	0.255037	0.186310	0.205321
\nThis will encompass online apps for all of its brands and a specific app for the whole group called InWallet.	s	0.548298	0.159034	0.105412	0.048554	0.039156
\nMr Isla said: "Both our online and bricks-and-mortar stores are seamlessly connected, driven by platforms such as mobile payment, and other technological initiatives that we will continue to develop."	s	0.556377	0.211338	0.150247	0.104636	0.101121
\nTom Gadsby, an analyst at Liberum, said the firm's "online drive" was important.	s	0.676632	0.300964	0.224815	0.336209	0.342782
\n"I expect over the years they may find they don't have to open as many stores to maintain their strong growth rate as the online channel will become increasingly important," he said.	s	1.000000	0.329794	0.332405	0.303154	0.287829
\n"And while Zara is available in many of the territories in which they operate [online], most of their other brands aren't readily available outside Europe online.	s	0.218182	0.039034	0.007694	-0.009965	-0.013622
\n"So there is a big opportunity there for them to expand online into new territories."	s	0.188577	0.044537	0.007384	-0.024480	-0.018063
\nThe company also said it had benefited from steady economic growth in Spain, where Inditex gets about a fifth of its sales.	s	0.760705	0.187366	-0.026861	0.152611	0.146245
\nThat country's clothing market grew at an average of 3% in the three-months to the end of July, according to the Spanish statistics agency.	s	0.923588	0.446365	0.601362	0.508205	0.462474
\nAll of the group's brands increased their international presence during the period, with 83 new stores opened in 38 countries.	s	0.488930	0.038490	-0.192519	0.031038	0.039291
\nIn a call with analysts, it said it would open 6-8% of new store space over course of the year.	s	0.890768	0.493915	0.632328	0.366810	0.398132
\n	n	0.000000	0.000000	0.000000	0.000000	0.000000
The firm's strong performance sets it apart from European rivals H&M and Next, which have blamed unseasonal weather for below-forecast results this year.	s	0.361864	-0.083896	-0.156779	-0.048950	-0.049222

The importance scores should align roughly with human intuition (for example, sentences mentioning increases in earnings, sales, and online presence are important). We defer to Section 4 the evaluation of how faithful they are to the summarization LLM.

3. Mixed Phrase- and Sentence-Level Explanation

We will now consider the multi-level aspect of MExGen by obtaining mixed phrase- and sentence-level attributions to the input document.

Set up parameters

For this illustration, we will segment the 2 most important sentences (as determined in the previous section) into phrases. (This number can be changed.) We will also measure importance by the sum of scores across the similarity metrics (a single similarity metric could be used too).

In [16]:

num_top_sent = 2
score_label = "sum"

The parameters for explain_instance() will be as follows:

units and unit_types: Take existing sentence-level units and unit types from output_dict_sent["attributions"]
ind_segment: We create a Boolean array that has value True in positions corresponding to the top 2 sentences in terms of the sum of scores, and False otherwise. This will tell the explainer to segment only these 2 sentences.
segment_type = "ph" for segmentation into phrases
model_params as before
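
The construction of ind_segment can be sketched without NumPy as follows. The helper name `top_k_mask` and the score values are hypothetical (the scores are not taken from the notebook's output). The sketch assumes that per-metric scores have already been summed into one number per sentence.

```python
def top_k_mask(scores, k):
    """Return a list of booleans that is True at the positions of the k largest scores."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])  # ascending order
    top = set(ranked[-k:]) if k > 0 else set()  # guard: ranked[-0:] is the whole list
    return [i in top for i in range(len(scores))]


# Hypothetical summed scores for five sentences.
toy_scores = [4.3, 1.2, 0.4, 3.5, -0.2]
mask = top_k_mask(toy_scores, k=2)
print(mask)
```

The explicit guard for `k == 0` matters because a slice of the form `[-0:]` selects every element. The same holds for `np.argsort(scores)[-num_top_sent:]`, so setting `num_top_sent = 0` there would mark all sentences for segmentation rather than none.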

In [17]:

units = output_dict_sent["attributions"]["units"]
unit_types = output_dict_sent["attributions"]["unit_types"]
segment_type = "ph"

if score_label == "sum":
    scores = attrib_scores_df[score_labels].sum(axis=1).values
else:
    scores = attrib_scores_df[score_label].values

ind_segment = np.zeros_like(scores, dtype=bool)
ind_segment[np.argsort(scores)[-num_top_sent:]] = True
ind_segment

Out[17]:

array([ True, False, False, False,  True, False, False, False, False,
       False, False, False, False, False, False, False, False, False])

Call explainer

Now we call explain_instance() with the above parameters.

In [18]:

output_dict_mixed = explainer.explain_instance(units, unit_types, ind_segment=ind_segment, segment_type=segment_type, model_params=model_params)

became advcl How Zara's founder became
expanded ccomp had expanded its online stores
toma_generate batch size = 258
['Inditex, the owner of Zara and Massimo Dutti, has reported a sharp rise in half-year profits.']
['Inditex, the owner of Zara and Massimo Dutti, has reported a sharp rise in half-year profits.', 'Inditex, the owner of chains including Zara, Pull&Bear and Massimo Dutti, has reported a sharp rise in profits.', 'Inditex, the owner of chains including Zara, Massimo Dutti and Pull&Bear, has reported a sharp rise in profits.', 'Inditex, the owner of chains including Zara, Pull&Bear and Massimo Dutti, has reported a sharp rise in half-year profits.', 'Inditex, the owner of chains including Zara, Massimo Dutti and Pull&Bear, has reported a sharp rise in profits.']
NLI scalarizer ref->gen
toma_call batch size = 258
NLI scalarizer gen->ref
toma_call batch size = 258
summ scalarizer
toma_get_probs batch size = 258

Look at output

Output summary for the original document:

In [19]:

output_dict_mixed["output_orig"].output_text

Out[19]:

['Inditex, the owner of Zara and Massimo Dutti, has reported a sharp rise in half-year profits.']

Mixed phrase- and sentence-level importance scores using each similarity metric, again normalized by the maximum score:

In [20]:

attrib_scores_df = pd.DataFrame(output_dict_mixed["attributions"]).set_index("units")
attrib_scores_df = attrib_scores_df[["unit_types"] + score_labels]
attrib_scores_df[score_labels] /= attrib_scores_df[score_labels].max(axis=0)
attrib_scores_df

Out[20]:

	unit_types	nli_logit	bert	st	summ	bart
units
The world's biggest clothing retailer	nsubj	0.761780	1.000000	1.000000	1.000000	1.000000
posted	ROOT	-0.019273	0.013167	0.066859	-0.067879	-0.029003
net earnings of €1.26bn (£1.1bn)	dobj	0.753891	0.553393	0.565972	0.302390	0.312603
in the six months to 31 July	prep	0.355874	0.643719	0.249408	0.394622	0.349698
-	n	0.000000	0.000000	0.000000	0.000000	0.000000
up 8% on the same period last year	advmod	0.413663	0.335615	0.193663	0.002911	0.023664
.	n	0.000000	0.000000	0.000000	0.000000	0.000000
\nSales jumped from €9.4bn to €10.5bn, an increase of 11%.	s	0.798400	0.397421	0.532677	0.133202	0.118465
\nThe group's clothes can now be bought online in around 40 countries, it said.	s	-0.145032	-0.080781	-0.008423	0.045120	0.048113
\nInditex operates eight brands in 90 countries including Pull&Bear, Massimo Dutti and Bershka.	s	0.697173	0.721930	0.890838	0.770730	0.698585
\n	n	0.000000	0.000000	0.000000	0.000000	0.000000
How Zara's founder became	non-leaf	1.000000	0.656790	0.699175	0.463225	0.462987
the richest man in the world	attr	0.498924	0.202898	0.242346	0.015518	0.008883
-	n	0.000000	0.000000	0.000000	0.000000	0.000000
for two days	prep	-0.026446	-0.024018	-0.038698	-0.026014	-0.025854
\n	n	0.000000	0.000000	0.000000	0.000000	0.000000
Chairman and chief executive Pablo Isla	nsubj	0.559903	0.245624	0.083386	-0.035465	-0.035741
emphasised	ROOT	0.732028	0.316333	0.097631	-0.086952	-0.076342
the firm's investment in technology	dobj	0.571772	0.279794	0.081857	-0.061290	-0.055816
,	n	0.000000	0.000000	0.000000	0.000000	0.000000
saying	non-leaf	-0.000191	0.056358	0.110401	0.077796	0.074199
the firm	nsubj	-0.024670	0.022419	0.045460	0.059150	0.060242
had expanded its online stores	non-leaf	0.175321	0.109267	0.125397	0.050718	0.049953
to 11 new countries	prep	0.036182	0.020531	0.010739	0.021521	0.018555
in the period	prep	0.029960	0.028356	0.009137	0.015272	0.011095
.	n	0.000000	0.000000	0.000000	0.000000	0.000000
\nIt also launched mobile phone payment in all its Spanish stores, with the objective of "extending the service to other countries".	s	0.830614	0.437893	0.425579	0.277856	0.275284
\nThis will encompass online apps for all of its brands and a specific app for the whole group called InWallet.	s	0.358946	0.159930	0.133401	0.036306	0.037782
\nMr Isla said: "Both our online and bricks-and-mortar stores are seamlessly connected, driven by platforms such as mobile payment, and other technological initiatives that we will continue to develop."	s	0.198519	0.126102	0.094169	0.038909	0.043991
\nTom Gadsby, an analyst at Liberum, said the firm's "online drive" was important.	s	0.198426	0.121622	0.117407	0.115574	0.122302
\n"I expect over the years they may find they don't have to open as many stores to maintain their strong growth rate as the online channel will become increasingly important," he said.	s	0.586592	0.306874	0.277640	0.129380	0.119601
\n"And while Zara is available in many of the territories in which they operate [online], most of their other brands aren't readily available outside Europe online.	s	0.731454	0.491775	0.374773	0.300718	0.266648
\n"So there is a big opportunity there for them to expand online into new territories."	s	-0.120767	-0.060138	-0.037352	-0.055202	-0.061474
\nThe company also said it had benefited from steady economic growth in Spain, where Inditex gets about a fifth of its sales.	s	0.587978	0.278297	0.253464	0.189680	0.189629
\nThat country's clothing market grew at an average of 3% in the three-months to the end of July, according to the Spanish statistics agency.	s	0.774326	0.406946	0.414373	0.272611	0.258739
\nAll of the group's brands increased their international presence during the period, with 83 new stores opened in 38 countries.	s	-0.933324	-0.441782	-0.337152	-0.193818	-0.188448
\nIn a call with analysts, it said it would open 6-8% of new store space over course of the year.	s	0.405407	0.114183	0.045864	-0.099814	-0.102668
\n	n	0.000000	0.000000	0.000000	0.000000	0.000000
The firm's strong performance sets it apart from European rivals H&M and Next, which have blamed unseasonal weather for below-forecast results this year.	s	0.573354	0.203896	0.106801	0.053895	0.050738

4. Evaluate fidelity of attributions to explained model

We now evaluate the fidelity of both the sentence-level and mixed-level explanations to the behavior of the summarization model. We do this by computing perturbation curves. Given a set of attribution scores, the perturbation curve measures how much the output summary changes as we remove more and more units from the input document, in decreasing order of importance according to the scores.
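
The definition of a perturbation curve can be sketched in a few lines. This is an illustrative toy, not the PerturbCurveEvaluator implementation. The function `keyword_hits` is a hypothetical stand-in for the model combined with a scalarizer, and the units and scores are made up.

```python
def perturbation_curve(units, scores, model):
    """Remove units in decreasing order of score and record the model output each time."""
    order = sorted(range(len(units)), key=lambda i: scores[i], reverse=True)
    removed = set()
    curve = [(0.0, model(units))]  # starting point: nothing removed
    for idx in order:
        removed.add(idx)
        remaining = [u for i, u in enumerate(units) if i not in removed]
        curve.append((len(removed) / len(units), model(remaining)))
    return curve


def keyword_hits(units):
    """Hypothetical stand-in for model + scalarizer: counts keyword occurrences."""
    text = " ".join(units).lower()
    return sum(text.count(word) for word in ("earnings", "sales"))


toy_units = ["Net earnings rose 8%.", "The weather was mild.", "Sales jumped 11%."]
toy_attributions = [0.9, 0.1, 0.7]
curve = perturbation_curve(toy_units, toy_attributions, keyword_hits)
# Express each point as the drop relative to the unperturbed output.
drops = [(frac, curve[0][1] - value) for frac, value in curve]
print(drops)
```

Here the x-coordinate is the fraction of units removed. Because the two highest-scored units are the ones the toy model depends on, the drop reaches its maximum after two removals, which is what a faithful set of scores should produce.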

Instantiate perturbation curve evaluator

We instantiate a PerturbCurveEvaluator to compute perturbation curves. Similar to the explainer, PerturbCurveEvaluator requires a scalarizer to quantify how much the output summary changes from the original summary as more input units are removed. Here we use a different scalarizer than those used in the explainer, namely the "prob" scalarizer, which computes the probability of generating the original summary conditioned on perturbed inputs.
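
For an autoregressive model, the log probability of a fixed output sequence is the sum of the log probabilities of its tokens. The sketch below shows that arithmetic on made-up numbers. The per-token probabilities are hypothetical, and how ICX360's "prob" scalarizer obtains and aggregates them internally is not shown in this notebook.

```python
import math


def sequence_log_prob(token_probs):
    """Log probability of a sequence, given the probability of each of its tokens."""
    return sum(math.log(p) for p in token_probs)


# Hypothetical per-token probabilities of the ORIGINAL summary's tokens,
# first conditioned on the original input, then on a perturbed input.
probs_given_original_input = [0.9, 0.8, 0.95]
probs_given_perturbed_input = [0.6, 0.5, 0.9]

log_prob_original = sequence_log_prob(probs_given_original_input)
log_prob_perturbed = sequence_log_prob(probs_given_perturbed_input)
decrease = log_prob_original - log_prob_perturbed
print(round(decrease, 4))
```

A positive `decrease` means the perturbed input makes the original summary less likely, which is the quantity tracked as units are removed.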

In [21]:

evaluator = PerturbCurveEvaluator(wrapped_model, scalarizer="prob")

Evaluate perturbation curves

We call the eval_perturb_curve method to compute perturbation curves for both sentence-level and mixed-level attribution scores and for all scores obtained with the different similarity metrics in the explanation scalarizer (NLI score, BERTScore, etc.). Parameters for eval_perturb_curve are as follows:

output_dict_sent or output_dict_mixed: The dictionary returned by the explainer
score_label: The score label corresponding to each similarity metric
token_frac=True: This setting allows comparison between different kinds of units (sentences vs. mixed) because it takes into account the number of tokens in each unit. The token count is treated as the length of the unit and is also used in ranking units.
model_params: The same model generation parameters as before
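
To see why measuring removal in tokens puts sentence-level and mixed-level units on a common x-axis, consider the following sketch. It counts whitespace-separated words as tokens, which is an assumption made for illustration. How ICX360 counts tokens is not shown in this notebook, and the example units are made up.

```python
def token_fractions(units_in_removal_order):
    """Cumulative fraction of tokens removed after each unit is dropped."""
    lengths = [len(unit.split()) for unit in units_in_removal_order]  # whitespace tokens
    total = sum(lengths)
    fractions, removed = [], 0
    for length in lengths:
        removed += length
        fractions.append(removed / total)
    return fractions


# The same text, segmented at two granularities.
sentence_units = ["Net earnings rose 8% last year.", "Sales jumped 11%."]
phrase_units = ["Net earnings", "rose 8%", "last year.", "Sales jumped 11%."]

sentence_fracs = token_fractions(sentence_units)
phrase_fracs = token_fractions(phrase_units)
print(sentence_fracs)
print(phrase_fracs)
```

Counted in units, removing one of two sentences and one of four phrases would look like 50% and 25% of the input. Counted in tokens, both segmentations lie on the same scale, and removing the first three phrases lands on the same x-value as removing the first sentence.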

In [22]:

perturb_curve = {"sent": {}, "mixed": {}}

for score_label in score_labels:
    perturb_curve["sent"][score_label] = evaluator.eval_perturb_curve(output_dict_sent, score_label, token_frac=True, model_params=model_params)
    perturb_curve["mixed"][score_label] = evaluator.eval_perturb_curve(output_dict_mixed, score_label, token_frac=True, model_params=model_params)

toma_get_probs batch size = 11
toma_get_probs batch size = 18
toma_get_probs batch size = 10
toma_get_probs batch size = 20
toma_get_probs batch size = 10
toma_get_probs batch size = 21
toma_get_probs batch size = 9
toma_get_probs batch size = 18
toma_get_probs batch size = 10
toma_get_probs batch size = 18

Plot perturbation curves

The perturbation curves are plotted below as a function of the fraction of tokens removed from the input. The y-axis is the decrease in the log probability of generating the original summary, computed by the scalarizer of PerturbCurveEvaluator.

In [23]:

# Sentence-level perturbation curves
line = {}
for score_label in score_labels:
    df = pd.DataFrame(perturb_curve["sent"][score_label]).set_index("frac")
    line[score_label], = plt.plot(df.loc[0] - df)

# Mixed-level perturbation curves
for score_label in score_labels:
    df = pd.DataFrame(perturb_curve["mixed"][score_label]).set_index("frac")
    plt.plot(df.loc[0] - df, color=line[score_label].get_color(), linestyle="--")

plt.xlabel("fraction of tokens perturbed")
plt.ylabel("decrease in log prob of original output")
plt.legend(score_labels)

Out[23]:

<matplotlib.legend.Legend at 0x14abe9e1c850>

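[Figure: perturbation curves for each similarity metric. x-axis: fraction of tokens perturbed; y-axis: decrease in log prob of original output. Solid lines: sentence-level attributions; dashed lines: mixed-level attributions.]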

In general, we are looking for perturbation curves to increase as more tokens are removed from the input. A higher perturbation curve is better because it indicates that the units identified by the explanation as important actually do have a larger effect on the LLM's output, and hence the explanation is more faithful to the LLM. Some observations for specific LLMs (your results may vary):

DistilBART: For this model, mixed-level attribution scores (dashed curves) are generally more effective in identifying units whose removal causes larger drops in the model's log probability.

Granite-3.3-2B-Instruct: Sentence-level attribution scores (solid curves) perform about as well as mixed-level scores for this model, and in some cases are better.

General observations: Based on this single example, there is no clear separation or ordering among the 5 similarity metrics. "summ" and "bart" tend to be most similar to each other, and "bert" and "st" may be similar to each other as well.
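
One possible way to turn "a higher perturbation curve is better" into a single number is the area under the curve. This summary statistic is an assumption added here for illustration and is not computed by the notebook. The curve values below are hypothetical.

```python
def area_under_curve(fracs, values):
    """Trapezoidal area under a curve sampled at increasing x-values `fracs`."""
    area = 0.0
    for i in range(1, len(fracs)):
        area += (fracs[i] - fracs[i - 1]) * (values[i] + values[i - 1]) / 2
    return area


# Hypothetical perturbation curves: x = fraction of tokens removed,
# y = decrease in log prob of the original output.
fracs = [0.0, 0.25, 0.5, 1.0]
curve_a = [0.0, 4.0, 6.0, 8.0]  # rises quickly: important units were removed first
curve_b = [0.0, 1.0, 3.0, 8.0]  # rises slowly

auc_a = area_under_curve(fracs, curve_a)
auc_b = area_under_curve(fracs, curve_b)
print(auc_a, auc_b)
```

Both curves end at the same value once everything is removed, so the area mainly reflects how early the drop occurs. A larger area suggests the attribution scores ranked the influential units first.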