Post-Processing

Post-processing methods included in the inFairness package.

Base post-processing class

class inFairness.postprocessing.BasePostProcessing(distance_x, is_output_probas)

Base class for Post-Processing methods

Parameters:
  • distance_x (inFairness.distances.Distance) – Distance metric in the input space

  • is_output_probas (bool) – True if data_Y (the model output) contains probabilities, which implies a classification setting; False if data_Y lies in Euclidean space, which implies a regression setting.

add_datapoints(X: Tensor, y: Tensor)

Add datapoints to the post-processing method

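Parameters:
  • X (torch.Tensor) – New input datapoints

  • y (torch.Tensor) – New output datapoints (model outputs for X)

property data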

Input and Output data used for post-processing

Returns:

data – A tuple of (X, Y) data points

Return type:

Tuple(torch.Tensor, torch.Tensor)

property distance_matrix

Distance matrix

Returns:

distance_matrix – Matrix of distances of shape (N, N) where N is the number of data samples

Return type:

torch.Tensor

reset_datapoints()

Reset datapoints store back to its initial state
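The store interface described above can be illustrated with a small stand-in class. This is a minimal sketch, not the inFairness implementation: `ToyDatapointStore` is a hypothetical name, plain Python lists replace `torch.Tensor`, `math.dist` (Euclidean distance) stands in for `distance_x`, and the sketch assumes that successive `add_datapoints` calls accumulate data.

```python
import math


class ToyDatapointStore:
    """Hypothetical stand-in for the datapoint store; not inFairness code."""

    def __init__(self, distance_fn):
        self.distance_fn = distance_fn  # plays the role of distance_x
        self.reset_datapoints()

    def add_datapoints(self, X, y):
        # Assumption: new points are appended to the ones already stored.
        if len(X) != len(y):
            raise ValueError("X and y must contain the same number of rows")
        self._X.extend(X)
        self._y.extend(y)

    @property
    def data(self):
        # Tuple of (X, Y) data points.
        return self._X, self._y

    @property
    def distance_matrix(self):
        # (N, N) matrix of pairwise distances between stored inputs.
        return [[self.distance_fn(a, b) for b in self._X] for a in self._X]

    def reset_datapoints(self):
        # Back to the initial, empty state.
        self._X, self._y = [], []


store = ToyDatapointStore(math.dist)
store.add_datapoints([[0.0, 0.0], [3.0, 4.0]], [[0.9, 0.1], [0.2, 0.8]])
store.add_datapoints([[0.0, 1.0]], [[0.6, 0.4]])
D = store.distance_matrix  # 3 x 3, symmetric, zero diagonal

store.reset_datapoints()
X_after, y_after = store.data  # both empty again
```

Computing the distance matrix on demand keeps the sketch short; caching it and extending it as new points arrive would be a possible design for larger datasets.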

Graph Laplacian Individual Fairness (GLIF)

class inFairness.postprocessing.GraphLaplacianIF(distance_x, is_output_probas)

Implements the Graph Laplacian Individual Fairness Post-Processing method.

Proposed in the paper "Post-processing for Individual Fairness".

Parameters:
  • distance_x (inFairness.distances.Distance) – Distance metric in the input space

  • is_output_probas (bool) – True if data_Y (the model output) contains probabilities, which implies a classification setting; False if data_Y lies in Euclidean space, which implies a regression setting.

get_objective(y_solution, lambda_param: float, scale: float, threshold: float, normalize: bool = False, W_graph=None, idxs=None, L=None)

Compute the objective values of the individual fairness post-processing problem:

\[\widehat{\mathbf{f}} = \arg \min_{\mathbf{f}} \ \|\mathbf{f} - \hat{\mathbf{y}}\|_2^2 + \lambda \ \mathbf{f}^{\top}\mathbb{L}_n \mathbf{f}\]

Refer to equation 3.1 in the paper.

Parameters:
  • y_solution (torch.Tensor) – Post-processed solution values of shape (N, C)

  • lambda_param (float) – Weight for the Laplacian Regularizer

  • scale (float) – Parameter used to scale the computed distances. Refer to equation 2.2 in the proposing paper.

  • threshold (float) – Parameter used to construct the graph from distances. Pairs whose distance is below the threshold are connected by an edge, while pairs beyond the threshold are disconnected. Refer to equation 2.2 in the proposing paper.

  • normalize (bool) – Whether to normalize the computed Laplacian or not

  • W_graph (torch.Tensor) – Adjacency matrix of shape (N, N)

  • idxs (torch.Tensor) – Indices of data points which are included in the adjacency matrix

  • L (torch.Tensor) – Laplacian of the adjacency matrix

Returns:

objective – Post-processed solution containing two parts:
  1. Post-processed output probabilities of shape (N, C), where N is the number of data samples and C is the number of output classes

  2. Objective values. Refer to equation 3.1 in the paper for an explanation of the various parts

Return type:

PostProcessingObjectiveResponse
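To make the pieces of the objective concrete, the sketch below builds a thresholded graph from a distance matrix, forms its Laplacian, and evaluates the fit term plus the weighted Laplacian regularizer. It is a hedged illustration rather than the library's code: the function names are hypothetical, the edge-weight rule `exp(-scale * d^2)` for `d <= threshold` is an assumed reading of equation 2.2 that should be checked against the paper, the Laplacian is the unnormalized `D - W` (the `normalize=False` case), and each output column is regularized independently.

```python
import math


def build_adjacency(distances, scale, threshold):
    """Assumed weight rule: exp(-scale * d^2) if d <= threshold, else 0."""
    n = len(distances)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = distances[i][j]
            if i != j and d <= threshold:  # no self-loops in this sketch
                W[i][j] = math.exp(-scale * d * d)
    return W


def laplacian(W):
    """Unnormalized graph Laplacian L = D - W, with D the diagonal degree matrix."""
    n = len(W)
    return [
        [(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)]
        for i in range(n)
    ]


def objective(f, y_hat, L, lambda_param):
    """||f - y_hat||_2^2 + lambda * sum over output columns c of f_c^T L f_c."""
    n, n_cols = len(f), len(f[0])
    fit = sum((f[i][c] - y_hat[i][c]) ** 2 for i in range(n) for c in range(n_cols))
    smooth = sum(
        f[i][c] * L[i][j] * f[j][c]
        for c in range(n_cols)
        for i in range(n)
        for j in range(n)
    )
    return fit + lambda_param * smooth


# Three points on a line; only pairs closer than 2.5 are connected.
distances = [[0.0, 1.0, 3.0], [1.0, 0.0, 2.0], [3.0, 2.0, 0.0]]
W = build_adjacency(distances, scale=0.5, threshold=2.5)
L = laplacian(W)

y_hat = [[1.0], [0.0], [0.0]]  # model outputs, shape (N, C) with C = 1
value = objective(y_hat, y_hat, L, lambda_param=2.0)  # fit term is zero here
```

With the unnormalized Laplacian, the regularizer equals the sum over edges of `W[i][j] * (f[i] - f[j]) ** 2`, so it penalizes outputs that differ between points the distance metric considers similar.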

postprocess(method: str, lambda_param: float, scale: float, threshold: float, normalize: bool = False, batchsize: int | None = None, epochs: int | None = None)

Implements the Graph Laplacian Individual Fairness Post-processing algorithm

Parameters:
  • method (str) –

    GLIF method type. Possible values are:

    (a) `coordinate-descent`, which post-processes the data in minibatches and is better suited to large-scale data (see section 3.2.2 of the paper), or

    (b) `exact`, which computes the exact solution but is not appropriate for large-scale data (refer to equation 3.3 in the paper).

  • lambda_param (float) – Weight for the Laplacian Regularizer

  • scale (float) – Parameter used to scale the computed distances. Refer to equation 2.2 in the proposing paper.

  • threshold (float) – Parameter used to construct the graph from distances. Pairs whose distance is below the threshold are connected by an edge, while pairs beyond the threshold are disconnected. Refer to equation 2.2 in the proposing paper.

  • normalize (bool) – Whether to normalize the computed Laplacian or not

  • batchsize (int) – Batch size. Required when method=`coordinate-descent`

  • epochs (int) – Number of coordinate descent epochs. Required when method=`coordinate-descent`

Returns:

solution – Post-processed solution containing two parts:
  1. Post-processed output probabilities of shape (N, C), where N is the number of data samples and C is the number of output classes

  2. Objective values. Refer to equation 3.1 in the paper for an explanation of the various parts

Return type:

PostProcessingObjectiveResponse
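The two method types can be compared on a toy problem. For a symmetric Laplacian L, setting the gradient of ||f - y_hat||^2 + lambda * f^T L f to zero gives the linear system (I + lambda * L) f = y_hat, which the exact approach solves directly. The sketch below is an illustration under stated assumptions, not the library's implementation: the function names are hypothetical, a single output column is used, the form of the paper's equation 3.3 may differ, and the coordinate descent shown updates one coordinate at a time, whereas the library batches coordinates into minibatches of size `batchsize`.

```python
def solve_exact(L, y_hat, lambda_param):
    """Solve (I + lambda * L) f = y_hat by Gaussian elimination with pivoting."""
    n = len(L)
    # Augmented matrix [I + lambda * L | y_hat].
    A = [
        [lambda_param * L[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
        + [y_hat[i]]
        for i in range(n)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution.
    f = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(A[i][j] * f[j] for j in range(i + 1, n))
        f[i] = (A[i][n] - tail) / A[i][i]
    return f


def solve_coordinate_descent(L, y_hat, lambda_param, epochs):
    """Minimize the same objective exactly over one coordinate at a time."""
    n = len(L)
    f = list(y_hat)  # start from the unprocessed model outputs
    for _ in range(epochs):
        for i in range(n):
            off_diag = sum(L[i][j] * f[j] for j in range(n) if j != i)
            f[i] = (y_hat[i] - lambda_param * off_diag) / (1.0 + lambda_param * L[i][i])
    return f


# Laplacian of a 3-node path graph with unit edge weights.
L = [[1.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 1.0]]
y_hat = [1.0, 0.0, 0.0]

f_exact = solve_exact(L, y_hat, lambda_param=1.0)
f_cd = solve_coordinate_descent(L, y_hat, lambda_param=1.0, epochs=50)
```

On this toy graph both routes agree on the smoothed outputs. The direct solve costs cubic time in N, which is why an iterative, batched scheme is the more practical choice for large datasets.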