Distances#

Mahalanobis distance#

class inFairness.distances.MahalanobisDistances[source]#

Base class implementing generalized Mahalanobis distances

The Mahalanobis distance between two points X1 and X2 is computed as:

\[\text{dist}(X_1, X_2) = (X_1 - X_2) \Sigma (X_1 - X_2)^{T}\]
fit(sigma)[source]#

Fit Mahalanobis Distance metric

Parameters:

sigma (torch.Tensor) – Covariance matrix

forward(X1, X2, itemwise_dist=True)[source]#

Computes the distance between data samples X1 and X2

Parameters:
  • X1 (torch.Tensor) – Data samples from batch 1 of shape (n_samples_1, n_features)

  • X2 (torch.Tensor) – Data samples from batch 2 of shape (n_samples_2, n_features)

  • itemwise_dist (bool, default: True) –

Compute the distance in an itemwise or pairwise manner.

In the itemwise fashion (itemwise_dist=True), the distance is computed between the i-th data sample in X1 and the i-th data sample in X2. The two batches X1 and X2 must therefore have the same shape.

In the pairwise fashion (itemwise_dist=False), the distance is computed between every sample in X1 and every sample in X2. In this case, X1 and X2 may contain different numbers of samples.

Returns:

dist – Distance between samples of batch 1 and batch 2.

If itemwise_dist=True, the item-wise distance of shape (n_samples, 1) is returned

If itemwise_dist=False, the pair-wise distance of shape (n_samples_1, n_samples_2) is returned

Return type:

torch.Tensor
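
The following is a minimal usage sketch. It assumes the metric behaves as a standard torch.nn.Module, so calling the metric object dispatches to forward; the covariance matrix here is a placeholder identity:

    import torch
    from inFairness.distances import MahalanobisDistances

    metric = MahalanobisDistances()
    metric.fit(sigma=torch.eye(3))  # placeholder covariance: identity

    X1 = torch.randn(5, 3)
    X2 = torch.randn(5, 3)
    X3 = torch.randn(8, 3)

    d_item = metric(X1, X2)                       # itemwise: shape (5, 1)
    d_pair = metric(X1, X3, itemwise_dist=False)  # pairwise: shape (5, 8)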

to(device)[source]#

Moves distance metric to a particular device

Parameters:

device (torch.device) – Device to which the distance metric should be moved

Sensitive Subspace distance#

class inFairness.distances.SensitiveSubspaceDistance[source]#

Implements the Sensitive Subspace metric base class. It accepts the basis vectors of a sensitive subspace and computes a projection that ignores the sensitive subspace.

The projection complement computed from the sensitive subspace basis vectors (A) is:

\[P^{'} = I - (A (A^{T} A)^{-1} A^{T})\]
compute_projection_complement(basis_vectors)[source]#

Compute the projection complement of the subspace spanned by the basis_vectors.

The projection complement given basis vectors (A) of shape (d, k) is computed as:

\[P^{'} = I - (A (A^{T} A)^{-1} A^{T})\]
Parameters:

basis_vectors (torch.Tensor) – Basis vectors of the sensitive subspace, of shape (d, k), where d is the data feature dimension and k is the number of protected dimensions

Returns:

projection complement – Projection complement computed as described above. Shape (d, d) where d is the data feature dimension

Return type:

torch.Tensor
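
As an illustration, the projection complement above can be reproduced in a few lines of PyTorch. This is a standalone sketch of the math, not the library's internal implementation:

    import torch

    def projection_complement(A: torch.Tensor) -> torch.Tensor:
        # A holds basis vectors of the sensitive subspace, shape (d, k)
        proj = A @ torch.linalg.inv(A.T @ A) @ A.T  # projector onto span(A), shape (d, d)
        return torch.eye(A.shape[0]) - proj         # projector onto the orthogonal complement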

fit(basis_vectors)[source]#

Fit Sensitive Subspace Distance metric

Parameters:

basis_vectors (torch.Tensor) – Basis vectors of the sensitive subspace
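
A brief usage sketch follows. The basis vectors here are random placeholders; in practice they encode sensitive directions, either hand-crafted or learned by one of the subclasses below:

    import torch
    from inFairness.distances import SensitiveSubspaceDistance

    basis_vectors = torch.randn(10, 2)  # (d, k): 2 sensitive directions in a 10-d feature space
    metric = SensitiveSubspaceDistance()
    metric.fit(basis_vectors)

    x1, x2 = torch.randn(4, 10), torch.randn(4, 10)
    d = metric(x1, x2)  # distances that ignore variation along the sensitive subspace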

SVD Sensitive Subspace#

class inFairness.distances.SVDSensitiveSubspaceDistance[source]#

Sensitive Subspace metric that uses SVD to find the basis vectors of the sensitive subspace. The metric learns a subspace from a set of user-curated comparable data samples.

Proposed in Section B.2 of “Training individually fair ML models with sensitive subspace robustness”.

References

Yurochkin, Mikhail, Amanda Bower, and Yuekai Sun. “Training individually fair ML models with sensitive subspace robustness.” arXiv preprint arXiv:1907.00020 (2019).

compute_basis_vectors(X_train, n_components)[source]#

Compute basis vectors using SVD

fit(X_train, n_components, autoinfer_device=True)[source]#

Fit SVD Sensitive Subspace distance metric parameters

Parameters:
  • X_train (torch.Tensor | List[torch.Tensor]) – Training data containing comparable data samples. If only one set of comparable data samples is provided, the input should be a torch.Tensor of shape \((N, D)\). For multiple sets of comparable data samples, a list of tensors of shapes \([(N_1, D), \cdots, (N_x, D)]\) can be provided.

  • n_components (int) – Desired number of latent variable dimensions

  • autoinfer_device (bool) – Whether the distance metric should automatically be moved to an appropriate device (CPU / GPU). If set to True, the metric is moved to the same device that X_train is on. If set to False, the metric is kept on the CPU.
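
A usage sketch with synthetic data (real use would pass user-curated comparable samples; calling the fitted metric object is assumed to dispatch to the inherited forward):

    import torch
    from inFairness.distances import SVDSensitiveSubspaceDistance

    X_train = torch.randn(100, 10)  # synthetic stand-in for comparable data samples
    metric = SVDSensitiveSubspaceDistance()
    metric.fit(X_train, n_components=3)

    d = metric(torch.randn(4, 10), torch.randn(4, 10))  # itemwise distances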

EXPLORE: Embedded Xenial Pairs Logistic Regression#

class inFairness.distances.EXPLOREDistance[source]#

Implements the Embedded Xenial Pairs Logistic Regression metric (EXPLORE) defined in Section 2.2 of “Two Simple Ways to Learn Individual Fairness Metrics from Data”.

EXPLORE defines the distance in the input space to be of the form:

\[d_x(x_1, x_2) := \langle \phi(x_1) - \phi(x_2), \Sigma (\phi(x_1) - \phi(x_2)) \rangle\]

where \(\phi(x)\) is an embedding map and \(\Sigma\) is a positive semi-definite matrix.

The metric expects the data to be in the form of triplets \(\{(x_{i_1}, x_{i_2}, y_i)\}_{i=1}^{n}\) where \(y_i \in \{0, 1\}\) indicates whether the human considers \(x_{i_1}\) and \(x_{i_2}\) comparable (\(y_i = 1\) indicates comparable) or not.

References

Mukherjee, Debarghya, Mikhail Yurochkin, Moulinath Banerjee, and Yuekai Sun. “Two simple ways to learn individual fairness metrics from data.” In International Conference on Machine Learning, pp. 7097-7107. PMLR, 2020.

fit(X1, X2, Y, iters, batchsize, autoinfer_device=True)[source]#

Fit EXPLORE distance metric

Parameters:
  • X1 (torch.Tensor) – first set of input samples

  • X2 (torch.Tensor) – second set of input samples

  • Y (torch.Tensor) – \(y_i\) vector containing 1 if corresponding elements from X1 and X2 are comparable, and 0 if not

  • iters (int) – number of iterations of SGD to compute the \(\Sigma\) matrix

  • batchsize (int) – batch size of each iteration

  • autoinfer_device (bool) – Whether the distance metric should automatically be moved to an appropriate device (CPU / GPU). If set to True, the metric is moved to the same device that X1 is on. If set to False, the metric is kept on the CPU.
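
A usage sketch with synthetic triplets. Real use requires human comparability judgments, the iteration and batch-size values below are arbitrary placeholders, and tensor dtypes may need adjusting to the package's expectations:

    import torch
    from inFairness.distances import EXPLOREDistance

    X1, X2 = torch.randn(500, 10), torch.randn(500, 10)
    Y = torch.randint(0, 2, (500,))  # 1 = comparable, 0 = not comparable

    metric = EXPLOREDistance()
    metric.fit(X1, X2, Y, iters=1000, batchsize=64)  # placeholder hyperparameters
    d = metric(X1[:5], X2[:5])  # itemwise distances under the learned Sigma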

Logistic Regression Sensitive Subspace distance metric#

class inFairness.distances.LogisticRegSensitiveSubspace[source]#

Implements the softmax-regression-based fair metric defined in Appendix B.1 of the “Training individually fair ML models with sensitive subspace robustness” paper.

This metric assumes that the sensitive attributes are discrete and observed for a small subset of training data. Assuming data of the form \((X_i, K_i, Y_i)\) where \(K_i\) is the sensitive attribute of the i-th subject, the model fits a softmax regression model to the data as:

\[\mathbb{P}(K_i = l \mid X_i) = \frac{\exp(a_l^T X_i + b_l)}{\sum_{j=1}^{k} \exp(a_j^T X_i + b_j)},\ l = 1, \ldots, k\]

Using the span of the matrix \(A=[a_1, \cdots, a_k]\), the fair metric is trained as:

\[d_x(x_1,x_2)^2 = (x_1 - x_2)^T(I - P_{\text{ran}(A)})(x_1 - x_2)\]

References

Yurochkin, Mikhail, Amanda Bower, and Yuekai Sun. “Training individually fair ML models with sensitive subspace robustness.” arXiv preprint arXiv:1907.00020 (2019).

fit(data_X: Tensor, data_SensitiveAttrs: Tensor | None = None, protected_idxs: Iterable[int] | None = None, keep_protected_idxs: bool = True, autoinfer_device: bool = True)[source]#

Fit Logistic Regression Sensitive Subspace distance metric

Parameters:
  • data_X (torch.Tensor) – Input data corresponding to either \(X_i\) or \((X_i, K_i)\) in the equation above. If the variable corresponds to \(X_i\), then the data_SensitiveAttrs parameter should be specified. If the variable corresponds to \((X_i, K_i)\), then the protected_idxs parameter should be specified to indicate the sensitive attributes.

  • data_SensitiveAttrs (torch.Tensor) – Represents the sensitive attributes ( \(K_i\) ) and is used when the data_X parameter represents \(X_i\) from the equation above. Note: This parameter is mutually exclusive with the protected_idxs parameter. Specifying both the data_SensitiveAttrs and protected_idxs parameters will raise an error

  • protected_idxs (Iterable[int]) – If the data_X parameter above represents \((X_i, K_i)\), then this parameter is used to provide the indices of the sensitive attributes in data_X. Note: This parameter is mutually exclusive with the data_SensitiveAttrs parameter. Specifying both the data_SensitiveAttrs and protected_idxs parameters will raise an error

  • keep_protected_idxs (bool) – Set to True if the protected attributes should remain part of the training data while fitting the model, or to False if they should be excluded. Default = True

  • autoinfer_device (bool) – Whether the distance metric should automatically be moved to an appropriate device (CPU / GPU). If set to True, the metric is moved to the same device that data_X is on. If set to False, the metric is kept on the CPU.
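
A sketch of both calling conventions with synthetic data. The shapes and dtypes of the sensitive-attribute tensors are assumptions; consult the package for the exact expectations:

    import torch
    from inFairness.distances import LogisticRegSensitiveSubspace

    X = torch.randn(200, 10)           # features X_i
    K = torch.randint(0, 2, (200, 1))  # one discrete sensitive attribute K_i (shape assumed)

    # Convention 1: features and sensitive attributes passed separately
    metric = LogisticRegSensitiveSubspace()
    metric.fit(X, data_SensitiveAttrs=K)

    # Convention 2: sensitive attributes are columns of the input itself
    XK = torch.cat([X, K.float()], dim=1)
    metric2 = LogisticRegSensitiveSubspace()
    metric2.fit(XK, protected_idxs=[10])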

property logistic_regression_models#

Logistic Regression models trained by the metric to predict each sensitive attribute given inputs. The property is a list of logistic regression models each corresponding to \(\mathbb{P}(K_i = l\mid X_i)\). This property can be used to measure the performance of the logistic regression models.

Protected Euclidean Distance#

class inFairness.distances.ProtectedEuclideanDistance[source]#

Implements a Euclidean distance metric that ignores a specified set of protected attributes while computing the distance between data points.

fit(protected_attributes, num_attributes)[source]#

Fit Protected Euclidean Distance metric

Parameters:
  • protected_attributes (Iterable[int]) – List of attribute indices considered to be protected. The metric would ignore these protected attributes while computing distance between data points.

  • num_attributes (int) – Total number of attributes in the data points.

forward(x, y, itemwise_dist=True)[source]#

Computes the distance between data samples x and y while ignoring the protected attributes

Parameters:
  • x (torch.Tensor) – Data samples of shape (B, D)

  • y (torch.Tensor) – Data samples of shape (B, D)

Returns:

B x 1 matrix with the protected distance computed between x and y
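
A brief usage sketch (the attribute indices are arbitrary placeholders):

    import torch
    from inFairness.distances import ProtectedEuclideanDistance

    metric = ProtectedEuclideanDistance()
    metric.fit(protected_attributes=[0, 3], num_attributes=5)

    x, y = torch.randn(8, 5), torch.randn(8, 5)
    d = metric(x, y)  # shape (8, 1); coordinates 0 and 3 do not contribute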

to(device)[source]#

Moves distance metric to a particular device

Parameters:

device (torch.device) – Device to which the distance metric should be moved

Euclidean Distance#

class inFairness.distances.EuclideanDistance[source]#
forward(x, y, itemwise_dist=True)[source]#

Computes the Euclidean distance between data samples x and y, either itemwise or pairwise (see MahalanobisDistances.forward above)

Returns:

distance between two inputs

Return type:

Tensor
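
A minimal sketch. It assumes the metric is usable without an explicit fit step; consult the package source if fitting is required:

    import torch
    from inFairness.distances import EuclideanDistance

    metric = EuclideanDistance()  # assumed to require no fitting
    x = torch.tensor([[0.0, 0.0]])
    y = torch.tensor([[3.0, 4.0]])
    d = metric(x, y)  # itemwise distance for the single pair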

Wasserstein Distance#

class inFairness.distances.WassersteinDistance[source]#

Computes a batched Wasserstein distance for pairs of sets of items. The inputs are tensors of dimensions (B, N, D) and (B, M, D), where B and D are the batch and feature sizes, and N and M are the numbers of items in each batch element.

Currently, only distances inheriting from MahalanobisDistances are supported.

Transforms a MahalanobisDistances object so that its forward method becomes a differentiable, batched Wasserstein distance between sets of items. This Wasserstein distance uses the underlying Mahalanobis distance as the pairwise cost function when solving the optimal transport problem.

For more information, see Equation 2.5 of the reference below.

References

Bower, Amanda, Hamid Eftekhari, Mikhail Yurochkin, and Yuekai Sun. “Individually Fair Rankings.” ICLR 2021.

forward(X1: Tensor, X2: Tensor)[source]#

Computes the batched Wasserstein distance implied by the cost function represented by the underlying Mahalanobis distance.

Parameters:
  • X1 (torch.Tensor) – Data samples from batch 1 of shape (B, N, D)

  • X2 (torch.Tensor) – Data samples from batch 2 of shape (B, M, D)

Returns:

dist – Wasserstein distance of shape (B) between batch samples in X1 and X2

Return type:

torch.Tensor
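
A hedged usage sketch. The way the underlying Mahalanobis metric is attached to the wrapper (here via the constructor) is an assumption; consult the package for the exact API:

    import torch
    from inFairness.distances import MahalanobisDistances, WassersteinDistance

    mahalanobis = MahalanobisDistances()
    mahalanobis.fit(sigma=torch.eye(4))

    wdist = WassersteinDistance(mahalanobis)  # assumed wrapping API
    X1 = torch.randn(2, 5, 4)  # (B, N, D)
    X2 = torch.randn(2, 7, 4)  # (B, M, D)
    d = wdist(X1, X2)          # one Wasserstein distance per batch element: shape (B,)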