inFairness.distances.logistic_sensitive_subspace module

class inFairness.distances.logistic_sensitive_subspace.LogisticRegSensitiveSubspace

Bases: SensitiveSubspaceDistance

Implements the softmax-regression-based fair metric defined in Appendix B.1 of the paper “Training individually fair ML models with sensitive subspace robustness”.

This metric assumes that the sensitive attributes are discrete and observed for a small subset of the training data. Given data of the form \((X_i, K_i, Y_i)\), where \(K_i\) is the sensitive attribute of the i-th subject, the metric fits a softmax regression model that predicts \(K_i\) from \(X_i\):

\[\mathbb{P}(K_i = l\mid X_i) = \frac{\exp(a_l^TX_i+b_l)}{\sum_{j=1}^k \exp(a_j^TX_i+b_j)},\ l=1,\ldots,k\]
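The softmax probabilities above can be evaluated directly. The following is a minimal, dependency-free sketch; the function name `sensitive_attr_probs` and the toy weights are assumptions made for illustration and are not part of the inFairness API.

```python
import math

def sensitive_attr_probs(x, A, b):
    """Return [P(K = l | x) for l = 1..k] under a softmax regression model.

    A is a list of k weight vectors a_l; b is a list of k intercepts b_l.
    (Hypothetical helper, for illustration only.)
    """
    # Logit for each class l: a_l^T x + b_l
    logits = [sum(a_j * x_j for a_j, x_j in zip(a, x)) + b_l
              for a, b_l in zip(A, b)]
    # Subtracting the maximum logit leaves the ratios unchanged
    # and avoids overflow in exp().
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy example with k = 2 classes and 2 features.
probs = sensitive_attr_probs([1.0, 0.0], A=[[1.0, 0.0], [0.0, 1.0]], b=[0.0, 0.0])
```

With these toy weights the logits are 1 and 0, so the first probability equals \(e/(e+1)\) and the two probabilities sum to one.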

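With \(A=[a_1, \cdots, a_k]\) collecting the fitted weight vectors and \(P_{\text{ran}(A)}\) denoting the orthogonal projection onto the span of its columns, the fair metric is defined as: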

\[d_x(x_1,x_2)^2 = (x_1 - x_2)^T(I - P_{\text{ran}(A)})(x_1 - x_2)\]
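To make the formula concrete, here is a small pure-Python sketch that removes from \(x_1 - x_2\) its component in the span of the sensitive directions and returns the squared length of what remains. The helper names `orthonormal_basis` and `fair_distance_sq` are hypothetical, and the library itself operates on `torch.Tensor` objects rather than Python lists.

```python
def orthonormal_basis(vectors, tol=1e-12):
    """Gram-Schmidt: orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            coeff = sum(wi * qi for wi, qi in zip(w, q))
            w = [wi - coeff * qi for wi, qi in zip(w, q)]
        norm = sum(wi * wi for wi in w) ** 0.5
        if norm > tol:  # skip vectors that add nothing to the span
            basis.append([wi / norm for wi in w])
    return basis

def fair_distance_sq(x1, x2, A):
    """d_x(x1, x2)^2 = (x1 - x2)^T (I - P_ran(A)) (x1 - x2).

    A is given as a list of sensitive direction vectors a_1, ..., a_k.
    """
    diff = [u - v for u, v in zip(x1, x2)]
    # Subtracting the component along each orthonormal direction
    # applies (I - P_ran(A)) to the difference vector.
    for q in orthonormal_basis(A):
        coeff = sum(d * qi for d, qi in zip(diff, q))
        diff = [d - coeff * qi for d, qi in zip(diff, q)]
    return sum(d * d for d in diff)

# One sensitive direction along the first coordinate (toy example).
A = [[1.0, 0.0, 0.0]]
d_sensitive_only = fair_distance_sq([0.0, 0.0, 0.0], [5.0, 0.0, 0.0], A)
d_mixed = fair_distance_sq([0.0, 0.0, 0.0], [5.0, 3.0, 4.0], A)
```

Two points that differ only along a sensitive direction are at distance zero, while differences in the remaining coordinates are measured as usual.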

References

Yurochkin, Mikhail, Amanda Bower, and Yuekai Sun. “Training individually fair ML models with sensitive subspace robustness.” arXiv preprint arXiv:1907.00020 (2019).

compute_basis_vectors_data(X_train, y_train)

compute_basis_vectors_protected_idxs(data, protected_idxs, keep_protected_idxs=True)

fit(data_X: Tensor, data_SensitiveAttrs: Tensor | None = None, protected_idxs: Iterable[int] | None = None, keep_protected_idxs: bool = True, autoinfer_device: bool = True)

Fit the Logistic Regression Sensitive Subspace distance metric.

Parameters:
  • data_X (torch.Tensor) – Input data corresponding to either \(X_i\) or \((X_i, K_i)\) in the equation above. If it corresponds to \(X_i\), the data_SensitiveAttrs parameter should be specified. If it corresponds to \((X_i, K_i)\), the protected_idxs parameter should be specified to indicate which columns hold the sensitive attributes.

  • data_SensitiveAttrs (torch.Tensor) – The sensitive attributes ( \(K_i\) ). Used when the data_X parameter represents \(X_i\) from the equation above. Note: This parameter is mutually exclusive with the protected_idxs parameter. Specifying both data_SensitiveAttrs and protected_idxs will raise an error.

  • protected_idxs (Iterable[int]) – If the data_X parameter above represents \((X_i, K_i)\), this parameter provides the indices of the sensitive attributes in data_X. Note: This parameter is mutually exclusive with the data_SensitiveAttrs parameter. Specifying both data_SensitiveAttrs and protected_idxs will raise an error.

  • keep_protected_idxs (bool) – Set to True if the protected attributes will be part of the training data when training the model. Set to False if the protected attributes will be excluded when training the model. Default = True.

  • autoinfer_device (bool) – Whether the distance metric should be moved automatically to an appropriate device (CPU / GPU). If set to True, the metric is moved to the device that data_X is on. If set to False, the metric stays on the CPU.
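The two calling modes described by data_X, data_SensitiveAttrs, and protected_idxs can be pictured with the sketch below. It is not the library's implementation: `split_inputs` is a hypothetical helper that works on plain lists of rows, and both the use of `ValueError` and the rejection of calls that pass neither argument are assumptions, since the documentation only states that passing both raises an error.

```python
def split_inputs(data_X, data_SensitiveAttrs=None, protected_idxs=None):
    """Hypothetical helper: return (features, sensitive_attrs) as lists of rows."""
    if data_SensitiveAttrs is not None and protected_idxs is not None:
        # The two arguments are documented as mutually exclusive.
        raise ValueError("Pass data_SensitiveAttrs or protected_idxs, not both.")
    if data_SensitiveAttrs is not None:
        # Mode 1: data_X holds X_i only; K_i is supplied separately.
        return ([list(row) for row in data_X],
                [list(row) for row in data_SensitiveAttrs])
    if protected_idxs is None:
        # Assumption: one of the two arguments is required.
        raise ValueError("Pass one of data_SensitiveAttrs or protected_idxs.")
    # Mode 2: data_X holds (X_i, K_i); protected_idxs marks the K_i columns.
    protected = sorted(set(protected_idxs))
    free = [j for j in range(len(data_X[0])) if j not in protected]
    features = [[row[j] for j in free] for row in data_X]
    sensitive = [[row[j] for j in protected] for row in data_X]
    return features, sensitive

# Column 1 holds a binary sensitive attribute in this toy data.
rows = [[0.1, 1, 0.7], [0.4, 0, 0.2]]
features, sensitive = split_inputs(rows, protected_idxs=[1])
```

In the first mode the caller has already separated \(X_i\) from \(K_i\); in the second, the column indices do the separating.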

property logistic_regression_models

Logistic regression models trained by the metric to predict each sensitive attribute from the inputs. The property is a list of logistic regression models, each corresponding to \(\mathbb{P}(K_i = l\mid X_i)\), and can be used to measure how well those models perform.

training: bool