Bias analysis
The Deeploy Python client provides an interface for subgroup discovery with Fairlearn, based on the FairSD project. This enables easy discovery and analysis of potential (statistical) biases in datasets.
Installation:
Make sure to install deeploy with the fair extra:
pip install "deeploy[fair]>=1.39.2"
Example
from deeploy import deepfair

# Initialize the fair lab with the test features, true labels,
# model predictions, and the sensitive features to search over
fairlab = deepfair.DeeployFairLab(
    X_test, y_test, y_pred, sensitive_features
)
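Here X_test, y_test, y_pred, and sensitive_features are the test features, the true labels, the model predictions, and the features to search for subgroups over. A minimal sketch of how these inputs might be produced, assuming a pandas DataFrame df with a binary label column and an age column used as the sensitive feature (all of these names are illustrative, not part of the Deeploy API):

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X = df.drop(columns=["label"])  # hypothetical feature frame
y = df["label"]                 # hypothetical binary target
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Any fitted classifier works; a random forest is just an example
model = RandomForestClassifier().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Sensitive features to search over, e.g. a column that is also in X_test
sensitive_features = X_test[["age"]]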
To find subgroups, select the metric for subgroup discovery based on what it means for a subgroup to be biased against in the application context of the model: should bias be measured with a performance metric such as 'accuracy', or is a fairness metric like 'equalized odds difference' more appropriate? The choice of metric is crucial for the interpretation of the results (see the Fairlearn documentation for more information).
Below is an example of how to get the subgroups based on the 'accuracy' metric:
subgroups = fairlab.find_sub_groups(
    quality_factor="accuracy",
    depth=1,            # Max number of features to consider in the subgroup
    min_quality=0.0001, # Min difference in the selected metric between the subgroup and the overall population
    min_support=100,    # Min number of samples in the subgroup
)
subgroups.to_dataframe() # See the summary of the discovered subgroups
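If a fairness metric better fits the application context, the same call can be made with a fairness quality factor. A sketch, assuming "equalized_odds_difference" is among the metric names accepted by find_sub_groups (check the Deeploy client documentation for the supported quality factors):

subgroups_eod = fairlab.find_sub_groups(
    quality_factor="equalized_odds_difference",  # assumed metric name
    depth=2,           # Allow combinations of up to two features
    min_quality=0.01,  # Min difference in equalized odds vs. the overall population
    min_support=100,
)
subgroups_eod.to_dataframe()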
Get data for a specific subgroup:
sg_idx = subgroups.sg_feature(
    sg_index=0,  # Index of the subgroup in the summary dataframe
    X=X_test,
)  # Get the boolean mask for the subgroup

X_sg = X_test[sg_idx]       # Subgroup features
y_sg = y_test[sg_idx]       # Subgroup true labels
y_sg_pred = y_pred[sg_idx]  # Subgroup predictions
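As a first check, you can compare the model's performance on the subgroup with its performance on the full test set; a minimal sketch using scikit-learn's accuracy_score:

from sklearn.metrics import accuracy_score

acc_overall = accuracy_score(y_test, y_pred)    # Accuracy on the full test set
acc_subgroup = accuracy_score(y_sg, y_sg_pred)  # Accuracy on the subgroup
print(f"overall: {acc_overall:.3f}, subgroup: {acc_subgroup:.3f}")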
Follow up with data exploration and analysis of the subgroup to identify potential reasons for the bias. More detailed usage examples can be found in the notebooks of the Diabetes Example Repository.