
Bias analysis

The Deeploy Python client provides an interface for subgroup discovery with Fairlearn, based on the FairSD project. This enables easy discovery and analysis of potential (statistical) biases in datasets.

Installation:

Make sure to install deeploy with the fair extra:

pip install "deeploy[fair]>=1.39.2"

Example

from deeploy import deepfair

# Initialize the lab with test features, ground-truth labels,
# model predictions, and the sensitive feature(s) to analyze
fairlab = deepfair.DeeployFairLab(
    X_test, y_test, y_pred, sensitive_features
)
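
The inputs can come from any model. Below is a minimal sketch of how they could be produced, assuming a scikit-learn classifier and a pandas DataFrame df with a binary target column "target" and a hypothetical sensitive feature column "sex":

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Split features and labels; "target" and "sex" are hypothetical column names
X = df.drop(columns=["target"])
y = df["target"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Fit any classifier and obtain predictions on the test set
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_test)

# The sensitive feature(s) to slice the analysis on
sensitive_features = X_test["sex"]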

To find subgroups, first select the metric to use for subgroup discovery. The right choice depends on what it means, in the application context of the model, for the model to be biased against a subgroup: should bias be measured with a performance metric such as 'accuracy', or is a fairness metric like 'equalized odds difference' more appropriate? The choice of metric is crucial for interpreting the results (see the Fairlearn documentation for more information).
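
For orientation, a fairness metric can also be computed directly with Fairlearn before running subgroup discovery. A minimal sketch, assuming the variables from the example above:

from fairlearn.metrics import equalized_odds_difference

# Overall disparity between groups defined by the sensitive feature;
# a value close to 0 indicates similar error rates across groups
eod = equalized_odds_difference(
    y_test, y_pred, sensitive_features=sensitive_features
)
print(f"Equalized odds difference: {eod:.3f}")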

Below is an example of how to get the subgroups based on the 'accuracy' metric:

subgroups = fairlab.find_sub_groups(
    quality_factor="accuracy",
    depth=1,  # Max number of features to consider in the subgroup
    min_quality=0.0001,  # Min difference in the selected metric between the subgroup and the overall population
    min_support=100,  # Min number of samples in the subgroup
)

subgroups.to_dataframe() # See the summary of the discovered subgroups
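
The summary can be inspected like any pandas DataFrame; for example:

df_sg = subgroups.to_dataframe()
print(df_sg.head())  # Inspect the top discovered subgroups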

Get data for a specific subgroup:

sg_idx = subgroups.sg_feature(
    sg_index=0,  # Index of the subgroup in the summary above
    X=X_test,
)  # Boolean mask selecting the rows that belong to the subgroup

X_sg = X_test[sg_idx]
y_sg = y_test[sg_idx]
y_sg_pred = y_pred[sg_idx]
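
With the subgroup slice in hand, the disparity can be quantified directly. A minimal sketch comparing subgroup accuracy against the overall test set, using scikit-learn:

from sklearn.metrics import accuracy_score

# Compare the selected metric on the subgroup vs. the whole test set
overall_acc = accuracy_score(y_test, y_pred)
subgroup_acc = accuracy_score(y_sg, y_sg_pred)
print(f"Overall accuracy:  {overall_acc:.3f}")
print(f"Subgroup accuracy: {subgroup_acc:.3f}")
print(f"Difference:        {overall_acc - subgroup_acc:.3f}")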

Continue with data exploration and analysis of the subgroup to identify potential causes of the bias. More detailed usage examples can be found in the notebooks of the Diabetes Example Repository.