Fairness visualizations allow for a first investigation into possible fairness problems in a dataset. In this vignette, we showcase some of the pre-built fairness visualization functions. All methods shown below can be used with objects of type BenchmarkResult, ResampleResult, and Prediction.
For this example, we use the adult_train dataset. Keep in mind that all datasets shipped with the mlr3fairness package already have the protected attribute set via the col_role "pta"; here, this is the "sex" column.
We choose a random forest as well as a decision tree model in order to showcase differences in performance between learners.
task = tsk("adult_train")$filter(1:5000)
learner = lrn("classif.ranger", predict_type = "prob")
learner$train(task)
predictions = learner$predict(tsk("adult_test")$filter(1:5000))

Note that it is important to evaluate predictions on held-out data in order to obtain unbiased estimates of fairness and performance metrics. By inspecting the confusion matrix, we can get some first insights.
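The confusion matrix mentioned above can be inspected directly via the Prediction object's confusion field:

```r
# Cross-tabulation of true vs. predicted classes on the held-out data
predictions$confusion
```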
We furthermore design a small experiment allowing us to compare a random forest (ranger) and a decision tree (rpart). The result, bmr is a BenchmarkResult that contains the trained models on each cross-validation split.
design = benchmark_grid(
tasks = tsk("adult_train")$filter(1:5000),
learners = lrns(c("classif.ranger", "classif.rpart"),
predict_type = "prob"),
resamplings = rsmps("cv", folds = 3)
)
bmr = benchmark(design)
#> INFO [18:32:30.110] [mlr3] Running benchmark with 6 resampling iterations
#> INFO [18:32:30.115] [mlr3] Applying learner 'classif.ranger' on task 'adult_train' (iter 1/3)
#> INFO [18:32:31.113] [mlr3] Applying learner 'classif.ranger' on task 'adult_train' (iter 2/3)
#> INFO [18:32:32.268] [mlr3] Applying learner 'classif.rpart' on task 'adult_train' (iter 2/3)
#> INFO [18:32:32.356] [mlr3] Applying learner 'classif.ranger' on task 'adult_train' (iter 3/3)
#> INFO [18:32:33.343] [mlr3] Applying learner 'classif.rpart' on task 'adult_train' (iter 3/3)
#> INFO [18:32:33.408] [mlr3] Applying learner 'classif.rpart' on task 'adult_train' (iter 1/3)
#> INFO [18:32:33.548] [mlr3] Finished benchmark

By inspecting the prediction density plot, we can see the predicted probability for a given class split by the protected attribute, in this case "sex". Large differences in densities might hint at strong differences in the target between groups, either directly in the data or as a consequence of the modeling process. Note that plotting densities for a Prediction requires a Task, since information about protected attributes is not contained in the Prediction.
We can either plot the density with a Prediction:
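A minimal sketch using mlr3fairness's fairness_prediction_density() helper; note that the call below assumes the function accepts a task argument supplying the protected attribute, since the Prediction itself does not contain it:

```r
library(mlr3fairness)

# Predicted probability densities, split by the protected attribute "sex".
# The task argument is required because the Prediction does not store "pta".
fairness_prediction_density(predictions, task = tsk("adult_test")$filter(1:5000))
```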
or use it with a BenchmarkResult / ResampleResult:
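With a BenchmarkResult or ResampleResult, no extra Task is needed since the resampled task is stored in the result object; a sketch, assuming the same fairness_prediction_density() helper:

```r
# Densities for all learners in the benchmark, one panel per learner
fairness_prediction_density(bmr)
```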
In practice, we are most often interested in the trade-off between fairness metrics and a measure of utility such as accuracy. We showcase the individual scores obtained in each cross-validation fold as well as the aggregate (mean) in order to additionally provide an indication of the variance of the performance estimates.
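This trade-off can be visualized with mlr3fairness's fairness_accuracy_tradeoff() function; the sketch below assumes the fairness measure is passed as the second argument:

```r
# Accuracy vs. true positive rate difference across groups,
# showing per-fold scores and their aggregate for each learner
fairness_accuracy_tradeoff(bmr, msr("fairness.tpr"))
```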
An additional comparison can be obtained using compare_metrics. It allows comparing Learners with respect to multiple metrics. Again, we can use it with a Prediction:
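A sketch of compare_metrics() applied to a Prediction; as with the density plot, the call assumes a task argument is required to supply the protected attribute, and the chosen measures are illustrative:

```r
# Compare several fairness metrics for a single Prediction
compare_metrics(
  predictions,
  msrs(c("fairness.fpr", "fairness.tpr", "fairness.eod")),
  task = tsk("adult_test")$filter(1:5000)
)
```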
or use it with a BenchmarkResult / ResampleResult:
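Applied to a BenchmarkResult, compare_metrics() places the learners side by side; the measures below are illustrative:

```r
# One panel per metric, one bar per learner
compare_metrics(bmr, msrs(c("fairness.fpr", "fairness.tpr", "classif.acc")))
```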
The required metrics to create custom visualizations can also be easily computed using the $score() method.
bmr$score(msr("fairness.tpr"))
#> uhash nr task task_id
#> 1: 7c062021-a64d-4c42-b67a-3bfc22867f0a 1 <TaskClassif[50]> adult_train
#> 2: 7c062021-a64d-4c42-b67a-3bfc22867f0a 1 <TaskClassif[50]> adult_train
#> 3: 7c062021-a64d-4c42-b67a-3bfc22867f0a 1 <TaskClassif[50]> adult_train
#> 4: 401bd12f-8ce9-484c-b38e-11a142416bdb 2 <TaskClassif[50]> adult_train
#> 5: 401bd12f-8ce9-484c-b38e-11a142416bdb 2 <TaskClassif[50]> adult_train
#> 6: 401bd12f-8ce9-484c-b38e-11a142416bdb 2 <TaskClassif[50]> adult_train
#> learner learner_id resampling resampling_id
#> 1: <LearnerClassifRanger[38]> classif.ranger <ResamplingCV[20]> cv
#> 2: <LearnerClassifRanger[38]> classif.ranger <ResamplingCV[20]> cv
#> 3: <LearnerClassifRanger[38]> classif.ranger <ResamplingCV[20]> cv
#> 4: <LearnerClassifRpart[38]> classif.rpart <ResamplingCV[20]> cv
#> 5: <LearnerClassifRpart[38]> classif.rpart <ResamplingCV[20]> cv
#> 6: <LearnerClassifRpart[38]> classif.rpart <ResamplingCV[20]> cv
#> iteration prediction fairness.tpr
#> 1: 1 <PredictionClassif[20]> 0.04645741
#> 2: 2 <PredictionClassif[20]> 0.09633812
#> 3: 3 <PredictionClassif[20]> 0.07391266
#> 4: 1 <PredictionClassif[20]> 0.07199505
#> 5: 2 <PredictionClassif[20]> 0.08552081
#> 6: 3 <PredictionClassif[20]> 0.06530131