In this section we introduce the privacy, discrimination, and explainability aspects of modelling.

Privacy-Preserving Data Mining

We will start with the different ways in which we can reconcile privacy with data science modelling.

Differential Privacy

Zero-knowledge Proof

Homomorphic Encryption

Secure Multi-Party Computation (SMPC)

Federated Learning

Discrimination-Aware Modelling

Measuring Fairness of a Predictive Model

Removing Bias

Comprehensible Models and Explainable AI

Understanding versus Explaining

Quantifying Comprehensibility

Why Do We Need to Understand and Explain Predictions?

Explaining Prediction Models and Predictions

Global Explanations

Sensitivity Analysis

Plot-based

Rule Extraction

Instance Explanations

Feature Value Plots

Feature Importance Ranking

The Evidence Counterfactual

Definition of Trust

  • Trusting the Predictions: Whether a user trusts an individual prediction sufficiently to take some action based on it.

  • Trusting the Model: Whether the user trusts a model to behave in reasonable ways if deployed.

How Do We Measure Trust?

    1. Interpretable Models
    2. Accuracy

These are two of the main ways in which we come to trust a model and its explanations. But is this a good idea?

Challenges

  • Interpretable models are not always possible in complex situations where relationships between features and outcomes are non-linear.

  • Even highly accurate models can be biased or make unfair decisions, especially when trained on historical data that includes biases.

Three Must-Haves for a Good Explanation

  • 1. Interpretability: Humans can easily interpret the reasoning

  • 2. Faithful: Describes how the model actually works

  • 3. Model Agnostic: Can explain any classifier

Definition (LIME): An algorithm that can explain the predictions of any classifier or regressor in a faithful way, by approximating it locally with an interpretable model.

GOAL: To identify an interpretable model over the interpretable representation that is locally faithful to the classifier.
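
In the original LIME formulation this goal is written as an optimisation problem:

    explanation(x) = argmin_{g in G} L(f, g, pi_x) + Omega(g)

where f is the black-box model, g is a candidate simple model drawn from the interpretable class G, pi_x is a proximity kernel defining the neighbourhood around the instance x, L measures how poorly g approximates f within that neighbourhood, and Omega(g) penalises the complexity of g (for example, the number of non-zero coefficients of a linear model).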

Key Idea:

  • Pick a model class that is interpretable by humans. Not globally faithful :(

  • Locally approximate the global (black-box) model. The simple model is globally bad, but locally good :)

Sparse Linear Explanations

  • Step 1: Sample points around xi

  • Step 2: Use complex model to predict labels for each sample

  • Step 3: Weigh samples according to distance to xi

  • Step 4: Learn new simple model on weighted samples

  • Step 5: Use simple model to explain

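To make these steps concrete before handing over to the lime package below, here is a minimal hand-rolled sketch. It is not the exact procedure lime implements: it assumes a randomForest black box (the same engine caret's 'rf' method uses), perturbs each feature with Gaussian noise, uses an arbitrary kernel width of 0.75, and fits a plain weighted linear model rather than a sparse one.

library(randomForest)

set.seed(1)
x_i       <- iris[1, 1:4]                           # instance to explain
black_box <- randomForest(Species ~ ., data = iris) # the complex model

# Step 1: sample points around x_i by adding Gaussian noise to each feature
n <- 500
perturb <- function(feature, value) rnorm(n, mean = value, sd = sd(feature))
samples <- as.data.frame(Map(perturb, iris[1:4], unlist(x_i)))

# Step 2: use the complex model to predict each sample
#         (here: the predicted probability of x_i's own class, "setosa")
prob <- predict(black_box, samples, type = "prob")[, "setosa"]

# Step 3: weigh samples by their distance to x_i (Gaussian kernel)
d <- sqrt(rowSums(sweep(as.matrix(samples), 2, unlist(x_i))^2))
w <- exp(-d^2 / (2 * 0.75^2))

# Step 4: learn a new simple (weighted linear) model on the labelled samples
simple <- lm(prob ~ ., data = cbind(samples, prob = prob), weights = w)

# Step 5: the coefficients of the simple model are the local explanation
coef(simple)
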
library(caret)
## Loading required package: ggplot2
## Loading required package: lattice
library(lime)
# Split up the data set
iris_test <- iris[1:5, 1:4]
iris_train <- iris[-(1:5), 1:4]
iris_lab <- iris[[5]][-(1:5)]

# Create Random Forest model on iris data
model <- train(iris_train, iris_lab, method = 'rf')

# Create an explainer object
explainer <- lime(iris_train, model)

# Explain new observation
explanation <- explain(iris_test, explainer, n_labels = 1, n_features = 2)

# The output is provided in a consistent tabular format and includes the
# output from the model.
explanation
## # A tibble: 10 × 13
##    model_type   case  label label_prob model_r2 model_intercept model_prediction
##    <chr>        <chr> <chr>      <dbl>    <dbl>           <dbl>            <dbl>
##  1 classificat… 1     seto…          1    0.702           0.120            0.950
##  2 classificat… 1     seto…          1    0.702           0.120            0.950
##  3 classificat… 2     seto…          1    0.689           0.122            0.941
##  4 classificat… 2     seto…          1    0.689           0.122            0.941
##  5 classificat… 3     seto…          1    0.678           0.128            0.945
##  6 classificat… 3     seto…          1    0.678           0.128            0.945
##  7 classificat… 4     seto…          1    0.687           0.122            0.950
##  8 classificat… 4     seto…          1    0.687           0.122            0.950
##  9 classificat… 5     seto…          1    0.686           0.125            0.956
## 10 classificat… 5     seto…          1    0.686           0.125            0.956
## # ℹ 6 more variables: feature <chr>, feature_value <dbl>, feature_weight <dbl>,
## #   feature_desc <chr>, data <list>, prediction <list>
# And can be visualised directly
plot_features(explanation)

Summary