Chapter 5: Model-Agnostic Methods

Explanatory Model Analysis

https://christophm.github.io/interpretable-ml-book

Create the iml predictor, which wraps the fitted model and the data.

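library(iml)

# Wrap the fitted model and the test data;
# class = "1" restricts the output to the predicted probability of class "1"
predictor <- Predictor$new(
  model = rf,
  data = test,
  class = "1"
)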

Partial Dependence Plot

The general idea underlying Partial Dependence Plots (PDPs) is to show how the expected value of the model's prediction behaves as a function of a selected explanatory variable.
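The averaging behind a PDP can be sketched in a few lines of base R. This is only an illustration: the simulated data, the logistic regression, and the helper `partial_dependence()` are assumptions introduced here and are not part of the stroke analysis or of any package.

```r
# Minimal base-R sketch of a partial dependence computation.
# Data, model, and helper name are hypothetical, for illustration only.
set.seed(1)
n <- 200
toy <- data.frame(
  age     = runif(n, 20, 80),
  glucose = runif(n, 60, 250)
)
# Simulated binary outcome whose log-odds rise with age and glucose
toy$stroke <- rbinom(n, 1, plogis(-6 + 0.08 * toy$age + 0.005 * toy$glucose))

fit <- glm(stroke ~ age + glucose, data = toy, family = binomial)

# For each grid value: overwrite the feature in every row,
# predict, and average the predictions over all rows.
partial_dependence <- function(model, data, feature, grid) {
  sapply(grid, function(value) {
    data[[feature]] <- value
    mean(predict(model, newdata = data, type = "response"))
  })
}

age_grid <- seq(min(toy$age), max(toy$age), length.out = 30)
pd_age   <- partial_dependence(fit, toy, "age", age_grid)

head(data.frame(age = age_grid, avg_prediction = pd_age))
```

Plotting `pd_age` against `age_grid` gives the partial dependence curve. All other columns keep their observed values, which is why the result is an average over the marginal distribution of the remaining features.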

Compute the PDPs for age and average glucose level using iml.

library(ggplot2)

# PDP for age, evaluated on a grid of 30 values
pdp <- FeatureEffect$new(predictor, feature = "age", method = "pdp", grid.size = 30)

p1 <- pdp$plot() +
  xlab('Age') +
  scale_y_continuous('Predicted stroke probability', limits = c(0, 1))

# Reuse the same object for a second feature
pdp$set.feature("avg_glucose_level")
p2 <- pdp$plot() +
  xlab('Average glucose level') +
  scale_y_continuous('Predicted stroke probability', limits = c(0, 1))

# Show both plots side by side
gridExtra::grid.arrange(p1, p2, ncol = 2)

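# Two-feature PDP: joint effect of age and average glucose level
library(viridis)  # provides scale_fill_viridis()
pd <- FeatureEffect$new(predictor, feature = c("age", "avg_glucose_level"), method = "pdp")
pd$plot() +
  scale_fill_viridis(option = "D") +
  labs(x = 'Age', y = 'Average glucose level', fill = "Predicted stroke probability")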

I don't think these plots are correct. Try the DALEX package instead:

library(DALEX)

# Column 11 is dropped from the predictors (assumed here to be the target, stroke)
exp <- explain(rf, data = train[, -11], y = train$stroke, type = "classification")

## Preparation of a new explainer is initiated
##   -> model label       :  train.formula  (  default  )
##   -> data              :  768  rows  10  cols 
##   -> data              :  tibbble converted into a data.frame 
##   -> target variable   :  768  values 
##   -> target variable   :  Please note that 'y' is a factor.  (  WARNING  )
##   -> target variable   :  Consider changing the 'y' to a logical or numerical vector.
##   -> target variable   :  Otherwise I will not be able to calculate residuals or loss function.
##   -> model_info        :  package caret , ver. 6.0.86 , task Classification (  default  ) 
##   -> model_info        :  type set to  classification 
##   -> predict function  :  yhat.train  will be used (  default  )
##   -> predicted values  :  numerical, min =  0 , mean =  0.5478733 , max =  0.9966667  
##   -> residual function :  difference between y and yhat (  default  )

## Warning in Ops.factor(y, predict_function(model, data)): '-' not meaningful for factors

##   -> residuals         :  numerical, min =  NA , mean =  NA , max =  NA  
##   A new explainer has been created!
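
The `NA` residuals in the log come from subtracting predicted probabilities from a factor. A minimal base-R sketch of the problem and one possible fix follows. The values are made up, and it assumes the positive class is the level "1" (as the `class = "1"` argument used earlier suggests, though the data may be coded differently).

```r
# Toy factor target and predicted probabilities (made-up values)
y    <- factor(c("0", "1", "1", "0"), levels = c("0", "1"))
yhat <- c(0.10, 0.80, 0.60, 0.30)

# `y - yhat` would return NA with the warning
# "'-' not meaningful for factors", as in the log above.

# Pitfall: as.numeric() on a factor returns the internal level codes
# (1 and 2 here), not the labels 0 and 1.
codes <- as.numeric(y)
codes

# Comparing against the positive level gives a proper 0/1 vector
y_num <- as.numeric(y == "1")
res   <- y_num - yhat
res
```

A numeric vector built this way could be passed as `y` to `explain()` so that residuals and loss functions can be computed.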

mp <- model_profile(exp)
plot(mp, variables = "age")

The assumption of independence is the biggest issue with PD plots. The method assumes that the feature(s) for which the partial dependence is computed are not correlated with the other features. When features are correlated, the averaging step creates new data points in areas of the feature distribution where the actual probability is very low. One solution to this problem is Accumulated Local Effects (ALE) plots, which work with the conditional instead of the marginal distribution.
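
The extrapolation problem can be made concrete with a small simulation. This is a hedged base-R sketch on synthetic data; the variables `x1` and `x2`, the correlation strength, and the 0.2 neighbourhood width are arbitrary choices for illustration.

```r
# Two strongly correlated synthetic features
set.seed(42)
n  <- 500
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.2)
cor(x1, x2)

# One PDP step for x1: set x1 to a high grid value in EVERY row
# while leaving x2 at its observed values.
grid_value <- quantile(x1, 0.95)

# In the real data, rows with x1 near this value also have a high x2
observed_x2 <- x2[abs(x1 - grid_value) < 0.2]
range(observed_x2)

# Share of PDP-generated points (grid_value, x2) whose x2 lies outside
# the range actually observed next to this x1 value
outside <- mean(x2 < min(observed_x2) | x2 > max(observed_x2))
outside
```

A large value of `outside` means that most of the points averaged at this grid value combine a high `x1` with an `x2` that never occurs alongside it in the data, so the model is being evaluated where it has seen no observations.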

Heterogeneous effects might be hidden because PD plots only show the average marginal effects. By plotting the individual conditional expectation curves instead of the aggregated line, we can uncover heterogeneous effects.
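
A minimal base-R sketch of how averaging can hide opposite effects. The data and the prediction function `predict_toy()` are hypothetical and stand in for a fitted model; the interaction is deliberately extreme.

```r
# Synthetic data: the effect of x flips sign depending on group
set.seed(7)
n <- 300
toy <- data.frame(
  x     = runif(n, -1, 1),
  group = rbinom(n, 1, 0.5)
)

# Hypothetical prediction function with an interaction:
# slope +1 for group 1, slope -1 for group 0
predict_toy <- function(data) ifelse(data$group == 1, 1, -1) * data$x

grid <- seq(-1, 1, length.out = 21)

# ICE curves: one row per observation, one column per grid value
ice <- sapply(grid, function(value) {
  modified <- toy
  modified$x <- value
  predict_toy(modified)
})

# The PDP is the column-wise mean of the ICE curves
pd_curve <- colMeans(ice)

range(pd_curve)                  # narrow range: the average looks nearly flat
range(ice[toy$group == 1, ])     # individual curves span the full range
```

Each row of `ice` is one individual conditional expectation curve; `matplot(grid, t(ice), type = "l")` would draw them. With iml, the same idea should be available through `FeatureEffect$new()` with `method = "pdp+ice"`.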