Introduction

Partial Dependence Plots (PDPs) were introduced by Friedman (2001) for the purpose of interpreting complex machine learning models. Interpreting a linear regression model is not as complicated as interpreting a Support Vector Machine, Random Forest, or Gradient Boosting Machine model, and this is where Partial Dependence Plots come into use. Some of these algorithms offer methods for measuring variable importance, but those measures do not tell you whether a variable affects the model positively or negatively.

How does it work?

Consider a dataset D with n observations, p predictor variables, and a response variable y. A Partial Dependence Plot helps us understand how an individual variable influences a fitted model.

Suppose you would like to understand the effect of a single predictor pi on the model. For each value of pi, the PDP fixes pi at that value for every observation, leaves the remaining predictors as observed, and averages the model's predictions. Plotting this average prediction (yhat) against the values of pi shows how that variable is affecting the model.
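To make the averaging concrete, here is a minimal hand-rolled version of the idea using a linear model on the built-in mtcars data (an illustrative sketch only; the iris analysis below uses the pdp package instead):

# Hand-rolled partial dependence: sweep one predictor, average the predictions
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Grid of values for the predictor of interest (wt)
wt.grid <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 25)

# For each grid value, set wt to that value for ALL rows and average the predictions
pd <- sapply(wt.grid, function(w) {
  d <- mtcars
  d$wt <- w
  mean(predict(fit, newdata = d))
})

plot(wt.grid, pd, type = "l", xlab = "wt", ylab = "Average predicted mpg")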

Interpret complex predictive models using Partial Dependence Plots

data(iris)
# Data
colnames(iris)
## [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width" 
## [5] "Species"
str(iris)
## 'data.frame':    150 obs. of  5 variables:
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
x <- subset(iris, select=-Species)
y <- iris$Species
# Building model (svm() is provided by the e1071 package)
library(e1071)
model.svm <- svm(Species ~ ., data = iris, probability = TRUE)
# Predicting
pred <- predict(model.svm, iris, probability = TRUE)
# Confusion Matrix
table(y, pred)
##             pred
## y            setosa versicolor virginica
##   setosa         50          0         0
##   versicolor      0         48         2
##   virginica       0          2        48
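The confusion matrix above corresponds to a resubstitution accuracy of (50 + 48 + 48)/150, which we can confirm directly:

# Resubstitution accuracy
mean(pred == y)
## [1] 0.9733333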

Our classifier performs very well on resubstitution, but we have little information about which variables are driving that performance. We will explore this using Partial Dependence Plots; in this example I am using the pdp package in R. There are four predictor variables; for demonstration I will plot each of them individually, and use Petal.Width and Sepal.Width together for a two-variable analysis.

library(pdp)
library(ggplot2)
library(gridExtra)  # needed for grid.arrange() below

# Single-variable partial dependence, one plot per predictor
# (contour only applies to two-variable surfaces, so it is omitted here)
par.Petal_W <- partial(model.svm, pred.var = "Petal.Width", chull = TRUE)
plot.Petal_W <- autoplot(par.Petal_W)

par.Sepal_W <- partial(model.svm, pred.var = "Sepal.Width", chull = TRUE)
plot.Sepal_W <- autoplot(par.Sepal_W)

par.Petal_L <- partial(model.svm, pred.var = "Petal.Length", chull = TRUE)
plot.Petal_L <- autoplot(par.Petal_L)

par.Sepal_L <- partial(model.svm, pred.var = "Sepal.Length", chull = TRUE)
plot.Sepal_L <- autoplot(par.Sepal_L)
# Two Variables
par.Petal_W.Sepal_W <- partial(model.svm, pred.var = c("Petal.Width", "Sepal.Width"), chull = TRUE)
plot.Petal_W.Sepal_W <- autoplot(par.Petal_W.Sepal_W, contour = TRUE, 
               legend.title = "Partial\ndependence")

grid.arrange(plot.Petal_W, plot.Sepal_W, plot.Petal_L, plot.Sepal_L, plot.Petal_W.Sepal_W)

In the plots above, do not be confused by the y-axis: it does not show the predicted value itself, but how the model's average prediction changes as the given predictor variable changes (Petal.Width in the first plot).

If the partial dependence varies a lot across the range of a predictor, the value of that variable strongly affects the model; if the line stays flat near zero, the variable has little or no effect on the model.

The single-variable plots show how each predictor's values affect the model: on the y-axis, a negative value means that at that particular value of the predictor the model is less likely to predict the correct class, while a positive value means it has a positive impact on predicting the correct class. The same applies to the two-variable plot, where color represents the intensity of the effect on the model.
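A note on scales: for classification, partial() reports partial dependence for the first factor level (setosa here) on a centered logit scale by default. If you would rather focus on a different class, or read the y-axis as a probability, pdp exposes the which.class and prob arguments for this; a short sketch:

# Partial dependence of the predicted virginica probability on Petal.Width
# (which.class = 3 selects the third factor level, "virginica")
par.virginica <- partial(model.svm, pred.var = "Petal.Width",
                         which.class = 3, prob = TRUE)
autoplot(par.virginica)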

As we can observe above, Sepal.Length does not affect the model positively in any meaningful way. Let's try rebuilding the model without that variable:

# Rebuilding the model without Sepal.Length
model.svm <- svm(Species ~ Petal.Width + Petal.Length + Sepal.Width,
                 data = iris, probability = TRUE)
# Predicting
pred <- predict(model.svm, iris, probability = TRUE)
# Confusion Matrix
table(y, pred)
##             pred
## y            setosa versicolor virginica
##   setosa         50          0         0
##   versicolor      0         48         2
##   virginica       0          2        48

The confusion matrix is unchanged: dropping Sepal.Length costs the model nothing on resubstitution, which is consistent with what the Partial Dependence Plots suggested.