Plotting partial effects in R with confidence intervals can be a somewhat tedious task. peplot() takes care of all the dirty work for us, requiring very little effort on the user’s part. Before going into detail about peplot(), though, let’s first consider what a partial effect plot is.

Partial Effects

Partial effect plots go by several names (partial residual plots, added variable plots, adjusted variable plots, etc.). Their primary purpose is to show the relationship between two plotted variables (an outcome and an explanatory variable) while adjusting for the influence of other explanatory variables. Take a simple regression model with only one explanatory variable: \[y_i = \beta_0 + \beta_1x_i + \epsilon_i\tag{1}\] In such a model, \(\hat{\beta}_1\) tells us the effect of \(x\) on \(y\) without holding constant other predictors (say \(w\) and \(z\)) that may also account for some of the variance in \(y\). Multiple regression, on the other hand, gives us the partial effect of \(x\) on \(y\), holding all other covariates constant. Consider the multiple regression model below: \[y_i = \gamma_0 + \gamma_1x_i + \gamma_2w_i + \gamma_3z_i + \varepsilon_i\tag{2}\] \(\hat{\gamma}_1\), unlike \(\hat{\beta}_1\) in equation 1, tells us the partial effect of \(x\) on \(y\), holding \(w\) and \(z\) constant.

A partial effect plot, by extension, plots the partial effect of \(x\) isolated from the effects of \(w\) and \(z\) on \(y\).
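
To make the distinction concrete, here is a quick sketch using the mtcars data (which the examples below also rely on): the estimated effect of hp on mpg from a simple regression changes noticeably once wt and cyl are held constant in a multiple regression.

# Simple regression: the effect of hp on mpg, ignoring other predictors
coef(lm(mpg ~ hp, data = mtcars))["hp"]

# Multiple regression: the partial effect of hp, holding wt and cyl constant
coef(lm(mpg ~ hp + wt + cyl, data = mtcars))["hp"]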

Plotting Partial Effects

Plotting partial effects is fairly straightforward.

  1. Regress the outcome variable \(y\) on \(w\) and \(z\) (exclude \(x\)). \[y_i = \alpha_0 + \alpha_1w_i + \alpha_2z_i + \nu_i\tag{3}\]
  2. Regress \(x\) on \(w\) and \(z\). \[x_i = \sigma_0 + \sigma_1w_i + \sigma_2z_i + \mu_i\tag{4}\]
  3. Plot the residuals from equation 3 (\(\nu_i\)) against the residuals from equation 4 (\(\mu_i\)).

The code to do this in R is:

# Say we want to plot the partial effect of some variable.
# First, let's consider the following model, where we regress
# mpg on hp, wt, and cyl in the mtcars dataset.
lm1 <- lm(mpg ~ hp + wt + cyl, data = mtcars)

# If we want to plot the partial effect of, let's say, hp on mpg,
# we need to (1) regress mpg on all covariates except hp:

lm2 <- lm(mpg ~ wt + cyl, data = mtcars)

# and (2), regress hp on the other variables included in the right-
# hand side of the model:

lm3 <- lm(hp ~ wt + cyl, data = mtcars)

# We then plot the residuals from lm2 against the residuals from
# lm3, along with confidence intervals.

x <- resid(lm3)
y <- resid(lm2)
plot(x,y,xlab="hp",ylab="mpg")
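
Incidentally, this residual-on-residual regression gives us a quick way to check the logic from the previous section: by the Frisch–Waugh–Lovell result, its slope should equal the hp coefficient from the full model. A small sanity check, using the x and y just computed:

# The slope of the partial (residual-on-residual) regression should match
# the hp coefficient from the full model, lm1
coef(lm(y ~ x))["x"]
coef(lm1)["hp"]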

If we want to add a line of best fit along with confidence intervals, we simply include some extra code:

x <- resid(lm3)
y <- resid(lm2)
plot(x,y,type="n")                    # set up the plot region without drawing points
part <- lm(y~x)                       # regress the residuals on the residuals
wx <- par("usr")[1:2]                 # x-limits of the plot window
new.x <- seq(wx[1],wx[2],len=100)
pred <- predict(part, newdata=data.frame(x=new.x), interval="confidence")
lines(new.x,pred[,"fit"],lwd=2)       # line of best fit
lines(new.x,pred[,"lwr"],lty=3)       # lower confidence bound
lines(new.x,pred[,"upr"],lty=3)       # upper confidence bound
points(x,y,pch=16,col="steelblue")    # add the points on top

That extra code isn’t terribly laborious per se; however, peplot(), as I’ll show below, lets us do all of that, and more, with just a single line of code.

peplot()

peplot() takes all of the above and puts it into a simple function where the user need only specify a multiple regression model object and a variable from the model for which the partial effect should be plotted. By default, the function does not plot points (only the trend line with 95% confidence intervals); however, it leaves plenty of room for customization if one would like to plot points or vary the width of the confidence intervals.

# ------------------------------------
# Partial Effect Plot function (peplot)
# ------------------------------------
peplot <- function(mod,var,ci=.95, plot_points = "n",
                   xlab=var,ylab=names(mod$model)[1],
                   main="Partial Effect Plot",
                   pe_lty=1,pe_lwd=3,pe_col="black",
                   ci_lty=1,ci_lwd=1,ci_col="black",
                   pch_col="black",pch_ty=19,
                   ylim=c(min(pred[,"lwr"]),max(pred[,"upr"]))){
  modDat <- mod$model                              # model frame used to fit mod
  modDat1 <- modDat[,-1]                           # right-hand-side variables only
  modDat2 <- modDat[,which(names(modDat)!=var)]    # outcome plus the other covariates
  # Residualize var and the outcome on the remaining covariates
  x <- resid(lm(modDat1[,var] ~., data=modDat1[,which(names(modDat1)!=var)]))
  y <- resid(lm(modDat2[,1] ~ ., data=modDat2[,-1]))
  # Fit the partial-effect line and compute the confidence band before calling
  # plot(), so the default ylim (which depends on pred) is defined when needed
  part <- lm(y~x)
  wx <- extendrange(x)                             # roughly the x-limits of the plot window
  new.x <- seq(wx[1],wx[2],len=100)
  pred <- predict(part, newdata=data.frame(x=new.x), interval="confidence",
                  level = ci)
  plot(x,y,type=plot_points,xlab=xlab,ylab=ylab,
       ylim=ylim,col=pch_col,pch=pch_ty,
       main=main)
  lines(new.x,pred[,"fit"],lwd=pe_lwd,lty=pe_lty,col=pe_col)
  lines(new.x,pred[,"lwr"],lwd=ci_lwd,lty=ci_lty,col=ci_col)
  lines(new.x,pred[,"upr"],lwd=ci_lwd,lty=ci_lty,col=ci_col)
}

peplot() includes:

  - an option to plot the points along with the trend line (plot_points)
  - an adjustable confidence level for the intervals (ci, set to .95 by default)
  - customizable axis labels, plot title, and y-axis limits (xlab, ylab, main, ylim)
  - customizable colors and line widths for the partial effect line, the confidence bounds, and the points

In addition to the above, line types and point types can also be specified.

Let’s see it in action:

peplot(lm1,var="hp")

Now, let’s add some customization:

peplot(lm1,var="hp",plot_points="p",ylim=NULL,pch_col="steelblue",ci_lty=2)

It’s that simple. And because peplot() was created using base R graphics, par() commands are applicable as well:

par(family="serif",bty="l")
peplot(lm1,var="hp",plot_points="p",ylim=NULL,pch_col="steelblue",ci_lty=2,
       xlab="Horsepower",ylab="Miles per Gallon")