Diagnostic Plots for Linear Models

For demonstration, we fit a conventional linear model to the mtcars dataset using the lm() function, regressing fuel economy (mpg) on the number of cylinders (cyl) and weight (wt).

lmodel <- lm(mpg ~ cyl + wt, data = mtcars)
summary(lmodel)
## 
## Call:
## lm(formula = mpg ~ cyl + wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.2893 -1.5512 -0.4684  1.5743  6.1004 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  39.6863     1.7150  23.141  < 2e-16 ***
## cyl          -1.5078     0.4147  -3.636 0.001064 ** 
## wt           -3.1910     0.7569  -4.216 0.000222 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.568 on 29 degrees of freedom
## Multiple R-squared:  0.8302, Adjusted R-squared:  0.8185 
## F-statistic: 70.91 on 2 and 29 DF,  p-value: 6.809e-12
# What are the regression coefficients?
coef(lmodel)
## (Intercept)         cyl          wt 
##   39.686261   -1.507795   -3.190972
# What are the p-values for the regression coefficients?
summary(lmodel)$coefficients
##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 39.686261  1.7149840 23.140893 3.043182e-20
## cyl         -1.507795  0.4146883 -3.635972 1.064282e-03
## wt          -3.190972  0.7569065 -4.215808 2.220200e-04
# What are the 95% confidence intervals for these coefficients?
confint(lmodel)
##                 2.5 %     97.5 %
## (Intercept) 36.178725 43.1937976
## cyl         -2.355928 -0.6596622
## wt          -4.739020 -1.6429245
# What is the AIC value for this fitted model?
AIC(lmodel)
## [1] 156.0101
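
The intervals reported by confint() can be reproduced by hand, which shows where they come from: each one is the estimate plus or minus a t quantile times the standard error. The following is a minimal sketch. The names est, se, tcrit and manual_ci are illustrative and not part of the original analysis.

```r
# Refit the model so that this block is self-contained
lmodel <- lm(mpg ~ cyl + wt, data = mtcars)

# Pull the estimates and standard errors out of the coefficient table
coefs <- summary(lmodel)$coefficients
est <- coefs[, "Estimate"]
se  <- coefs[, "Std. Error"]

# Two-sided 95% critical value on the residual degrees of freedom
tcrit <- qt(0.975, df = df.residual(lmodel))

# Estimate +/- critical value * standard error
manual_ci <- cbind(lower = est - tcrit * se, upper = est + tcrit * se)
print(manual_ci)

# Compare with confint(), ignoring row and column labels
print(all.equal(unname(manual_ci), unname(confint(lmodel))))
```

The residual degrees of freedom used here (29) are the same as those shown in the summary() output above.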

Test Residuals for Normality

Residuals <- resid(lmodel)

shapiro.test(Residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  Residuals
## W = 0.93745, p-value = 0.06341

The p-value is 0.06341, which is greater than 0.05. At the 5% significance level we therefore fail to reject the null hypothesis that the residuals are normally distributed.
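
The same decision can be made programmatically from the object that shapiro.test() returns. This is a small sketch only. The 0.05 threshold (alpha) and the variable name sw are assumptions chosen for illustration.

```r
# Refit the model so that this block is self-contained
lmodel <- lm(mpg ~ cyl + wt, data = mtcars)

# shapiro.test() returns a list-like "htest" object
sw <- shapiro.test(resid(lmodel))

alpha <- 0.05  # assumed significance level

if (sw$p.value < alpha) {
  cat("Reject the null hypothesis of normality (p =", signif(sw$p.value, 4), ")\n")
} else {
  cat("Fail to reject the null hypothesis of normality (p =", signif(sw$p.value, 4), ")\n")
}
```

Failing to reject does not prove that the residuals are normal, so it is worth reading this result alongside the Normal Q-Q plot.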

Use the mfrow Argument to Create a Grid of Plots

  • mfrow is a par() graphical parameter that arranges subsequent plots in a grid, filled row by row. Specify the grid as c(rows, columns).

  • On each of the plots, which points are identified as being influential?

  • Which points, if any, are flagged as influential in all four plots?

par(mfrow = c(2, 2))
plot(lmodel, pch = 18, col = "blue", cex = 2)

# Restore the default single-plot layout
par(mfrow = c(1, 1))
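
The plots label only the most extreme observations on each panel (three by default, controlled by the id.n argument of plot.lm). To answer the questions above numerically, one option is to inspect Cook's distances and leverages directly. This is a sketch, not a definitive rule. The cutoffs 4/n and 2p/n are common rules of thumb that the original analysis does not prescribe, and the variable names are illustrative.

```r
# Refit the model so that this block is self-contained
lmodel <- lm(mpg ~ cyl + wt, data = mtcars)

cd  <- cooks.distance(lmodel)  # influence of each observation on the fit
lev <- hatvalues(lmodel)       # leverage of each observation

n <- nobs(lmodel)              # number of observations
p <- length(coef(lmodel))      # number of coefficients, including the intercept

# The three observations with the largest Cook's distance
top_cook <- head(sort(cd, decreasing = TRUE), 3)
print(top_cook)

# Rule-of-thumb cutoffs (assumptions, not hard thresholds)
flagged_cook <- names(cd)[cd > 4 / n]
flagged_lev  <- names(lev)[lev > 2 * p / n]
print(flagged_cook)
print(flagged_lev)
```

Observations that appear in both flagged_cook and flagged_lev are natural candidates for closer inspection.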

Diagnostic Plots for Linear Models with R

Plot Diagnostics for an lm Object

The plot() method for lm objects can actually produce six diagnostic plots. They are selected with the which argument and are numbered as follows:

  1. a plot of residuals against fitted values,
  2. a Normal Q-Q plot,
  3. a Scale-Location plot of sqrt(|residuals|) against fitted values,
  4. a plot of Cook's distances versus row labels,
  5. a plot of residuals against leverages,
  6. a plot of Cook's distances against leverage/(1-leverage).

Plots 4 and 6 are less commonly used. By default, plots 1, 2, 3 and 5 are drawn.

par(mfrow = c(3, 2))
plot(lmodel, pch = 18, which = 1:6, col = "blue", cex = 2)

# Restore the default single-plot layout
par(mfrow = c(1, 1))
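
The which argument also accepts a subset, so the two less common plots can be drawn on their own. The sketch below additionally saves the previous graphical settings and restores them afterwards, rather than assuming the layout was c(1, 1) beforehand. The name op is just a conventional choice.

```r
# Refit the model so that this block is self-contained
lmodel <- lm(mpg ~ cyl + wt, data = mtcars)

# par() returns the previous values of the settings it changes
op <- par(mfrow = c(1, 2))

# Draw only the Cook's distance plots (numbers 4 and 6)
plot(lmodel, which = c(4, 6), pch = 18, col = "blue")

# Restore whatever layout was in effect before
par(op)
```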