Diagnostic Plots for Linear Models
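For the purposes of demonstration, we will fit a conventional linear model with the lm() function, using the mtcars dataset to model fuel efficiency (mpg) as a function of the number of cylinders (cyl) and the car's weight (wt).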
# Fit mpg as a linear function of cylinder count (cyl) and weight (wt)
lmodel <- lm(mpg ~ cyl + wt, data = mtcars)
summary(lmodel)
##
## Call:
## lm(formula = mpg ~ cyl + wt, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.2893 -1.5512 -0.4684 1.5743 6.1004
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 39.6863 1.7150 23.141 < 2e-16 ***
## cyl -1.5078 0.4147 -3.636 0.001064 **
## wt -3.1910 0.7569 -4.216 0.000222 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.568 on 29 degrees of freedom
## Multiple R-squared: 0.8302, Adjusted R-squared: 0.8185
## F-statistic: 70.91 on 2 and 29 DF, p-value: 6.809e-12
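The summary statistics at the bottom of this output can be reproduced directly from the residuals. The sketch below uses the standard formulas R² = 1 - SSE/SST, adjusted R² = 1 - (1 - R²)(n - 1)/(n - p) and residual standard error = sqrt(SSE/(n - p)), where n is the number of observations and p the number of estimated coefficients (here 3). The variable names (sse, sst, r2 and so on) are our own choices for illustration.

```r
# Refit the model so this snippet runs on its own
lmodel <- lm(mpg ~ cyl + wt, data = mtcars)

y   <- mtcars$mpg
n   <- length(y)                 # number of observations
p   <- length(coef(lmodel))      # number of coefficients, intercept included
sse <- sum(resid(lmodel)^2)      # residual sum of squares
sst <- sum((y - mean(y))^2)      # total sum of squares around the mean

r2     <- 1 - sse / sst                     # Multiple R-squared
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p)  # Adjusted R-squared
rse    <- sqrt(sse / (n - p))               # Residual standard error on n - p df

round(c(r2 = r2, adj_r2 = adj_r2, rse = rse), 4)
```

The three values should agree with the Multiple R-squared, Adjusted R-squared and Residual standard error lines printed by summary().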
# What are the regression coefficients?
coef(lmodel)
## (Intercept) cyl wt
## 39.686261 -1.507795 -3.190972
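As a check on what coef() returns, the least-squares estimates can be computed by solving the normal equations (X'X)b = X'y, where X is the design matrix. This is a minimal sketch for illustration only: lm() itself uses a QR decomposition, which is numerically more stable, but for this small, well-conditioned problem the two approaches give the same numbers.

```r
lmodel <- lm(mpg ~ cyl + wt, data = mtcars)

# Design matrix: a column of 1s for the intercept, then cyl and wt
X <- model.matrix(lmodel)
y <- mtcars$mpg

# Solve the normal equations (X'X) b = X'y for the coefficient vector b
beta_hat <- solve(crossprod(X), crossprod(X, y))
drop(beta_hat)
```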
# What are the p-values for the regression coefficients? (see the Pr(>|t|) column)
summary(lmodel)$coefficients
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 39.686261 1.7149840 23.140893 3.043182e-20
## cyl -1.507795 0.4146883 -3.635972 1.064282e-03
## wt -3.190972 0.7569065 -4.215808 2.220200e-04
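Each t value in this table is the estimate divided by its standard error, and the p-value is the two-sided tail probability of a t distribution with the model's residual degrees of freedom (29 here). A short sketch that recomputes both columns; the names ctab, t_vals and p_vals are ours.

```r
lmodel <- lm(mpg ~ cyl + wt, data = mtcars)

ctab   <- summary(lmodel)$coefficients
t_vals <- ctab[, "Estimate"] / ctab[, "Std. Error"]   # t = estimate / standard error
rdf    <- df.residual(lmodel)                          # residual degrees of freedom

# Two-sided p-value: probability of a |t| at least this large under H0: coefficient = 0
p_vals <- 2 * pt(abs(t_vals), df = rdf, lower.tail = FALSE)
cbind(t = t_vals, p = p_vals)
```

To pull out only the p-values, index the Pr(>|t|) column: summary(lmodel)$coefficients[, "Pr(>|t|)"].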
# What are the 95% confidence intervals for these coefficients?
confint(lmodel)
## 2.5 % 97.5 %
## (Intercept) 36.178725 43.1937976
## cyl -2.355928 -0.6596622
## wt -4.739020 -1.6429245
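The intervals from confint() are estimate ± t(0.975, n - p) × standard error, using the same residual degrees of freedom as the t tests above. A minimal sketch that rebuilds them by hand:

```r
lmodel <- lm(mpg ~ cyl + wt, data = mtcars)

est   <- coef(lmodel)
se    <- sqrt(diag(vcov(lmodel)))             # standard errors of the coefficients
tcrit <- qt(0.975, df = df.residual(lmodel))  # critical value for a 95% two-sided interval

ci <- cbind(lower = est - tcrit * se, upper = est + tcrit * se)
ci
```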
# What is the AIC value for this fitted model?
AIC(lmodel)
## [1] 156.0101
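AIC is defined as -2 log L + 2k, where log L is the maximized log-likelihood and k the number of estimated parameters. For a Gaussian linear model, R counts the residual variance as a parameter too, so k = 3 coefficients + 1 = 4. The sketch below recomputes the value from logLik():

```r
lmodel <- lm(mpg ~ cyl + wt, data = mtcars)

ll <- logLik(lmodel)      # maximized Gaussian log-likelihood
k  <- attr(ll, "df")      # parameters counted: 3 coefficients + residual variance
aic_manual <- -2 * as.numeric(ll) + 2 * k
aic_manual
```

Lower AIC values indicate a better trade-off between fit and complexity when comparing models fitted to the same data.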
Test Residuals for Normality
- Compute the residuals of the fitted model using the resid() command.
- Test the hypothesis of normality using shapiro.test().
Residuals <- resid(lmodel)
shapiro.test(Residuals)
##
## Shapiro-Wilk normality test
##
## data: Residuals
## W = 0.93745, p-value = 0.06341
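The p-value is 0.06341. Because this is above the conventional 0.05 significance level, we fail to reject the null hypothesis that the residuals are normally distributed. The p-value is only slightly above 0.05, however, so the result is borderline and the Normal Q-Q plot is worth inspecting as well.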
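The decision can also be made in code by reading the p.value element of the test result, and paired with a quick visual check using qqnorm() and qqline(). In this sketch the significance level alpha = 0.05 is an assumption; pick the level that suits your analysis.

```r
lmodel <- lm(mpg ~ cyl + wt, data = mtcars)

sw    <- shapiro.test(resid(lmodel))
alpha <- 0.05                          # assumed significance level
sw$p.value

if (sw$p.value < alpha) {
  message("Reject normality of the residuals at alpha = ", alpha)
} else {
  message("Fail to reject normality of the residuals at alpha = ", alpha)
}

# Visual check: under normality the points lie close to the reference line
qqnorm(resid(lmodel))
qqline(resid(lmodel))
```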
Use the mfrow Argument to Create a Grid of Plots
The mfrow argument of par() is used to create a grid of plots: simply specify the number of rows and columns of the grid as c(rows, columns). On each of the plots, which points are identified as being influential? Which points, if any, are influential according to all four plots?
# Arrange the next four plots in a 2 x 2 grid
par(mfrow = c(2, 2))
plot(lmodel, pch = 18, col = "blue", cex = 2)
# Put it back to the default single-plot layout
par(mfrow = c(1, 1))
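To answer the question about influential points numerically rather than by eye, we can compute Cook's distance and leverage for every car. The cut-offs used here (4/n for Cook's distance and 2p/n for leverage) are common rules of thumb, not fixed rules, and are assumptions of this sketch; influence.measures() applies its own set of criteria.

```r
lmodel <- lm(mpg ~ cyl + wt, data = mtcars)

n  <- nobs(lmodel)
p  <- length(coef(lmodel))
cd <- cooks.distance(lmodel)   # Cook's distance for each car
h  <- hatvalues(lmodel)        # leverage: diagonal of the hat matrix

# Rule-of-thumb cut-offs (assumed): 4/n for Cook's distance, 2p/n for leverage
flag_cd <- names(cd)[cd > 4 / n]
flag_h  <- names(h)[h > 2 * p / n]
flag_cd
flag_h

# influence.measures() flags observations using its own criteria (marked with "*")
summary(influence.measures(lmodel))
```

Cars that appear in both lists, or that are flagged by influence.measures(), are good candidates to compare against the points labelled in the plots.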
Diagnostic Plots for Linear Models with R
Plot Diagnostics for an lm Object
The plot() method for lm objects can actually produce six diagnostic plots. Each one is selected by its number in the which argument:
- which = 1: a plot of residuals against fitted values,
- which = 2: a Normal Q-Q plot of the standardized residuals,
- which = 3: a Scale-Location plot of sqrt(|standardized residuals|) against fitted values,
- which = 4: a plot of Cook's distances versus row labels (observation number),
- which = 5: a plot of standardized residuals against leverages,
- which = 6: a plot of Cook's distances against leverage/(1 - leverage).
Plots 4 and 6 are less commonly used, however. By default, only plots 1, 2, 3 and 5 are drawn (which = c(1, 2, 3, 5)).
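Each panel plots quantities that can be extracted with standard functions from the stats package. The sketch below collects them into one data frame (the column names are our own) and checks the identity linking them: Cook's distance D_i = (r_i^2 / p) * h_ii / (1 - h_ii), where r_i is the standardized residual, h_ii the leverage and p the number of coefficients.

```r
lmodel <- lm(mpg ~ cyl + wt, data = mtcars)

p  <- length(coef(lmodel))
h  <- hatvalues(lmodel)          # leverages h_ii
rs <- rstandard(lmodel)          # standardized residuals (Q-Q plot uses these)

diag_df <- data.frame(
  fitted     = fitted(lmodel),          # x-axis of plots 1 and 3
  residual   = resid(lmodel),           # y-axis of plot 1
  sqrt_abs_r = sqrt(abs(rs)),           # y-axis of plot 3 (Scale-Location)
  cooks      = cooks.distance(lmodel),  # y-axis of plots 4 and 6
  leverage   = h,                       # x-axis of plot 5
  lev_ratio  = h / (1 - h)              # leverage/(1 - leverage), plot 6
)
head(diag_df)

# Cook's distance rebuilt from standardized residuals and leverages
cooks_manual <- rs^2 / p * h / (1 - h)
all.equal(cooks_manual, cooks.distance(lmodel))
```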
# Arrange all six plots in a 3 x 2 grid
par(mfrow = c(3, 2))
plot(lmodel, pch = 18, which = 1:6, col = "blue", cex = 2)
# Put it back to the default single-plot layout
par(mfrow = c(1, 1))