Diagnostic Plots for Regression Analysis

Diagnostic Plots for Linear Models

For demonstration, we fit a conventional linear model with the lm() function, regressing fuel economy (mpg) on number of cylinders (cyl) and weight (wt) from the built-in mtcars dataset.

lmodel <- lm(mpg ~ cyl + wt, data = mtcars)
summary(lmodel)

## 
## Call:
## lm(formula = mpg ~ cyl + wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.2893 -1.5512 -0.4684  1.5743  6.1004 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  39.6863     1.7150  23.141  < 2e-16 ***
## cyl          -1.5078     0.4147  -3.636 0.001064 ** 
## wt           -3.1910     0.7569  -4.216 0.000222 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.568 on 29 degrees of freedom
## Multiple R-squared:  0.8302, Adjusted R-squared:  0.8185 
## F-statistic: 70.91 on 2 and 29 DF,  p-value: 6.809e-12

# What are the regression coefficients?
coef(lmodel)

## (Intercept)         cyl          wt 
##   39.686261   -1.507795   -3.190972

# What are the p-values for the regression coefficients?
summary(lmodel)$coefficients

##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 39.686261  1.7149840 23.140893 3.043182e-20
## cyl         -1.507795  0.4146883 -3.635972 1.064282e-03
## wt          -3.190972  0.7569065 -4.215808 2.220200e-04
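The coefficient table above is an ordinary numeric matrix, so the p-values alone can be pulled out by column name. The following is a minimal sketch; the names coef_table, p_values and alpha are illustrative, and the 5% cutoff is an assumed convention rather than something fixed by the model.

```r
# Refit the model so this snippet runs on its own
lmodel <- lm(mpg ~ cyl + wt, data = mtcars)

# summary()$coefficients is a matrix; select the p-value column by name
coef_table <- summary(lmodel)$coefficients
p_values <- coef_table[, "Pr(>|t|)"]
p_values

# Terms whose p-value falls below an assumed 5% significance level
alpha <- 0.05
names(p_values)[p_values < alpha]
```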

# What are the 95% confidence intervals for these coefficients?
confint(lmodel)

##                 2.5 %     97.5 %
## (Intercept) 36.178725 43.1937976
## cyl         -2.355928 -0.6596622
## wt          -4.739020 -1.6429245
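To see where these intervals come from, they can be rebuilt by hand as estimate ± t × standard error, with t taken from a t distribution on the residual degrees of freedom (29 here). This is a sketch for illustration; est, se, t_crit and manual_ci are names introduced here.

```r
# Refit the model so this snippet runs on its own
lmodel <- lm(mpg ~ cyl + wt, data = mtcars)

est <- coef(lmodel)                                   # point estimates
se  <- summary(lmodel)$coefficients[, "Std. Error"]   # standard errors

# Two-sided 95% critical value on the residual degrees of freedom
t_crit <- qt(0.975, df = df.residual(lmodel))

manual_ci <- cbind(lower = est - t_crit * se,
                   upper = est + t_crit * se)
manual_ci

# Compare with confint(), ignoring the differing column labels
all.equal(unname(manual_ci), unname(confint(lmodel)))
```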

# What is the AIC value for this fitted model?
AIC(lmodel)

## [1] 156.0101
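The AIC value is defined as -2 × log-likelihood + 2 × k, where k is the number of estimated parameters. For this model k is 4: three regression coefficients plus the residual variance. A short sketch that reproduces the value by hand (ll, k and manual_aic are illustrative names):

```r
# Refit the model so this snippet runs on its own
lmodel <- lm(mpg ~ cyl + wt, data = mtcars)

ll <- logLik(lmodel)     # maximised log-likelihood
k  <- attr(ll, "df")     # number of estimated parameters (coefficients + variance)

manual_aic <- -2 * as.numeric(ll) + 2 * k
manual_aic

# Should agree with the built-in function
all.equal(manual_aic, AIC(lmodel))
```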

Test Residuals for Normality

Compute the residuals of the fitted model using the resid() function.
Then test the null hypothesis that the residuals are normally distributed using shapiro.test().

Residuals <- resid(lmodel)

shapiro.test(Residuals)

## 
##  Shapiro-Wilk normality test
## 
## data:  Residuals
## W = 0.93745, p-value = 0.06341

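The p-value is 0.06341, which is greater than 0.05. At the 5% significance level we therefore fail to reject the null hypothesis that the residuals are normally distributed. Note that this is not proof of normality, only an absence of strong evidence against it.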
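The test result can also be used programmatically, and it is usually worth pairing with a visual check. The sketch below assumes a 5% significance level and draws a Q-Q plot of the raw residuals with qqnorm() and qqline(); res, sw and alpha are names introduced for illustration.

```r
# Refit the model so this snippet runs on its own
lmodel <- lm(mpg ~ cyl + wt, data = mtcars)
res <- resid(lmodel)

# shapiro.test() returns a list; the p-value is stored in $p.value
sw <- shapiro.test(res)
alpha <- 0.05            # assumed significance level
sw$p.value > alpha       # TRUE means we fail to reject normality

# Visual check: points close to the line suggest approximate normality
qqnorm(res, main = "Normal Q-Q plot of residuals")
qqline(res, col = "red")
```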

Use `mfrow` argument to create grid of plots

mfrow is a graphical parameter, set with par(), that arranges subsequent plots in a grid. Specify the number of rows and columns as a two-element vector, for example c(2,2) for a 2-by-2 grid.
On each of the plots, which points are identified as being influential?
Which points, if any, are influential according to all four plots?

par(mfrow=c(2,2))
plot(lmodel,pch=18,col="blue",cex=2)

# Restore the default layout of one plot per page
par(mfrow=c(1,1))
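By default plot() labels the three most extreme points on each panel. The same question can be approached numerically with cooks.distance() and hatvalues(). The following is a rough sketch: the cutoffs 4/n for Cook's distance and 2p/n for leverage are common rules of thumb assumed here, not thresholds taken from the plots themselves.

```r
# Refit the model so this snippet runs on its own
lmodel <- lm(mpg ~ cyl + wt, data = mtcars)

cooks <- cooks.distance(lmodel)   # influence of each observation on the fit
lev   <- hatvalues(lmodel)        # leverage of each observation

# The three observations with the largest Cook's distance
head(sort(cooks, decreasing = TRUE), 3)

# Rule-of-thumb cutoffs (assumptions, see the note above)
n <- nobs(lmodel)
p <- length(coef(lmodel))
names(cooks)[cooks > 4 / n]       # candidates for high influence
names(lev)[lev > 2 * p / n]       # candidates for high leverage
```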

Diagnostic Plots for Linear Models with R

Plot Diagnostics for an `lm` Object

The plot() function can produce six diagnostic plots for an lm object. The six plots, selectable through the which argument, are as follows:

a plot of residuals against fitted values (which = 1),
a Normal Q-Q plot (which = 2),
a Scale-Location plot of sqrt(|residuals|) against fitted values (which = 3),
a plot of Cook's distances versus row labels (which = 4),
a plot of residuals against leverages (which = 5),
a plot of Cook's distances against leverage/(1-leverage) (which = 6).

Plots 4 and 6 are less commonly used. By default, plot() shows plots 1, 2, 3 and 5.

par(mfrow=c(3,2))
plot(lmodel,pch=18,which=1:6,col="blue",cex=2)

# Restore the default layout of one plot per page
par(mfrow=c(1,1))
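The which argument also accepts any subset of 1 to 6, so the two less common plots can be requested on their own. The sketch below additionally saves the previous graphical settings in a variable (op, an illustrative name) and restores them afterwards, which avoids having to remember what the layout was before.

```r
# Refit the model so this snippet runs on its own
lmodel <- lm(mpg ~ cyl + wt, data = mtcars)

# par() returns the previous values of the settings it changes
op <- par(mfrow = c(1, 2))

# Show only the Cook's distance plots (which = 4 and which = 6)
plot(lmodel, which = c(4, 6), pch = 18, col = "blue")

# Restore the settings saved above
par(op)
```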

DragonflyStats.github.io