Normality of Residuals

Sameer Mathur

Example: Detect and Rectify Non-normality of Residuals using cars dataset

Regression Diagnostics

---

Normality of Residuals

The residual error terms are assumed to be normally distributed i.e. \( \epsilon ~ N(\mu, \sigma^2) \).

Normality of Residuals

If the error terms are non- normally distributed, confidence intervals may become too wide or narrow. Once confidence interval becomes unstable, it leads to difficulty in estimating coefficients based on minimization of least squares. Presence of non - normal distribution suggests that there are a few unusual data points which must be studied closely to make a better model.

Normality Plot

Regression Model

Simple linear regression

# fitting simple linear model
fitCarsModel <- lm(dist ~ speed, data = cars)
# summary of the fitted model
summary(fitCarsModel)

Call:
lm(formula = dist ~ speed, data = cars)

Residuals:
    Min      1Q  Median      3Q     Max 
-29.069  -9.525  -2.272   9.215  43.201 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -17.5791     6.7584  -2.601   0.0123 *  
speed         3.9324     0.4155   9.464 1.49e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 15.38 on 48 degrees of freedom
Multiple R-squared:  0.6511,    Adjusted R-squared:  0.6438 
F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.49e-12

Diagnostic Plots

# residual plots of OLS model
par(mfrow=c(2,2))
plot(fitCarsModel)

The normality assumption can be checked by inspecting the Residuals vs Fitted plot (2nd plot) from the Diagnostic Plots.

plot of chunk unnamed-chunk-3

Normality of residuals

The Q-Q plot of residuals can be used to visually check the normality assumption. The normal probability plot of residuals should approximately follow a straight line (2nd plot).

# normal probability plot of residuals
plot(fitCarsModel, 2)

plot of chunk unnamed-chunk-5

In our example, all the points does not fall approximately along this reference line, so we cannot assume normality.

Alternate way to plot Normality plot

  1. Normality plot using Quantile-Quantile (Q-Q plot) using qqnorm() and qqline() functions.

  2. Normality plot using qqPlot() function in car package.

1. Using qqnorm() and qqline() function

qqnorm(cars$dist)
qqline(cars$dist, col = "red")

plot of chunk unnamed-chunk-6

2. Using qqPlot() in car package

# normality plot using qqPlot() in car package
library("car")
qqPlot(cars$dist)

plot of chunk unnamed-chunk-8

[1] 49 48

The value showing in output 48 and 49 are the outliers.

Normality Test

Visual inspection, described in the previous section, is usually unreliable. It's possible to use a significance test comparing the sample distribution to a normal one in order to ascertain whether data show or not a serious deviation from normality.

There are several methods for normality test such as

  1. Anderson-Darling test

  2. Shapiro-Wilk's test

  3. Kolmogorov-Smirnov (K-S) test

1. Anderson-Darling normality test

The Anderson-Darling test (AD test, for short) is one of the most commonly used normality tests, and can be executed using the ad.test() command present within the nortest package.

# Anderson-Darling normality test
library(nortest)
ad.test(cars$dist)

    Anderson-Darling normality test

data:  cars$dist
A = 0.74067, p-value = 0.05021

Source Wikipedia: Anderson-Darling test

2. Shapiro-Wilk's method to test normality

Shapiro-Wilk's method is widely recommended for normality test and it provides better power than K-S. It is based on the correlation between the data and the corresponding normal scores.

# Shapiro-Wilk's normality test
shapiro.test(cars$dist)

    Shapiro-Wilk normality test

data:  cars$dist
W = 0.95144, p-value = 0.0391

From the output, the p-value < 0.05 implying that the distribution of the data are significantly different from normal distribution. In other words, we cannot assume the normality.

Note:

Normality test is sensitive to sample size. Small samples most often pass normality tests. Therefore, it's important to combine visual inspection and significance test in order to take the right decision.

Source Wikipedia: Shapiro-Wilk's test

Rectify Non-normality

Box-Cox Transformation

library(caret)
distTrans <- BoxCoxTrans(cars$dist)
distTrans
Box-Cox Transformation

50 data points used to estimate Lambda

Input data summary:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   2.00   26.00   36.00   42.98   56.00  120.00 

Largest/Smallest: 60 
Sample Skewness: 0.759 

Estimated Lambda: 0.5 
# append the transformed variable to cars
cars <- cbind(cars, distNew = predict(distTrans, cars$dist))
# first few rows of the datset
head(cars)
  speed dist   distNew
1     4    2 0.8284271
2     4   10 4.3245553
3     7    4 2.0000000
4     7   22 7.3808315
5     8   16 6.0000000
6     9   10 4.3245553

predict() uses the fitted model to predict response values for the new dataset.

Regression model on transformed data

The new regresison model will be based on the transformed data.

Simple Linear Regression

# fitting simple linear model
fitCarsTransModel <- lm(distNew ~ speed, data = cars)
# summary of the fitted model
summary(fitCarsTransModel)

Call:
lm(formula = distNew ~ speed, data = cars)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.1369 -1.3966 -0.3598  1.1817  6.3069 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.55410    0.96888   0.572     0.57    
speed        0.64483    0.05957  10.825 1.77e-14 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.205 on 48 degrees of freedom
Multiple R-squared:  0.7094,    Adjusted R-squared:  0.7034 
F-statistic: 117.2 on 1 and 48 DF,  p-value: 1.773e-14

Residual versus Fitted plot after transformation

# residual vs. fitted plot
plot(fitCarsTransModel, 2)

plot of chunk unnamed-chunk-14

Comparing Residual versus Fitted plots before and after Box-Cox transformation

Before Box-Cox transformation

# normal probability plot of residuals
plot(fitCarsModel, 2)

plot of chunk unnamed-chunk-15

After Box-Cox transformation

# normal probability plot of residuals
plot(fitCarsTransModel, 2)

plot of chunk unnamed-chunk-16

We can not say anything related to change in normality of the residuals. Hence, we will run the statistical test to check the change in normality after the Box-Cox transformation.

Normality test after Box-Cox transformation

Anderson-Darling normality test

# Anderson-Darling normality test
library(nortest)
ad.test(cars$distNew)

    Anderson-Darling normality test

data:  cars$distNew
A = 0.1389, p-value = 0.9732

Shapiro-Wilk's method to test normality

# Shapiro-Wilk's normality test
shapiro.test(cars$distNew)

    Shapiro-Wilk normality test

data:  cars$distNew
W = 0.99347, p-value = 0.9941

From both the test, the p-value is more than 0.05 which implies that the distribution of the data are not significantly different from normal distribution.

In other words, we can assume the normality after the transforming our outcome variable using BoxCox transformation.