Assessing Normality in Data

Anderson-Darling Test

To implement the Anderson-Darling test for normality, you’ll need the nortest package:

# Install and load package
install.packages("nortest")
library(nortest)

# Generate data and run test
set.seed(1234)
NormDat <- rnorm(100)
ad.test(NormDat)
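
The object returned by ad.test() is a standard htest list, so the statistic and p-value can be read by name instead of from the printed output. The following is a minimal sketch under a few assumptions: the requireNamespace() guard, the 0.05 threshold, and the names ad_result and ad_p are illustrative additions, not part of the original example.

```r
# Sketch: extract the Anderson-Darling statistic and p-value from the result.
set.seed(1234)
NormDat <- rnorm(100)

ad_p <- NA_real_  # stays NA if nortest is not available
if (requireNamespace("nortest", quietly = TRUE)) {
  ad_result <- nortest::ad.test(NormDat)
  ad_p <- ad_result$p.value
  cat("A =", unname(ad_result$statistic), " p-value =", ad_p, "\n")

  # Decision rule at an assumed significance level of 0.05
  if (ad_p > 0.05) {
    cat("Fail to reject normality\n")
  } else {
    cat("Reject normality\n")
  }
} else {
  message("nortest is not installed; run install.packages(\"nortest\") first")
}
```

The same pattern applies to any test that returns an htest object.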

Shapiro-Wilk Test

This test is built into R and does not require additional packages:

# Generate data and run test
set.seed(1234)
NormDat <- rnorm(100)

shapiro.test(NormDat)

Example Output:

Shapiro-Wilk normality test

data:  NormDat
W = 0.9864, p-value = 0.4003

Since the p-value is above 0.05, we fail to reject the null hypothesis of normality. This does not prove the data are normal; it means the sample shows no significant departure from a normal distribution.
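
The decision rule described above can also be applied in code by storing the test result. This is a small sketch; the variable names (alpha, sw, decision) and the 0.05 level are assumptions chosen for illustration.

```r
# Sketch: store the Shapiro-Wilk result and apply the decision rule in code.
set.seed(1234)
NormDat <- rnorm(100)

alpha <- 0.05                  # assumed significance level
sw <- shapiro.test(NormDat)    # returns an htest object
W <- unname(sw$statistic)      # test statistic
p <- sw$p.value                # p-value

if (p > alpha) {
  decision <- "fail to reject normality"
} else {
  decision <- "reject normality"
}
cat(sprintf("W = %.4f, p-value = %.4f: %s\n", W, p, decision))
```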


Graphical Methods for Normality

Histogram

A quick way to check for normality is to plot a histogram. A roughly symmetric, bell-shaped histogram is consistent with normality:

hist(NormDat)
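
Judging "bell-shaped" by eye is easier with a reference curve. One possible approach, sketched here, is to draw the histogram on the density scale and overlay a normal density that uses the sample mean and standard deviation. The plot title and the names h, m, and s are illustrative choices.

```r
# Sketch: histogram on the density scale with a fitted normal curve on top.
set.seed(1234)
NormDat <- rnorm(100)

# freq = FALSE puts the y-axis on the density scale so the curve is comparable
h <- hist(NormDat, freq = FALSE, main = "Histogram of NormDat with normal curve")

# Overlay a normal density using the sample mean and standard deviation
m <- mean(NormDat)
s <- sd(NormDat)
curve(dnorm(x, mean = m, sd = s), add = TRUE, lwd = 2)
```

Large gaps between the bars and the curve, or a clearly lopsided shape, suggest a departure from normality.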

Q-Q Plot

Quantile-Quantile (Q-Q) plots compare the sample quantiles of the data with the theoretical quantiles of a normal distribution:

qqnorm(NormDat)
qqline(NormDat)

If the data points follow the straight line, normality is plausible.
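
As a rough numeric companion to the plot, the coordinates that qqnorm() would draw can be retrieved and their correlation computed. Values close to 1 correspond to points lying near a straight line. This is only an informal sketch, not a formal test, and the name qq_cor is an illustrative choice.

```r
# Sketch: get the Q-Q coordinates without plotting and measure their linearity.
set.seed(1234)
NormDat <- rnorm(100)

# plot.it = FALSE returns the coordinates without drawing the plot
qq <- qqnorm(NormDat, plot.it = FALSE)

# qq$x holds the theoretical normal quantiles, qq$y the sample values
qq_cor <- cor(qq$x, qq$y)
cat("Q-Q correlation:", round(qq_cor, 4), "\n")
```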


Transforming the Data

If your data are not normally distributed, especially if they are positively skewed, a log transformation can help. Note that log() requires strictly positive values:

set.seed(1919)
X <- rexp(30, rate = 0.50)

# Before transformation
shapiro.test(X)

# After log transformation
shapiro.test(log(X))

After the transformation the p-value is often above 0.05, in which case the test no longer detects a significant departure from normality in the transformed data.
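
To see what the transformation changes, it can help to put the skewness and the Shapiro-Wilk p-value side by side before and after taking logs. The sketch below does this; skew() is a hypothetical helper defined here (moment-based sample skewness), not a base R function.

```r
# Sketch: compare skewness and Shapiro-Wilk p-values before and after log().
set.seed(1919)
X <- rexp(30, rate = 0.50)

# Hypothetical helper: moment-based sample skewness
skew <- function(v) {
  d <- v - mean(v)
  mean(d^3) / mean(d^2)^1.5
}

results <- data.frame(
  data     = c("X", "log(X)"),
  skewness = c(skew(X), skew(log(X))),
  p_value  = c(shapiro.test(X)$p.value, shapiro.test(log(X))$p.value)
)
print(results)
```

A skewness closer to zero after the transformation indicates a more symmetric sample, though symmetry alone does not guarantee normality.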


Outliers and Normality

Outliers can distort the results of normality tests:

  • Boxplots can help detect them visually.
  • Formal tests like Grubbs’ test (covered later) assess statistical outliers.
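
The effect of a single outlier can be illustrated with simulated data. In this sketch an artificial value of 8 is appended to a normal sample; boxplot.stats() lists the points a boxplot would flag, and the Shapiro-Wilk p-value is computed with and without the added point. The variable names and the value 8 are assumptions made for illustration.

```r
# Sketch: one artificial outlier and its effect on the Shapiro-Wilk test.
set.seed(1234)
clean <- rnorm(50)
contaminated <- c(clean, 8)   # 8 is an artificial outlier added for illustration

# boxplot.stats() returns the points beyond the whiskers in $out
flagged <- boxplot.stats(contaminated)$out
print(flagged)

p_clean <- shapiro.test(clean)$p.value
p_contaminated <- shapiro.test(contaminated)$p.value
cat("p-value without outlier:", p_clean, "\n")
cat("p-value with outlier:   ", p_contaminated, "\n")
```

Calling boxplot(contaminated) draws the same flagged points as individual markers beyond the whiskers.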

Summary Slide: Shapiro-Wilk Example

# No seed is set here, so W and the p-value will differ from run to run
x <- rnorm(40)
shapiro.test(x)

Output:

Shapiro-Wilk normality test

data:  x
W = 0.9601, p-value = 0.1689

Since p > 0.05, we again fail to reject the null hypothesis: the data are consistent with a normal distribution.