Assessing Normality in Data
Anderson-Darling Test
To implement the Anderson-Darling test for normality, you’ll need the nortest package:
# Install the package only if it is missing, then load it
if (!requireNamespace("nortest", quietly = TRUE)) install.packages("nortest")
library(nortest)
# Generate data and run test
set.seed(1234)
NormDat <- rnorm(100)
ad.test(NormDat)
Shapiro-Wilk Test
This test is built into R and does not require additional packages:
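A minimal sketch of the call, assuming the same simulated sample NormDat from the Anderson-Darling example (the seed and sample size are carried over as an assumption, since the original call is not shown here):

```r
# Assumed: reuse the simulated sample from the Anderson-Darling example
set.seed(1234)
NormDat <- rnorm(100)

# shapiro.test() ships with base R (stats package), so no install is needed
sw <- shapiro.test(NormDat)
sw

# The result is an "htest" object; the p-value can be extracted directly
sw$p.value
```

Storing the result in a variable (here the illustrative name sw) makes it easy to use the p-value in later code instead of reading it off the printed output.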
Example Output:
Since the p-value is above 0.05, we fail to reject the null hypothesis of normality. The data are consistent with a normal distribution, although this does not prove that they are normally distributed.
Graphical Methods for Normality
Histogram
A quick way to check for normality is to plot a histogram. A roughly symmetric, bell-shaped distribution suggests normality:
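One way this could look, again assuming the simulated NormDat sample from earlier. The overlaid normal curve is an optional addition for comparison, not something the histogram requires:

```r
# Assumed: the same simulated sample used in the tests above
set.seed(1234)
NormDat <- rnorm(100)

# freq = FALSE plots densities so the curve below is on the same scale
hist(NormDat, freq = FALSE, main = "Histogram of NormDat", xlab = "Value")

# Overlay a normal density using the sample mean and standard deviation
curve(dnorm(x, mean = mean(NormDat), sd = sd(NormDat)), add = TRUE, lwd = 2)
```

With small samples the shape of a histogram depends heavily on the bin width, so it is best read alongside a formal test rather than on its own.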
Transforming the Data
If your data are not normally distributed, especially if they are positively skewed, a log transformation can help. The log is only defined for strictly positive values:
set.seed(1919)
X <- rexp(30, rate = 0.50)
# Before transformation
shapiro.test(X)
# After log transformation
shapiro.test(log(X))
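After the transformation the p-value is often above 0.05, meaning the test no longer rejects normality for the transformed data. With a small sample such as n = 30, this shows the data are consistent with normality, not that they are exactly normal.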
Outliers and Normality
Outliers can distort the results of normality tests:
- Boxplots can help detect them visually.
- Formal tests like Grubbs’ test (covered later) assess statistical outliers.
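The boxplot check can be sketched as follows. The sample Y and the injected value 8 are hypothetical, chosen only to make the effect of a single extreme point visible:

```r
set.seed(1234)
# Hypothetical sample: 50 standard-normal values plus one injected extreme value
Y <- c(rnorm(50), 8)

# Points beyond roughly 1.5 times the interquartile range from the box
# are drawn individually
boxplot(Y, main = "Boxplot of Y")

# boxplot.stats() returns the flagged values in $out
flagged <- boxplot.stats(Y)$out
flagged

# Compare the normality test with and without the flagged points
shapiro.test(Y)
shapiro.test(Y[!Y %in% flagged])
```

Rerunning the test without the flagged points is shown here only to illustrate how much one observation can move the result. Whether an outlier should actually be removed depends on why it occurred.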