I wanted to take a look at normality tests because I tend not to use them: my belief is that they are unreliable in most cases and irrelevant in the rest. They are unreliable for small datasets, and they are irrelevant for large datasets where you can fall back on tests about medians.
The first simulation creates 10,000 random samples of size 10 from a normal distribution and runs the Kolmogorov-Smirnov (in its Lilliefors form, via lillie.test from the nortest package), Shapiro-Wilk and Anderson-Darling tests on each sample. It then plots a histogram of the p-values for each test, keeping in mind that these data all come from a normal distribution.
library(nortest)   # provides lillie.test() and ad.test()

y <- vector()      # Kolmogorov-Smirnov (Lilliefors) p-values
z <- vector()      # Shapiro-Wilk p-values
w <- vector()      # Anderson-Darling p-values
for (i in 1:10000){
  sample <- rnorm(10, mean=0, sd=1)
  temp <- lillie.test(sample)
  y[i] <- temp$p.value
  temp <- shapiro.test(sample)
  z[i] <- temp$p.value
  temp <- ad.test(sample)
  w[i] <- temp$p.value
}
hist(y, main="Histogram of KS Tests", xlab="p-value", ylab="Density", freq=F)
x <- seq(0.05,1,by=0.001)
curve(dnorm(x, mean=mean(y), sd=sd(y)), add=TRUE)
hist(z, main="Histogram of Shapiro Wilks Tests", xlab="p-value", ylab="Density", freq=F)
x <- seq(0.05,1,by=0.001)
curve(dnorm(x, mean=mean(z), sd=sd(z)), add=TRUE)
hist(w, main="Histogram of Anderson Darling Tests", xlab="p-value", ylab="Density", freq=F)
x <- seq(0.05,1,by=0.001)
curve(dnorm(x, mean=mean(w), sd=sd(w)), add=TRUE)
This is a strange result and not what I was expecting. I did not expect the p-values to be uniformly distributed given that the simulated data are normally distributed. I expected something more like a normal distribution, with at least some sort of peak in the middle or perhaps an upward slope, but not a uniform spread in which 5% of perfectly normal samples get rejected.
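A quick way to put a number on this is to count the fraction of p-values falling below the 0.05 cut-off. The snippet below is a minimal sketch of that idea using the same three tests; reject_rate is just an illustrative helper, not part of the simulations above, and the replicate count can be lowered if it is too slow.

library(nortest)
# Fraction of samples rejected at the 5% level by each test (illustrative helper).
reject_rate <- function(n, nsim = 10000, sampler = rnorm) {
  ks <- sw <- ad <- numeric(nsim)
  for (i in 1:nsim) {
    sample <- sampler(n)
    ks[i] <- lillie.test(sample)$p.value
    sw[i] <- shapiro.test(sample)$p.value
    ad[i] <- ad.test(sample)$p.value
  }
  c(KS = mean(ks < 0.05), SW = mean(sw < 0.05), AD = mean(ad < 0.05))
}
reject_rate(10)   # sampling from a true normal: all three should come out close to 0.05

Calling reject_rate(40) should give essentially the same numbers, since the false-rejection rate is set by the cut-off rather than by the sample size.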
I am going to repeat the process, but this time the sample size will be 40. This is the sample size commonly quoted as the threshold above which t-tests are considered robust to non-normality.
y <- vector()
z <- vector()
w <- vector()
for (i in 1:10000){
  sample <- rnorm(40, mean=0, sd=1)   # sample size 40 this time
  temp <- lillie.test(sample)
  y[i] <- temp$p.value
  temp <- shapiro.test(sample)
  z[i] <- temp$p.value
  temp <- ad.test(sample)
  w[i] <- temp$p.value
}
hist(y, main="Histogram of KS Tests", xlab="p-value", ylab="Density", freq=F)
x <- seq(0.05,1,by=0.001)
curve(dnorm(x, mean=mean(y), sd=sd(y)), add=TRUE)
hist(z, main="Histogram of Shapiro Wilks Tests", xlab="p-value", ylab="Density", freq=F)
x <- seq(0.05,1,by=0.001)
curve(dnorm(x, mean=mean(z), sd=sd(z)), add=TRUE)
hist(w, main="Histogram of Anderson Darling Tests", xlab="p-value", ylab="Density", freq=F)
x <- seq(0.05,1,by=0.001)
curve(dnorm(x, mean=mean(w), sd=sd(w)), add=TRUE)
This is exactly the same result. You still reject about 5% of samples, which is unexpected: you would hope that as the sample size increases the samples become more representative of the normal distribution they came from, but the false rejection rate stays pinned at the 5% significance level.
I now want to look at the effect of small sample sizes. I will go down to a sample size of 8, which is the smallest sample size these tests will accept.
y <- vector()
z <- vector()
w <- vector()
for (i in 1:10000){
  sample <- rnorm(8, mean=0, sd=1)   # very small samples of size 8
  temp <- lillie.test(sample)
  y[i] <- temp$p.value
  temp <- shapiro.test(sample)
  z[i] <- temp$p.value
  temp <- ad.test(sample)
  w[i] <- temp$p.value
}
hist(y, main="Histogram of KS Tests", xlab="p-value", ylab="Density", freq=F)
x <- seq(0.05,1,by=0.001)
curve(dnorm(x, mean=mean(y), sd=sd(y)), add=TRUE)
hist(z, main="Histogram of Shapiro Wilks Tests", xlab="p-value", ylab="Density", freq=F)
x <- seq(0.05,1,by=0.001)
curve(dnorm(x, mean=mean(z), sd=sd(z)), add=TRUE)
hist(w, main="Histogram of Anderson Darling Tests", xlab="p-value", ylab="Density", freq=F)
x <- seq(0.05,1,by=0.001)
curve(dnorm(x, mean=mean(w), sd=sd(w)), add=TRUE)
Again the result is exactly the same, which is really odd, as you would expect more of these small samples to be rejected.
This time I am going to sample from an exponential distribution, which is about as far from normal as a common distribution gets: its mode is at zero rather than in the middle, and it has a long right tail.
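As a quick visual reminder (not part of the simulations), the exponential density can be drawn next to a normal density with the same mean and standard deviation, which are both 1 when the rate is 1:

# Exponential(rate = 1) versus Normal(mean = 1, sd = 1): mode at zero and a long right tail.
curve(dexp(x, rate = 1), from = -2, to = 6, ylab = "Density", main = "Exponential vs Normal")
curve(dnorm(x, mean = 1, sd = 1), add = TRUE, lty = 2)
legend("topright", legend = c("Exponential(1)", "Normal(1, 1)"), lty = c(1, 2))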
y <- vector()
z <- vector()
w <- vector()
for (i in 1:10000){
  sample <- rexp(10, rate=1)   # exponential samples of size 10
  temp <- lillie.test(sample)
  y[i] <- temp$p.value
  temp <- shapiro.test(sample)
  z[i] <- temp$p.value
  temp <- ad.test(sample)
  w[i] <- temp$p.value
}
hist(y, main="Histogram of KS Tests", xlab="p-value", ylab="Density", freq=F)
x <- seq(0.05,1,by=0.001)
curve(dnorm(x, mean=mean(y), sd=sd(y)), add=TRUE)
hist(z, main="Histogram of Shapiro Wilks Tests", xlab="p-value", ylab="Density", freq=F)
x <- seq(0.05,1,by=0.001)
curve(dnorm(x, mean=mean(z), sd=sd(z)), add=TRUE)
hist(w, main="Histogram of Anderson Darling Tests", xlab="p-value", ylab="Density", freq=F)
x <- seq(0.05,1,by=0.001)
curve(dnorm(x, mean=mean(w), sd=sd(w)), add=TRUE)
This shows that exponential samples will often be accepted as normal when they clearly are not. There is an appreciable amount of density above the 0.05 cut-off, even though you would hope that most of these samples would be rejected.
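The same point can be made with a single number by re-using the reject_rate helper sketched earlier (an illustrative helper, not part of the original runs) to count how often each test actually rejects exponential samples of size 10; the exact fraction will vary from run to run, but it puts a figure on the histogram.

# Power against a clearly non-normal alternative: fraction of exponential samples rejected.
# Re-uses the illustrative reject_rate helper from the earlier sketch.
reject_rate(10, sampler = function(n) rexp(n, rate = 1))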
The first exponential simulation used a sample size of 10, but if I now increase that to 40, as with the samples from the normal distribution, does the rejection rate improve?
y <- vector()
z <- vector()
w <- vector()
for (i in 1:10000){
  sample <- rexp(40, rate=1)   # exponential samples of size 40
  temp <- lillie.test(sample)
  y[i] <- temp$p.value
  temp <- shapiro.test(sample)
  z[i] <- temp$p.value
  temp <- ad.test(sample)
  w[i] <- temp$p.value
}
hist(y, main="Histogram of KS Tests", xlab="p-value", ylab="Density", freq=F)
x <- seq(0.05,1,by=0.001)
curve(dnorm(x, mean=mean(y), sd=sd(y)), add=TRUE)
hist(z, main="Histogram of Shapiro Wilks Tests", xlab="p-value", ylab="Density", freq=F)
x <- seq(0.05,1,by=0.001)
curve(dnorm(x, mean=mean(z), sd=sd(z)), add=TRUE)
hist(w, main="Histogram of Anderson Darling Tests", xlab="p-value", ylab="Density", freq=F)
x <- seq(0.05,1,by=0.001)
curve(dnorm(x, mean=mean(w), sd=sd(w)), add=TRUE)
The exponential is a long way from the normal, but we could try something a little more similar, such as the gamma distribution.
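One caveat worth noting: with shape = 1, as used below, the gamma distribution is exactly the exponential, and it only becomes more bell-shaped as the shape parameter grows. The quick density sketch here (an illustration only, not part of the simulation) shows that progression:

# How the gamma density changes with the shape parameter (illustration only):
# shape = 1 is the exponential; larger shape values look progressively more bell-shaped.
curve(dgamma(x, shape = 1, rate = 1), from = 0, to = 10, ylab = "Density", main = "Gamma densities")
curve(dgamma(x, shape = 2, rate = 1), add = TRUE, lty = 2)
curve(dgamma(x, shape = 5, rate = 1), add = TRUE, lty = 3)
legend("topright", legend = c("shape = 1", "shape = 2", "shape = 5"), lty = 1:3)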
y <- vector()
z <- vector()
w <- vector()
for (i in 1:10000){
  sample <- rgamma(10, shape=1, rate=1)   # gamma samples of size 10
  temp <- lillie.test(sample)
  y[i] <- temp$p.value
  temp <- shapiro.test(sample)
  z[i] <- temp$p.value
  temp <- ad.test(sample)
  w[i] <- temp$p.value
}
hist(y, main="Histogram of KS Tests", xlab="p-value", ylab="Density", freq=F)
x <- seq(0.05,1,by=0.001)
curve(dnorm(x, mean=mean(y), sd=sd(y)), add=TRUE)
hist(z, main="Histogram of Shapiro Wilks Tests", xlab="p-value", ylab="Density", freq=F)
x <- seq(0.05,1,by=0.001)
curve(dnorm(x, mean=mean(z), sd=sd(z)), add=TRUE)
hist(w, main="Histogram of Anderson Darling Tests", xlab="p-value", ylab="Density", freq=F)
x <- seq(0.05,1,by=0.001)
curve(dnorm(x, mean=mean(w), sd=sd(w)), add=TRUE)
With samples of size 10 this produces the same issues as the exponential: a lot of samples are not rejected when they should be.
Again, this situation should improve as we move to larger sample sizes.
y <- vector()
z <- vector()
w <- vector()
for (i in 1:10000){
  sample <- rgamma(40, shape=1, rate=1)   # gamma samples of size 40
  temp <- lillie.test(sample)
  y[i] <- temp$p.value
  temp <- shapiro.test(sample)
  z[i] <- temp$p.value
  temp <- ad.test(sample)
  w[i] <- temp$p.value
}
hist(y, main="Histogram of KS Tests", xlab="p-value", ylab="Density", freq=F)
x <- seq(0.05,1,by=0.001)
curve(dnorm(x, mean=mean(y), sd=sd(y)), add=TRUE)
hist(z, main="Histogram of Shapiro Wilks Tests", xlab="p-value", ylab="Density", freq=F)
x <- seq(0.05,1,by=0.001)
curve(dnorm(x, mean=mean(z), sd=sd(z)), add=TRUE)
hist(w, main="Histogram of Anderson Darling Tests", xlab="p-value", ylab="Density", freq=F)
x <- seq(0.05,1,by=0.001)
curve(dnorm(x, mean=mean(w), sd=sd(w)), add=TRUE)
That is much better: at a sample size of 40 the tests clearly show that this gamma distribution is not normal.
The previous simulations with the exponential and gamma distributions show that there is a sample size effect in how reliably the tests flag the data as not normally distributed. This can be plotted by varying the sample size and taking the median of the p-values for each test.
y <- vector()
z <- vector()
w <- vector()
s <- vector()   # median KS p-value at each sample size
t <- vector()   # median Shapiro-Wilk p-value at each sample size
u <- vector()   # median Anderson-Darling p-value at each sample size
x <- 8:40       # sample sizes to try
for (j in 1:33){
  for (i in 1:10000){
    sample <- rexp(x[j], rate=1)
    temp <- lillie.test(sample)
    y[i] <- temp$p.value
    temp <- shapiro.test(sample)
    z[i] <- temp$p.value
    temp <- ad.test(sample)
    w[i] <- temp$p.value
  }
  u[j] <- median(w)
  t[j] <- median(z)
  s[j] <- median(y)
}
plot(x, s, ylab="Median p-value for KS Normality", xlab="Sample Size", main="Median p-value KS Test for Normality vs Sample Size")
plot(x, t, ylab="Median p-value for SW Normality", xlab="Sample Size", main="Median p-value SW Test for Normality vs Sample Size")
plot(x, u, ylab="Median p-value for AD Normality", xlab="Sample Size", main="Median p-value AD Test for Normality vs Sample Size")
The best performance seems to come from Shapiro-Wilk, where the sample size needed to detect non-normality is lower than for the other two tests. The KS test has the worst performance.
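One way to put a number on "a lower sample size for detecting non-normality" is to read off the first sample size at which each test's median p-value drops below 0.05. This is a minimal sketch, assuming the x, s, t and u vectors from the exponential loop above are still in the workspace; first_below is an illustrative helper, not from the original analysis.

# First sample size at which the median p-value falls below the 0.05 cut-off
# (assumes x, s, t and u from the simulation above; returns NA if it never does).
first_below <- function(sizes, medians, cutoff = 0.05) {
  idx <- which(medians < cutoff)
  if (length(idx) == 0) NA else sizes[min(idx)]
}
c(KS = first_below(x, s), SW = first_below(x, t), AD = first_below(x, u))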
I have done the same for the gamma distribution and the results are the same: the KS test has the worst performance and Shapiro-Wilk the best.
y <- vector()
z <- vector()
w <- vector()
s <- vector()   # median KS p-value at each sample size
t <- vector()   # median Shapiro-Wilk p-value at each sample size
u <- vector()   # median Anderson-Darling p-value at each sample size
x <- 8:40       # sample sizes to try
for (j in 1:33){
  for (i in 1:10000){
    sample <- rgamma(x[j], shape=1, rate=1)
    temp <- lillie.test(sample)
    y[i] <- temp$p.value
    temp <- shapiro.test(sample)
    z[i] <- temp$p.value
    temp <- ad.test(sample)
    w[i] <- temp$p.value
  }
  u[j] <- median(w)
  t[j] <- median(z)
  s[j] <- median(y)
}
plot(x, s, ylab="Median p-value for KS Normality", xlab="Sample Size", main="Median p-value KS Test for Normality vs Sample Size")
plot(x, t, ylab="Median p-value for SW Normality", xlab="Sample Size", main="Median p-value SW Test for Normality vs Sample Size")
plot(x, u, ylab="Median p-value for AD Normality", xlab="Sample Size", main="Median p-value AD Test for Normality vs Sample Size")