I wanted to take a look at normality tests because I tend not to use them: my belief is that they are unreliable in most cases and irrelevant in the rest. They are unreliable for small datasets, and they are irrelevant for large datasets where you can fall back on tests about medians.
The first simulation creates 10,000 random samples of size 10 from a normal distribution and runs the Kolmogorov-Smirnov (in its Lilliefors form, via lillie.test from the nortest package), Shapiro-Wilk and Anderson-Darling tests on each sample. It then plots a histogram of the p-values for each test, keeping in mind that these data all come from a normal distribution.
library(nortest)   # provides lillie.test() and ad.test()

y <- vector()      # Kolmogorov-Smirnov (Lilliefors) p-values
z <- vector()      # Shapiro-Wilk p-values
w <- vector()      # Anderson-Darling p-values
for (i in 1:10000){
  sample <- rnorm(10, mean=0, sd=1)
  temp <- lillie.test(sample)
  y[i] <- temp$p.value
  temp <- shapiro.test(sample)
  z[i] <- temp$p.value
  temp <- ad.test(sample)
  w[i] <- temp$p.value
}
hist(y, main="Histogram of KS Tests", xlab="p-value", ylab="Density", freq=F)
x <- seq(0.05,1,by=0.001)
curve(dnorm(x, mean=mean(y), sd=sd(y)), add=TRUE)
hist(z, main="Histogram of Shapiro Wilks Tests", xlab="p-value", ylab="Density", freq=F)
x <- seq(0.05,1,by=0.001)
curve(dnorm(x, mean=mean(z), sd=sd(z)), add=TRUE)
hist(w, main="Histogram of Anderson Darling Tests", xlab="p-value", ylab="Density", freq=F)
x <- seq(0.05,1,by=0.001)
curve(dnorm(x, mean=mean(w), sd=sd(w)), add=TRUE)
This is a strange result and not what I was expecting. I did not expect the p-values to be uniformly distributed given that the simulated data are normally distributed. I expected something more like a normal distribution, with at least some sort of peak in the middle or perhaps an upward slope, but not a uniform spread in which 5% of perfectly normal samples get rejected.
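A quick way to put a number on this is to count the fraction of p-values falling below the 0.05 cut-off. The snippet below is a minimal sketch of that idea using the same three tests; reject_rate is just an illustrative helper, not part of the simulations above, and the replicate count can be lowered if it is too slow.

library(nortest)
# Fraction of samples rejected at the 5% level by each test (illustrative helper).
reject_rate <- function(n, nsim = 10000, sampler = rnorm) {
  ks <- sw <- ad <- numeric(nsim)
  for (i in 1:nsim) {
    sample <- sampler(n)
    ks[i] <- lillie.test(sample)$p.value
    sw[i] <- shapiro.test(sample)$p.value
    ad[i] <- ad.test(sample)$p.value
  }
  c(KS = mean(ks < 0.05), SW = mean(sw < 0.05), AD = mean(ad < 0.05))
}
reject_rate(10)   # sampling from a true normal: all three should come out close to 0.05

Calling reject_rate(40) should give essentially the same numbers, since the false-rejection rate is set by the cut-off rather than by the sample size.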
I am going to repeat the process, but this time the sample size will be 40. This is the sample size commonly quoted as the threshold above which t-tests are considered robust to non-normality.
y <- vector()
z <- vector()
w <- vector()
for (i in 1:10000){
  sample <- rnorm(40, mean=0, sd=1)   # sample size 40 this time
  temp <- lillie.test(sample)
  y[i] <- temp$p.value
  temp <- shapiro.test(sample)
  z[i] <- temp$p.value
  temp <- ad.test(sample)
  w[i] <- temp$p.value
}
hist(y, main="Histogram of KS Tests", xlab="p-value", ylab="Density", freq=F)
x <- seq(0.05,1,by=0.001)
curve(dnorm(x, mean=mean(y), sd=sd(y)), add=TRUE)
hist(z, main="Histogram of Shapiro Wilks Tests", xlab="p-value", ylab="Density", freq=F)
x <- seq(0.05,1,by=0.001)
curve(dnorm(x, mean=mean(z), sd=sd(z)), add=TRUE)
hist(w, main="Histogram of Anderson Darling Tests", xlab="p-value", ylab="Density", freq=F)
x <- seq(0.05,1,by=0.001)
curve(dnorm(x, mean=mean(w), sd=sd(w)), add=TRUE)
This is exactly the same result. You still reject about 5% of samples, which is unexpected: you would hope that as the sample size increases the samples become more representative of the normal distribution they came from, but the false rejection rate stays pinned at the 5% significance level.
I now want to look at the effect of small sample sizes. I will go down to a sample size of 8, which is the smallest sample size these tests will accept.
y <- vector()
z <- vector()
w <- vector()
for (i in 1:10000){
  sample <- rnorm(8, mean=0, sd=1)   # very small samples of size 8
  temp <- lillie.test(sample)
  y[i] <- temp$p.value
  temp <- shapiro.test(sample)
  z[i] <- temp$p.value
  temp <- ad.test(sample)
  w[i] <- temp$p.value
}
hist(y, main="Histogram of KS Tests", xlab="p-value", ylab="Density", freq=F)
x <- seq(0.05,1,by=0.001)
curve(dnorm(x, mean=mean(y), sd=sd(y)), add=TRUE)
hist(z, main="Histogram of Shapiro Wilks Tests", xlab="p-value", ylab="Density", freq=F)
x <- seq(0.05,1,by=0.001)
curve(dnorm(x, mean=mean(z), sd=sd(z)), add=TRUE)
hist(w, main="Histogram of Anderson Darling Tests", xlab="p-value", ylab="Density", freq=F)
x <- seq(0.05,1,by=0.001)
curve(dnorm(x, mean=mean(w), sd=sd(w)), add=TRUE)
Again the result is exactly the same, which is really odd, as you would expect more of these small samples to be rejected.
This time I am going to sample from an exponential distribution, which is about as far from normal as a common distribution gets: its mode is at zero rather than in the middle, and it has a long right tail.
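As a quick visual reminder (not part of the simulations), the exponential density can be drawn next to a normal density with the same mean and standard deviation, which are both 1 when the rate is 1:

# Exponential(rate = 1) versus Normal(mean = 1, sd = 1): mode at zero and a long right tail.
curve(dexp(x, rate = 1), from = -2, to = 6, ylab = "Density", main = "Exponential vs Normal")
curve(dnorm(x, mean = 1, sd = 1), add = TRUE, lty = 2)
legend("topright", legend = c("Exponential(1)", "Normal(1, 1)"), lty = c(1, 2))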
y <- vector()
z <- vector()
w <- vector()
for (i in 1:10000){
  sample <- rexp(10, rate=1)   # exponential samples of size 10
  temp <- lillie.test(sample)
  y[i] <- temp$p.value
  temp <- shapiro.test(sample)
  z[i] <- temp$p.value
  temp <- ad.test(sample)
  w[i] <- temp$p.value
}
hist(y, main="Histogram of KS Tests", xlab="p-value", ylab="Density", freq=F)
x <- seq(0.05,1,by=0.001)
curve(dnorm(x, mean=mean(y), sd=sd(y)), add=TRUE)
hist(z, main="Histogram of Shapiro Wilks Tests", xlab="p-value", ylab="Density", freq=F)
x <- seq(0.05,1,by=0.001)
curve(dnorm(x, mean=mean(z), sd=sd(z)), add=TRUE)
hist(w, main="Histogram of Anderson Darling Tests", xlab="p-value", ylab="Density", freq=F)
x <- seq(0.05,1,by=0.001)
curve(dnorm(x, mean=mean(w), sd=sd(w)), add=TRUE)
This shows that exponential samples will often be accepted as normal when they clearly are not. There is an appreciable amount of density above the 0.05 cut-off, even though you would hope that most of these samples would be rejected.
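The same point can be made with a single number by re-using the reject_rate helper sketched earlier (an illustrative helper, not part of the original runs) to count how often each test actually rejects exponential samples of size 10; the exact fraction will vary from run to run, but it puts a figure on the histogram.

# Power against a clearly non-normal alternative: fraction of exponential samples rejected.
# Re-uses the illustrative reject_rate helper from the earlier sketch.
reject_rate(10, sampler = function(n) rexp(n, rate = 1))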
The first exponential simulation used a sample size of 10, but if I now increase that to 40, as with the samples from the normal distribution, does the rejection rate improve?
y <- vector()
z <- vector()
w <- vector()
for (i in 1:10000){
  sample <- rexp(40, rate=1)   # exponential samples of size 40
  temp <- lillie.test(sample)
  y[i] <- temp$p.value
  temp <- shapiro.test(sample)
  z[i] <- temp$p.value
  temp <- ad.test(sample)
  w[i] <- temp$p.value
}
hist(y, main="Histogram of KS Tests", xlab="p-value", ylab="Density", freq=F)
x <- seq(0.05,1,by=0.001)
curve(dnorm(x, mean=mean(y), sd=sd(y)), add=TRUE)
hist(z, main="Histogram of Shapiro Wilks Tests", xlab="p-value", ylab="Density", freq=F)
x <- seq(0.05,1,by=0.001)
curve(dnorm(x, mean=mean(z), sd=sd(z)), add=TRUE)
hist(w, main="Histogram of Anderson Darling Tests", xlab="p-value", ylab="Density", freq=F)
x <- seq(0.05,1,by=0.001)
curve(dnorm(x, mean=mean(w), sd=sd(w)), add=TRUE)
The exponential is a long way from the normal, but we could try something a little more similar, such as the gamma distribution.
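One caveat worth noting: with shape = 1, as used below, the gamma distribution is exactly the exponential, and it only becomes more bell-shaped as the shape parameter grows. The quick density sketch here (an illustration only, not part of the simulation) shows that progression:

# How the gamma density changes with the shape parameter (illustration only):
# shape = 1 is the exponential; larger shape values look progressively more bell-shaped.
curve(dgamma(x, shape = 1, rate = 1), from = 0, to = 10, ylab = "Density", main = "Gamma densities")
curve(dgamma(x, shape = 2, rate = 1), add = TRUE, lty = 2)
curve(dgamma(x, shape = 5, rate = 1), add = TRUE, lty = 3)
legend("topright", legend = c("shape = 1", "shape = 2", "shape = 5"), lty = 1:3)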
y <- vector()
z <- vector()
w <- vector()
for (i in 1:10000){
  sample <- rgamma(10, shape=1, rate=1)   # gamma samples of size 10
  temp <- lillie.test(sample)
  y[i] <- temp$p.value
  temp <- shapiro.test(sample)
  z[i] <- temp$p.value
  temp <- ad.test(sample)
  w[i] <- temp$p.value
}
hist(y, main="Histogram of KS Tests", xlab="p-value", ylab="Density", freq=F)
x <- seq(0.05,1,by=0.001)
curve(dnorm(x, mean=mean(y), sd=sd(y)), add=TRUE)
hist(z, main="Histogram of Shapiro Wilks Tests", xlab="p-value", ylab="Density", freq=F)
x <- seq(0.05,1,by=0.001)
curve(dnorm(x, mean=mean(z), sd=sd(z)), add=TRUE)
hist(w, main="Histogram of Anderson Darling Tests", xlab="p-value", ylab="Density", freq=F)
x <- seq(0.05,1,by=0.001)
curve(dnorm(x, mean=mean(w), sd=sd(w)), add=TRUE)
With samples of size 10 this produces the same issues as the exponential: a lot of samples are not rejected when they should be.
Again, this situation should improve as we move to larger sample sizes.
y <- vector()
z <- vector()
w <- vector()
for (i in 1:10000){
  sample <- rgamma(40, shape=1, rate=1)   # gamma samples of size 40
  temp <- lillie.test(sample)
  y[i] <- temp$p.value
  temp <- shapiro.test(sample)
  z[i] <- temp$p.value
  temp <- ad.test(sample)
  w[i] <- temp$p.value
}
hist(y, main="Histogram of KS Tests", xlab="p-value", ylab="Density", freq=F)
x <- seq(0.05,1,by=0.001)
curve(dnorm(x, mean=mean(y), sd=sd(y)), add=TRUE)
hist(z, main="Histogram of Shapiro Wilks Tests", xlab="p-value", ylab="Density", freq=F)
x <- seq(0.05,1,by=0.001)
curve(dnorm(x, mean=mean(z), sd=sd(z)), add=TRUE)
hist(w, main="Histogram of Anderson Darling Tests", xlab="p-value", ylab="Density", freq=F)
x <- seq(0.05,1,by=0.001)
curve(dnorm(x, mean=mean(w), sd=sd(w)), add=TRUE)
That is much better: at a sample size of 40 the tests clearly show that this gamma distribution is not normal.
The previous simulations with the exponential and gamma distributions show that there is a sample size effect in how reliably the tests flag the data as not normally distributed. This can be plotted by varying the sample size and taking the median of the p-values for each test.
y <- vector()
z <- vector()
w <- vector()
s <- vector()   # median KS p-value at each sample size
t <- vector()   # median Shapiro-Wilk p-value at each sample size
u <- vector()   # median Anderson-Darling p-value at each sample size
x <- 8:40       # sample sizes to try
for (j in 1:33){
  for (i in 1:10000){
    sample <- rexp(x[j], rate=1)
    temp <- lillie.test(sample)
    y[i] <- temp$p.value
    temp <- shapiro.test(sample)
    z[i] <- temp$p.value
    temp <- ad.test(sample)
    w[i] <- temp$p.value
  }
  u[j] <- median(w)
  t[j] <- median(z)
  s[j] <- median(y)
}
plot(x, s, ylab="Median p-value for KS Normality", xlab="Sample Size", main="Median p-value KS Test for Normality vs Sample Size")
plot(x, t, ylab="Median p-value for SW Normality", xlab="Sample Size", main="Median p-value SW Test for Normality vs Sample Size")
plot(x, u, ylab="Median p-value for AD Normality", xlab="Sample Size", main="Median p-value AD Test for Normality vs Sample Size")
The best performance seems to come from Shapiro-Wilk, where the sample size needed to detect non-normality is lower than for the other two tests. The KS test has the worst performance.
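One way to put a number on "a lower sample size for detecting non-normality" is to read off the first sample size at which each test's median p-value drops below 0.05. This is a minimal sketch, assuming the x, s, t and u vectors from the exponential loop above are still in the workspace; first_below is an illustrative helper, not from the original analysis.

# First sample size at which the median p-value falls below the 0.05 cut-off
# (assumes x, s, t and u from the simulation above; returns NA if it never does).
first_below <- function(sizes, medians, cutoff = 0.05) {
  idx <- which(medians < cutoff)
  if (length(idx) == 0) NA else sizes[min(idx)]
}
c(KS = first_below(x, s), SW = first_below(x, t), AD = first_below(x, u))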
I have done the same for the gamma distribution and the results are the same: the KS test has the worst performance and Shapiro-Wilk the best.
y <- vector()
z <- vector()
w <- vector()
s <- vector()   # median KS p-value at each sample size
t <- vector()   # median Shapiro-Wilk p-value at each sample size
u <- vector()   # median Anderson-Darling p-value at each sample size
x <- 8:40       # sample sizes to try
for (j in 1:33){
  for (i in 1:10000){
    sample <- rgamma(x[j], shape=1, rate=1)
    temp <- lillie.test(sample)
    y[i] <- temp$p.value
    temp <- shapiro.test(sample)
    z[i] <- temp$p.value
    temp <- ad.test(sample)
    w[i] <- temp$p.value
  }
  u[j] <- median(w)
  t[j] <- median(z)
  s[j] <- median(y)
}
plot(x, s, ylab="Median p-value for KS Normality", xlab="Sample Size", main="Median p-value KS Test for Normality vs Sample Size")
plot(x, t, ylab="Median p-value for SW Normality", xlab="Sample Size", main="Median p-value SW Test for Normality vs Sample Size")
plot(x, u, ylab="Median p-value for AD Normality", xlab="Sample Size", main="Median p-value AD Test for Normality vs Sample Size")