Testing For Normality in the “Real World” - Dealing with Skew
When I ran my simulations of skewed data to see how well normality tests classify data at different sample sizes, I sampled the skew from a uniform distribution between -3 and 3. It would arguably be more realistic to sample from a normal distribution with a mean skew of 0 and a standard deviation of 1.5. This would mean that approximately 95% of cases have a skew between -3 and 3, and 68% lie between -1.5 and 1.5, the threshold for treating the data as normal rather than sufficiently skewed.
The question is how this affects the tests.
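The 95% and 68% figures follow directly from the normal distribution and can be checked in a couple of lines. This is a minimal sketch; the variable names are mine and are not part of the simulations that follow.

```r
# Assumed skew distribution from the text: Normal(mean = 0, sd = 1.5)
skew_mean <- 0
skew_sd <- 1.5

# Probability that a sampled skew lies within +/- 3 (two standard deviations)
p_within_3 <- pnorm(3, skew_mean, skew_sd) - pnorm(-3, skew_mean, skew_sd)

# Probability that it lies within +/- 1.5 (one standard deviation)
p_within_1_5 <- pnorm(1.5, skew_mean, skew_sd) - pnorm(-1.5, skew_mean, skew_sd)

round(c(within_3 = p_within_3, within_1_5 = p_within_1_5), 4)
```

The two probabilities come out at roughly 0.95 and 0.68, matching the figures quoted above.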
Small Sample Simulation with Shapiro-Wilk Test
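library(fGarch)   # rsnorm()
library(ggplot2)
library(nortest)
library(caret)    # confusionMatrix()
set.seed(1)
predicted <- vector()
actual <- vector()
for (i in 1:10000) {
  xi <- rnorm(1, 0, 1.5)  # skew drawn from N(0, sd = 1.5)
  y <- shapiro.test(rsnorm(8, mean = 168, sd = 6.4, xi))
  # actual: 0 = skewed, 1 = treated as normal.
  # The upper cutoff is 1 here, not the 1.5 given in the text.
  if (xi < -1.5) {
    actual[i] = 0
  } else if (xi > 1) {
    actual[i] = 0
  } else {
    actual[i] = 1
  }
  # predicted: 0 = normality rejected at the 5% level
  if (y$p.value < 0.05) {
    predicted[i] = 0
  } else {
    predicted[i] = 1
  }
}
predicted <- factor(predicted, levels = c(0, 1))
actual <- factor(actual, levels = c(0, 1))
confusionMatrix(data = predicted, reference = actual)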
Confusion Matrix and Statistics
          Reference
Prediction    0    1
         0  423  578
         1 3787 5212
Accuracy : 0.5635
95% CI : (0.5537, 0.5733)
No Information Rate : 0.579
P-Value [Acc > NIR] : 0.9992
Kappa : 7e-04
Mcnemar's Test P-Value : <2e-16
Sensitivity : 0.1005
Specificity : 0.9002
Pos Pred Value : 0.4226
Neg Pred Value : 0.5792
Prevalence : 0.4210
Detection Rate : 0.0423
Detection Prevalence : 0.1001
Balanced Accuracy : 0.5003
'Positive' Class : 0
The sensitivity of the test is still very low (about 10%) but the specificity is high (about 90%). The test classifies almost everything as normal, and only in a small number of cases does it correctly detect that the data is not normal. This shows a very high type II error rate, whereas the high specificity indicates a lower type I error rate.
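To make the link between the confusion matrix and these statements explicit, the headline figures can be recomputed by hand from the four counts. The sketch below copies the counts from the output above. The object names are my own, and the error rates are "type I" and "type II" in the sense used in this post, with class 0 (not normal) as the positive class. The last line works out the share of cases that the labelling rule in the code (skew below -1.5 or above 1) is expected to mark as skewed.

```r
# Counts copied from the Shapiro-Wilk output above (n = 8).
# Rows are predictions, columns are the reference; class 0 = not normal.
cm <- matrix(c(423, 3787, 578, 5212), nrow = 2,
             dimnames = list(Prediction = c("0", "1"),
                             Reference  = c("0", "1")))

# Share of truly skewed cases that the test rejected
sensitivity <- cm["0", "0"] / sum(cm[, "0"])
# Share of cases labelled normal that the test did not reject
specificity <- cm["1", "1"] / sum(cm[, "1"])
accuracy    <- sum(diag(cm)) / sum(cm)

type_II_rate <- 1 - sensitivity  # skewed data passed as normal
type_I_rate  <- 1 - specificity  # "normal" data rejected

# Expected share of class 0 under the rule: skew < -1.5 or skew > 1
expected_prevalence <- pnorm(-1.5, 0, 1.5) +
  pnorm(1, 0, 1.5, lower.tail = FALSE)

round(c(sensitivity = sensitivity, specificity = specificity,
        accuracy = accuracy, type_II = type_II_rate,
        type_I = type_I_rate, prevalence = expected_prevalence), 4)
```

The recomputed sensitivity, specificity and accuracy agree with the caret output, and the expected prevalence of about 0.41 is close to the 0.42 reported.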
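Small Sample Simulation with Anderson-Darling Test
library(fGarch)   # rsnorm()
library(ggplot2)
library(nortest)  # ad.test()
library(caret)    # confusionMatrix()
predicted <- vector()
actual <- vector()
for (i in 1:10000) {
  xi <- rnorm(1, 0, 1.5)  # skew drawn from N(0, sd = 1.5)
  y <- ad.test(rsnorm(8, 168, 6.4, xi))
  # actual: 0 = skewed, 1 = treated as normal.
  # The upper cutoff is 1 here, not the 1.5 given in the text.
  if (xi < -1.5) {
    actual[i] = 0
  } else if (xi > 1) {
    actual[i] = 0
  } else {
    actual[i] = 1
  }
  # predicted: 0 = normality rejected at the 5% level
  if (y$p.value < 0.05) {
    predicted[i] = 0
  } else {
    predicted[i] = 1
  }
}
predicted <- factor(predicted, levels = c(0, 1))
actual <- factor(actual, levels = c(0, 1))
confusionMatrix(data = predicted, reference = actual)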
Confusion Matrix and Statistics
          Reference
Prediction    0    1
         0  383  574
         1 3698 5345
Accuracy : 0.5728
95% CI : (0.563, 0.5825)
No Information Rate : 0.5919
P-Value [Acc > NIR] : 0.9999
Kappa : -0.0035
Mcnemar's Test P-Value : <2e-16
Sensitivity : 0.09385
Specificity : 0.90302
Pos Pred Value : 0.40021
Neg Pred Value : 0.59106
Prevalence : 0.40810
Detection Rate : 0.03830
Detection Prevalence : 0.09570
Balanced Accuracy : 0.49844
'Positive' Class : 0
The results for Anderson-Darling are very similar to those from Shapiro-Wilk.
Small Sample Simulation with Lilliefors Test (Kolmogorov-Smirnov)
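library(fGarch)   # rsnorm()
library(ggplot2)
library(nortest)  # lillie.test()
library(caret)    # confusionMatrix()
predicted <- vector()
actual <- vector()
for (i in 1:10000) {
  xi <- rnorm(1, 0, 1.5)  # skew drawn from N(0, sd = 1.5)
  y <- lillie.test(rsnorm(8, 168, 6.4, xi))
  # actual: 0 = skewed, 1 = treated as normal.
  # The upper cutoff is 1 here, not the 1.5 given in the text.
  if (xi < -1.5) {
    actual[i] = 0
  } else if (xi > 1) {
    actual[i] = 0
  } else {
    actual[i] = 1
  }
  # predicted: 0 = normality rejected at the 5% level
  if (y$p.value < 0.05) {
    predicted[i] = 0
  } else {
    predicted[i] = 1
  }
}
predicted <- factor(predicted, levels = c(0, 1))
actual <- factor(actual, levels = c(0, 1))
confusionMatrix(data = predicted, reference = actual)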
Confusion Matrix and Statistics
          Reference
Prediction    0    1
         0  331  455
         1 3730 5484
Accuracy : 0.5815
95% CI : (0.5718, 0.5912)
No Information Rate : 0.5939
P-Value [Acc > NIR] : 0.9943
Kappa : 0.0056
Mcnemar's Test P-Value : <2e-16
Sensitivity : 0.08151
Specificity : 0.92339
Pos Pred Value : 0.42112
Neg Pred Value : 0.59518
Prevalence : 0.40610
Detection Rate : 0.03310
Detection Prevalence : 0.07860
Balanced Accuracy : 0.50245
'Positive' Class : 0
For the Lilliefors test (a variant of the Kolmogorov-Smirnov test that estimates the mean and variance from the sample) the specificity is even higher (about 92%), but at the cost of sensitivity (about 8%).
Medium Sample Size Simulation with Shapiro-Wilk Test
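library(fGarch)   # rsnorm()
library(ggplot2)
library(nortest)
library(caret)    # confusionMatrix()
set.seed(1)
predicted <- vector()
actual <- vector()
for (i in 1:10000) {
  xi <- rnorm(1, 0, 1.5)  # skew drawn from N(0, sd = 1.5)
  y <- shapiro.test(rsnorm(30, mean = 168, sd = 6.4, xi))
  # actual: 0 = skewed, 1 = treated as normal.
  # The upper cutoff is 1 here, not the 1.5 given in the text.
  if (xi < -1.5) {
    actual[i] = 0
  } else if (xi > 1) {
    actual[i] = 0
  } else {
    actual[i] = 1
  }
  # predicted: 0 = normality rejected at the 5% level
  if (y$p.value < 0.05) {
    predicted[i] = 0
  } else {
    predicted[i] = 1
  }
}
predicted <- factor(predicted, levels = c(0, 1))
actual <- factor(actual, levels = c(0, 1))
confusionMatrix(data = predicted, reference = actual)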
Confusion Matrix and Statistics
          Reference
Prediction    0    1
         0 1410 2039
         1 2731 3820
Accuracy : 0.523
95% CI : (0.5132, 0.5328)
No Information Rate : 0.5859
P-Value [Acc > NIR] : 1
Kappa : -0.0077
Mcnemar's Test P-Value : <2e-16
Sensitivity : 0.3405
Specificity : 0.6520
Pos Pred Value : 0.4088
Neg Pred Value : 0.5831
Prevalence : 0.4141
Detection Rate : 0.1410
Detection Prevalence : 0.3449
Balanced Accuracy : 0.4962
'Positive' Class : 0
Compared with the small-sample case, there is a considerable improvement in sensitivity (to about 34%) with a moderate loss of specificity (to about 65%). The prediction accuracy is still only about 52%, which is not very good.
Medium Sample Size Simulation with Kolmogorov-Smirnov Test
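library(fGarch)   # rsnorm()
library(ggplot2)
library(nortest)  # lillie.test()
library(caret)    # confusionMatrix()
predicted <- vector()
actual <- vector()
for (i in 1:10000) {
  xi <- rnorm(1, 0, 1.5)  # skew drawn from N(0, sd = 1.5)
  y <- lillie.test(rsnorm(30, 168, 6.4, xi))
  # actual: 0 = skewed, 1 = treated as normal.
  # The upper cutoff is 1 here, not the 1.5 given in the text.
  if (xi < -1.5) {
    actual[i] = 0
  } else if (xi > 1) {
    actual[i] = 0
  } else {
    actual[i] = 1
  }
  # predicted: 0 = normality rejected at the 5% level
  if (y$p.value < 0.05) {
    predicted[i] = 0
  } else {
    predicted[i] = 1
  }
}
predicted <- factor(predicted, levels = c(0, 1))
actual <- factor(actual, levels = c(0, 1))
confusionMatrix(data = predicted, reference = actual)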
Confusion Matrix and Statistics
          Reference
Prediction    0    1
         0  842 1153
         1 3288 4717
Accuracy : 0.5559
95% CI : (0.5461, 0.5657)
No Information Rate : 0.587
P-Value [Acc > NIR] : 1
Kappa : 0.0081
Mcnemar's Test P-Value : <2e-16
Sensitivity : 0.2039
Specificity : 0.8036
Pos Pred Value : 0.4221
Neg Pred Value : 0.5893
Prevalence : 0.4130
Detection Rate : 0.0842
Detection Prevalence : 0.1995
Balanced Accuracy : 0.5037
'Positive' Class : 0
In terms of sensitivity this is worse than the Shapiro-Wilk result, with a much smaller increase (to about 20% rather than 34%), but there is also a smaller loss of specificity (about 80% rather than 65%).
Large Sample Size Simulation with Shapiro-Wilk Test
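library(fGarch)   # rsnorm()
library(ggplot2)
library(nortest)
library(caret)    # confusionMatrix()
set.seed(1)
predicted <- vector()
actual <- vector()
for (i in 1:10000) {
  xi <- rnorm(1, 0, 1.5)  # skew drawn from N(0, sd = 1.5)
  y <- shapiro.test(rsnorm(500, mean = 168, sd = 6.4, xi))
  # actual: 0 = skewed, 1 = treated as normal.
  # The upper cutoff is 1 here, not the 1.5 given in the text.
  if (xi < -1.5) {
    actual[i] = 0
  } else if (xi > 1) {
    actual[i] = 0
  } else {
    actual[i] = 1
  }
  # predicted: 0 = normality rejected at the 5% level
  if (y$p.value < 0.05) {
    predicted[i] = 0
  } else {
    predicted[i] = 1
  }
}
predicted <- factor(predicted, levels = c(0, 1))
actual <- factor(actual, levels = c(0, 1))
confusionMatrix(data = predicted, reference = actual)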
Confusion Matrix and Statistics
          Reference
Prediction    0    1
         0 3686 5022
         1  323  969
Accuracy : 0.4655
95% CI : (0.4557, 0.4753)
No Information Rate : 0.5991
P-Value [Acc > NIR] : 1
Kappa : 0.068
Mcnemar's Test P-Value : <2e-16
Sensitivity : 0.9194
Specificity : 0.1617
Pos Pred Value : 0.4233
Neg Pred Value : 0.7500
Prevalence : 0.4009
Detection Rate : 0.3686
Detection Prevalence : 0.8708
Balanced Accuracy : 0.5406
'Positive' Class : 0
The sensitivity has improved markedly, to about 92%. The type II error rate is now low (about 8%), as the test identifies most skewed data as not normal. However, this comes at the cost of a large fall in specificity (to about 16%): a lot of normal data is now also being rejected and classified as not normal.
Test accuracy still remains low, at about 47% in this run, which is below the no-information rate of about 60%.
Large Sample Size Simulation with Kolmogorov-Smirnov Test
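library(fGarch)   # rsnorm()
library(ggplot2)
library(nortest)  # lillie.test()
library(caret)    # confusionMatrix()
predicted <- vector()
actual <- vector()
for (i in 1:10000) {
  xi <- rnorm(1, 0, 1.5)  # skew drawn from N(0, sd = 1.5)
  y <- lillie.test(rsnorm(500, 168, 6.4, xi))
  # actual: 0 = skewed, 1 = treated as normal.
  # The upper cutoff is 1 here, not the 1.5 given in the text.
  if (xi < -1.5) {
    actual[i] = 0
  } else if (xi > 1) {
    actual[i] = 0
  } else {
    actual[i] = 1
  }
  # predicted: 0 = normality rejected at the 5% level
  if (y$p.value < 0.05) {
    predicted[i] = 0
  } else {
    predicted[i] = 1
  }
}
predicted <- factor(predicted, levels = c(0, 1))
actual <- factor(actual, levels = c(0, 1))
confusionMatrix(data = predicted, reference = actual)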
Confusion Matrix and Statistics
          Reference
Prediction    0    1
         0 3664 4716
         1  442 1178
Accuracy : 0.4842
95% CI : (0.4744, 0.494)
No Information Rate : 0.5894
P-Value [Acc > NIR] : 1
Kappa : 0.0796
Mcnemar's Test P-Value : <2e-16
Sensitivity : 0.8924
Specificity : 0.1999
Pos Pred Value : 0.4372
Neg Pred Value : 0.7272
Prevalence : 0.4106
Detection Rate : 0.3664
Detection Prevalence : 0.8380
Balanced Accuracy : 0.5461
'Positive' Class : 0
Again the Kolmogorov-Smirnov test outperforms the Shapiro-Wilk test, though only slightly. It does not have 100% sensitivity, so some skewed data is classified as normal, but this is a small share (442 of the 4,106 skewed cases). It has a better specificity (about 20% versus 16%), so it picks out normally distributed data slightly better, and it has a better prediction accuracy (about 48% versus 47%).
Conclusion
There is very little difference between the results for the normally distributed and the uniformly distributed skew distributions. If anything, for large samples there is slightly worse performance in terms of specificity for the normally distributed skew. This still raises questions about using tests for normality as a binary yes/no decision before subsequent statistical analysis, to choose between parametric and non-parametric methods. Testing for normality should be approached with caution.
One way of resolving this issue for skewed data is to determine the effect of skew on the subsequent parametric NHST. How much does skew affect t-tests and ANOVA? There is some existing literature on these effects, but it is largely uncited because we have become stuck with the normal/non-normal dichotomy rather than the reality that normality can be a continuum in terms of skew.
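As a rough illustration of what such a check could look like, the sketch below estimates the type I error rate of a one-sample t-test when the null hypothesis is true, once for normal data and once for skewed data. Everything in it is an assumption made for illustration rather than part of the simulations above: the skewed population is a shifted exponential (not the rsnorm draws used earlier), the sample size is 30, and the number of repetitions is kept small so that it runs quickly.

```r
set.seed(1)
n_sims <- 2000  # hypothetical number of repetitions
n <- 30         # hypothetical sample size

# Normal data with true mean 0, so the null hypothesis mu = 0 holds
p_normal <- replicate(n_sims, t.test(rnorm(n), mu = 0)$p.value)

# Skewed data: exponential(rate = 1) shifted to have mean 0,
# so the null hypothesis mu = 0 still holds
p_skewed <- replicate(n_sims, t.test(rexp(n, rate = 1) - 1, mu = 0)$p.value)

# Proportion of false rejections at the 5% level for each population
type_I_normal <- mean(p_normal < 0.05)
type_I_skewed <- mean(p_skewed < 0.05)

c(normal = type_I_normal, skewed = type_I_skewed)
```

Comparing the two rejection rates against the nominal 5% gives a direct measure of how much a given amount of skew matters for the t-test at that sample size, which is the question a binary normality test cannot answer.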