Objectives

The data below represents the number of driving errors made by drivers who did or did not attend driving school, under clear, rainy, or snowy conditions, and at day or night. Testing was also performed on Saturdays and Sundays.

Determine if there is any effect of driving school and whether its effect is influenced by weather conditions or lighting conditions. Be sure to include a complete analysis (i.e., all assumptions checked and full understanding of effects garnered) and a clear summary with all needed statistical details included (see lab keys for examples).

Loading the required packages

library(moments)
library(reshape2)
library(car)
## Loading required package: carData
library(sjstats)

Loading the dataset

library(readxl)
Driving <- read_excel("/Users/anacapa/sunny/mera/harrisburg/Term 3/ANLY 510 Analytics II/Exams/Exam 1/Driving.xlsx")
Driving
## # A tibble: 48 x 5
##    attendeddrivingscho… clearorrainorsn… dayornight dayoftesting errorsmade
##                   <dbl>            <dbl>      <dbl>        <dbl>      <dbl>
##  1                    1                1          1            1          4
##  2                    1                1          1            2         18
##  3                    1                1          1            1          8
##  4                    1                1          1            2         10
##  5                    2                1          1            1          6
##  6                    2                1          1            2          4
##  7                    2                1          1            1         13
##  8                    2                1          1            2          7
##  9                    1                1          2            1         21
## 10                    1                1          2            2         14
## # … with 38 more rows

From the description, we know that this calls for a three-way factorial ANOVA (driving school by weather by time of day) with day of testing as a blocking factor. The first step is to convert our categorical predictors to factors:

Driving$attendeddrivingschool <- factor(Driving$attendeddrivingschool)
Driving$clearorrainorsnow <- factor(Driving$clearorrainorsnow)
Driving$dayornight <- factor(Driving$dayornight)
Driving$dayoftesting <- factor(Driving$dayoftesting)
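
The numeric codes are hard to read in later output, so labels can be attached while factoring. The following is a minimal sketch on a toy data frame, not the real file. The weather, day/night, and driving-school labels are inferred from the group means reported in the summary; the Saturday/Sunday coding is only a guess and should be checked against the codebook.

```r
# Toy stand-in for a few rows of Driving (values are illustrative only)
toy <- data.frame(
  attendeddrivingschool = c(1, 1, 2, 2),
  clearorrainorsnow     = c(1, 2, 3, 1),
  dayornight            = c(1, 2, 1, 2),
  dayoftesting          = c(1, 2, 1, 2)
)

# Assumed code-to-label mapping; verify against the codebook before use
toy$attendeddrivingschool <- factor(toy$attendeddrivingschool,
                                    levels = 1:2, labels = c("no", "yes"))
toy$clearorrainorsnow <- factor(toy$clearorrainorsnow,
                                levels = 1:3, labels = c("clear", "rain", "snow"))
toy$dayornight <- factor(toy$dayornight,
                         levels = 1:2, labels = c("day", "night"))
toy$dayoftesting <- factor(toy$dayoftesting,
                           levels = 1:2, labels = c("Saturday", "Sunday"))

# Each column is now a labelled factor
str(toy)
```

Labelled levels carry through to tables, model output, and Tukey comparisons, which makes the later results easier to read.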

Now let's see how the distribution of the outcome looks:

agostino.test(Driving$errorsmade)
## 
##  D'Agostino skewness test
## 
## data:  Driving$errorsmade
## skew = 0.94044, z = 2.65560, p-value = 0.007916
## alternative hypothesis: data have a skewness
anscombe.test(Driving$errorsmade)
## 
##  Anscombe-Glynn kurtosis test
## 
## data:  Driving$errorsmade
## kurt = 3.6416, z = 1.2798, p-value = 0.2006
## alternative hypothesis: kurtosis is not equal to 3
plot(density(Driving$errorsmade))

There is significant positive skew (skew = 0.94, p = .008), while kurtosis is not a problem (p = .20). We can correct the skew by taking the log after adding 1:

agostino.test(log(Driving$errorsmade+1))
## 
##  D'Agostino skewness test
## 
## data:  log(Driving$errorsmade + 1)
## skew = -0.49358, z = -1.50500, p-value = 0.1323
## alternative hypothesis: data have a skewness
anscombe.test(log(Driving$errorsmade+1))
## 
##  Anscombe-Glynn kurtosis test
## 
## data:  log(Driving$errorsmade + 1)
## kurt = 3.1002, z = 0.5853, p-value = 0.5583
## alternative hypothesis: kurtosis is not equal to 3

Driving$e2 <- log(Driving$errorsmade+1)
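
As a small aside, base R provides log1p() and expm1() for this transform and its inverse. The sketch below uses the ten errorsmade values printed in the tibble preview above; the variable names are illustrative.

```r
# The ten errorsmade values shown in the tibble preview
errors <- c(4, 18, 8, 10, 6, 4, 13, 7, 21, 14)

# log1p(x) computes log(x + 1)
e2 <- log1p(errors)
all.equal(e2, log(errors + 1))

# expm1() undoes the transform, e.g. for reporting on the original scale
back <- expm1(e2)
all.equal(back, errors)
```

Adding 1 before taking the log keeps the transform defined if any driver made zero errors.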

Now we want to make sure we have balance:

table(Driving$attendeddrivingschool)
## 
##  1  2 
## 24 24
table(Driving$clearorrainorsnow)
## 
##  1  2  3 
## 16 16 16
table(Driving$dayornight)
## 
##  1  2 
## 24 24
table(Driving$dayoftesting)
## 
##  1  2 
## 24 24
table(Driving$clearorrainorsnow, Driving$attendeddrivingschool)
##    
##     1 2
##   1 8 8
##   2 8 8
##   3 8 8
table(Driving$clearorrainorsnow, Driving$dayornight)
##    
##     1 2
##   1 8 8
##   2 8 8
##   3 8 8
table(Driving$clearorrainorsnow, Driving$dayoftesting)
##    
##     1 2
##   1 8 8
##   2 8 8
##   3 8 8
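
The tables above check balance one or two factors at a time. Checking the full crossing takes a single call. The sketch below runs on a simulated design rather than the real file, and assumes 2 observations in each of the 2 x 3 x 2 x 2 = 24 cells (which would give the 48 rows seen here).

```r
# Simulated balanced design: every combination of the four factors, twice
design <- expand.grid(
  attendeddrivingschool = factor(1:2),
  clearorrainorsnow     = factor(1:3),
  dayornight            = factor(1:2),
  dayoftesting          = factor(1:2),
  replicate             = 1:2
)

# Count observations in every cell of the four-way crossing
cell_counts <- table(design[, c("attendeddrivingschool", "clearorrainorsnow",
                                "dayornight", "dayoftesting")])
ftable(cell_counts)

# The design is balanced when all cell counts are identical
balanced <- length(unique(as.vector(cell_counts))) == 1
balanced
```

On the real data, the same table() call on the four factor columns of Driving would show whether any cell is short.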

The design is balanced. Now we need to check equality of variances:

bartlett.test(Driving$e2, Driving$attendeddrivingschool)
## 
##  Bartlett test of homogeneity of variances
## 
## data:  Driving$e2 and Driving$attendeddrivingschool
## Bartlett's K-squared = 0.00037237, df = 1, p-value = 0.9846
bartlett.test(Driving$e2, Driving$clearorrainorsnow)
## 
##  Bartlett test of homogeneity of variances
## 
## data:  Driving$e2 and Driving$clearorrainorsnow
## Bartlett's K-squared = 1.1835, df = 2, p-value = 0.5534
bartlett.test(Driving$e2, Driving$dayornight)
## 
##  Bartlett test of homogeneity of variances
## 
## data:  Driving$e2 and Driving$dayornight
## Bartlett's K-squared = 0.42492, df = 1, p-value = 0.5145
bartlett.test(Driving$e2, Driving$attendeddrivingschool:Driving$clearorrainorsnow)
## 
##  Bartlett test of homogeneity of variances
## 
## data:  Driving$e2 and Driving$attendeddrivingschool:Driving$clearorrainorsnow
## Bartlett's K-squared = 5.5453, df = 5, p-value = 0.353
bartlett.test(Driving$e2, Driving$attendeddrivingschool:Driving$dayornight)
## 
##  Bartlett test of homogeneity of variances
## 
## data:  Driving$e2 and Driving$attendeddrivingschool:Driving$dayornight
## Bartlett's K-squared = 3.1143, df = 3, p-value = 0.3743
bartlett.test(Driving$e2, Driving$clearorrainorsnow:Driving$dayornight)
## 
##  Bartlett test of homogeneity of variances
## 
## data:  Driving$e2 and Driving$clearorrainorsnow:Driving$dayornight
## Bartlett's K-squared = 4.0185, df = 5, p-value = 0.5468
bartlett.test(Driving$e2, Driving$clearorrainorsnow:Driving$dayornight:Driving$attendeddrivingschool)
## 
##  Bartlett test of homogeneity of variances
## 
## data:  Driving$e2 and Driving$clearorrainorsnow:Driving$dayornight:Driving$attendeddrivingschool
## Bartlett's K-squared = 9.4212, df = 11, p-value = 0.5831
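
The seven calls above differ only in the grouping variable, so they can be run in one pass. This is a sketch on simulated data; the data frame sim and its column names are hypothetical. It uses interaction() to build the cell labels.

```r
set.seed(1)

# Simulated stand-in: 2 x 3 x 2 design with 4 observations per cell
sim <- expand.grid(school = factor(1:2), weather = factor(1:3),
                   light = factor(1:2), rep = 1:4)
sim$y <- rnorm(nrow(sim))

# One grouping factor per main effect and per interaction
groupings <- list(
  school               = sim$school,
  weather              = sim$weather,
  light                = sim$light,
  school_weather       = interaction(sim$school, sim$weather),
  school_light         = interaction(sim$school, sim$light),
  weather_light        = interaction(sim$weather, sim$light),
  school_weather_light = interaction(sim$school, sim$weather, sim$light)
)

# Bartlett p-value for each grouping
p_values <- sapply(groupings, function(g) bartlett.test(sim$y, g)$p.value)
round(p_values, 3)
```

Bartlett's test is sensitive to non-normality, so it is best run on the transformed outcome, as was done above.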

No issues there. Let's see whether there are any interactions with our blocking variable (day of testing):

inter <- aov(e2 ~ attendeddrivingschool*clearorrainorsnow*dayornight*dayoftesting, data = Driving)
inter
## Call:
##    aov(formula = e2 ~ attendeddrivingschool * clearorrainorsnow * 
##     dayornight * dayoftesting, data = Driving)
## 
## Terms:
##                 attendeddrivingschool clearorrainorsnow dayornight
## Sum of Squares               4.136503          3.426500   3.146116
## Deg. of Freedom                     1                 2          1
##                 dayoftesting attendeddrivingschool:clearorrainorsnow
## Sum of Squares      0.006506                                0.190811
## Deg. of Freedom            1                                       2
##                 attendeddrivingschool:dayornight
## Sum of Squares                          0.110096
## Deg. of Freedom                                1
##                 clearorrainorsnow:dayornight
## Sum of Squares                      0.066726
## Deg. of Freedom                            2
##                 attendeddrivingschool:dayoftesting
## Sum of Squares                            0.010316
## Deg. of Freedom                                  1
##                 clearorrainorsnow:dayoftesting dayornight:dayoftesting
## Sum of Squares                        0.068672                0.113676
## Deg. of Freedom                              2                       1
##                 attendeddrivingschool:clearorrainorsnow:dayornight
## Sum of Squares                                            0.572834
## Deg. of Freedom                                                  2
##                 attendeddrivingschool:clearorrainorsnow:dayoftesting
## Sum of Squares                                              0.392388
## Deg. of Freedom                                                    2
##                 attendeddrivingschool:dayornight:dayoftesting
## Sum of Squares                                       0.012338
## Deg. of Freedom                                             1
##                 clearorrainorsnow:dayornight:dayoftesting
## Sum of Squares                                   0.027212
## Deg. of Freedom                                         2
##                 attendeddrivingschool:clearorrainorsnow:dayornight:dayoftesting
## Sum of Squares                                                         1.038938
## Deg. of Freedom                                                               2
##                 Residuals
## Sum of Squares   3.161637
## Deg. of Freedom        24
## 
## Residual standard error: 0.362953
## Estimated effects may be unbalanced
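
Printing the aov object shows only sums of squares and degrees of freedom; summary(inter) would print the F tests directly. As a sketch, the F statistics, p-values, and partial eta squared can also be rebuilt by hand from the values printed above. The short term names below are illustrative.

```r
# Sums of squares and degrees of freedom copied from the output above
ss  <- c(school = 4.136503, weather = 3.426500, light = 3.146116,
         four_way = 1.038938)
dfs <- c(school = 1, weather = 2, light = 1, four_way = 2)

ss_resid <- 3.161637
df_resid <- 24
ms_resid <- ss_resid / df_resid          # residual mean square

# F = effect mean square / residual mean square
f_stat <- (ss / dfs) / ms_resid
p_val  <- pf(f_stat, dfs, df_resid, lower.tail = FALSE)

# Partial eta squared = SS_effect / (SS_effect + SS_residual)
partial_eta2 <- ss / (ss + ss_resid)

round(data.frame(df = dfs, F = f_stat, p = p_val, partial_eta2), 4)
```

The same arithmetic applies to any other row of the output, such as the two-way interactions involving driving school.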

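There is a significant four-way interaction with our blocking factor, F(2, 24) = 3.94, p = .03 (computed from the sums of squares above; printing inter does not show the F tests, but summary(inter) does). Without getting into the details of this interaction, we will stick with this fuller model as a result. We get the unadjusted means and standard deviations (effect sizes are optional), check the normality of the residuals, and examine the effect of weather with Tukey post-hoc comparisons: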

tapply(Driving$errorsmade, Driving$attendeddrivingschool, mean)
##        1        2 
## 22.50000 12.08333
tapply(Driving$errorsmade, Driving$attendeddrivingschool, sd)
##         1         2 
## 10.673005  5.919141
tapply(Driving$errorsmade, Driving$clearorrainorsnow, mean)
##      1      2      3 
## 11.875 16.875 23.125
tapply(Driving$errorsmade, Driving$clearorrainorsnow, sd)
##         1         2         3 
##  6.601767  8.593602 11.401023
tapply(Driving$errorsmade, Driving$dayornight, mean)
##        1        2 
## 12.91667 21.66667
tapply(Driving$errorsmade, Driving$dayornight, sd)
##         1         2 
##  6.820153 10.913361
qqnorm(inter$residuals)
qqline(inter$residuals)

shapiro.test(inter$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  inter$residuals
## W = 0.97511, p-value = 0.3947
TukeyHSD(inter, "clearorrainorsnow")
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = e2 ~ attendeddrivingschool * clearorrainorsnow * dayornight * dayoftesting, data = Driving)
## 
## $clearorrainorsnow
##          diff          lwr       upr     p adj
## 2-1 0.3114712 -0.008988828 0.6319312 0.0578871
## 3-1 0.6542069  0.333746896 0.9746669 0.0000932
## 3-2 0.3427357  0.022275710 0.6631957 0.0344545

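From the above, we see that clear conditions produce fewer errors than snow (p < .001) and marginally fewer than rain (p = .058), and that rain produces fewer errors than snow (p = .034). Note that these differences are on the log scale.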
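
Because the model was fit to log(errors + 1), the Tukey differences are easier to interpret after exponentiating. The sketch below copies the estimates from the output above; reading the results as ratios of geometric means of (errors + 1) is an interpretation added here, not part of the original analysis.

```r
# Tukey estimates on the log scale, copied from the output above
tukey_log <- data.frame(
  comparison = c("2-1", "3-1", "3-2"),
  diff = c(0.3114712, 0.6542069, 0.3427357),
  lwr  = c(-0.008988828, 0.333746896, 0.022275710),
  upr  = c(0.6319312, 0.9746669, 0.6631957)
)

# Exponentiate to turn log differences into multiplicative ratios
tukey_ratio <- tukey_log
num_cols <- c("diff", "lwr", "upr")
tukey_ratio[, num_cols] <- exp(tukey_log[, num_cols])

tukey_ratio
```

A ratio interval that includes 1 corresponds to a log-scale interval that includes 0, as in the rain versus clear comparison.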

Summary

We corrected the positive skew by log-transforming the number of errors (after adding 1). We then analyzed the effects of driving school, weather conditions (clear, rain, or snow), time of day (day or night), and day of testing on the transformed number of driving errors in a factorial ANOVA. Those who attended driving school (unadjusted M = 12.08; SD = 5.92) made fewer errors than those who did not (unadjusted M = 22.50; SD = 10.67), F(1, 24) = 31.40, p < .001. In addition, more errors were made at night (unadjusted M = 21.67; SD = 10.91) than during the day (unadjusted M = 12.92; SD = 6.82), F(1, 24) = 23.88, p < .001. Weather conditions also affected errors, F(2, 24) = 13.01, p < .001. Post-hoc analysis with Tukey correction revealed that fewer errors were made in clear conditions (unadjusted M = 11.88; SD = 6.60) than in snow (unadjusted M = 23.13; SD = 11.40; p < .001), and marginally fewer than in rain (unadjusted M = 16.88; SD = 8.59; p = .06); rain also produced fewer errors than snow, p = .03. The effect of driving school did not depend on weather, F(2, 24) = 0.72, or on time of day, F(1, 24) = 0.84, both p > .05 (F values computed from the sums of squares in the model output). In short, we find that those who attend driving school make fewer errors on average, and that as driving conditions become harder (less light or more hazardous weather), more errors are made.