DOE Midterm Question 6

Reading in data and required libraries:

zombies <- read.csv(file.choose())
library(dplyr)

## Warning: package 'dplyr' was built under R version 4.1.3

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(tidyr)

## Warning: package 'tidyr' was built under R version 4.1.3

Answer to question no 6(a)

Boxplot:

zombies_nogame <- zombies %>% select(Basic, Conehead, Buckethead)
boxplot(zombies_nogame, main = "Zombie Kills", xlab = "zombie type", ylab = "zombie kills")

From this preliminary look, the spreads (and hence the variances) appear fairly similar across the three zombie types. Conehead kills are centered highest of the three types (the boxplot shows medians, so this is only an indication about the means).

Answer to question no 6(b)

Let u1 = mean kills of Basic zombies

u2 = mean kills of Conehead zombies

u3 = mean kills of Buckethead zombies

Null hypothesis, H0: u1 = u2 = u3

Alternative hypothesis, Ha: At least one of the means differs from the others

ANOVA test:

zombieslong <- zombies %>% select(Basic, Conehead, Buckethead)
zombieslong <- pivot_longer(data = zombieslong, c("Basic", "Conehead", "Buckethead"))
zombieslong$name <- as.factor(zombieslong$name)

anovazombies <- aov(value~name, data=zombieslong)
summary(anovazombies)

##             Df Sum Sq Mean Sq F value   Pr(>F)    
## name         2  396.4  198.20    73.8 1.79e-14 ***
## Residuals   42  112.8    2.69                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

From the ANOVA results, the p-value (1.79*10^-14) is far smaller than our threshold alpha (= 0.05). Hence, we reject the null hypothesis that the mean kills are the same for all zombie types.
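As a quick check on the table, the F value can be recomputed by hand from the sums of squares and degrees of freedom reported above. Nothing new is assumed here beyond the usual one-way ANOVA definition of F as the ratio of the two mean squares:

```latex
\[
F = \frac{MS_{\text{name}}}{MS_{\text{Residuals}}}
  = \frac{396.4 / 2}{112.8 / 42}
  = \frac{198.20}{2.686}
  \approx 73.8
\]
```

This is compared against an F distribution with 2 and 42 degrees of freedom, which is where the reported Pr(>F) comes from.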

Answer to question no 6(c)

plot(anovazombies)

From the Residuals vs Fitted plot, we see that the spread of the residuals is similar at each fitted value, forming a roughly rectangular band. This indicates that the residual variance is constant across groups, so the strong assumption of ANOVA (equal variance) is satisfied.

The roughly linear trend in the normal Q-Q plot of the residuals indicates that they are approximately normally distributed. The fit is not perfect, but ANOVA is robust to minor violations of the normality assumption.

Thus, we conclude that the ANOVA model is adequate for these data.
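The visual checks above could be complemented with formal tests. The sketch below is not part of the original analysis: it uses simulated stand-in data (the group means and standard deviation are made up, not taken from the zombies file) so that it runs on its own, and it swaps in Bartlett's test and the Shapiro-Wilk test as one possible choice of formal checks. With the real data, `sim` and `fit` would be replaced by `zombieslong` and `anovazombies`.

```r
# Simulated stand-in data (NOT the zombies data set): three groups of 15.
# The means and sd below are hypothetical placeholders.
set.seed(1)
sim <- data.frame(
  name  = factor(rep(c("Basic", "Buckethead", "Conehead"), each = 15)),
  value = rnorm(45, mean = rep(c(10, 8, 15), each = 15), sd = 1.6)
)
fit <- aov(value ~ name, data = sim)

# Bartlett's test: H0 is that all group variances are equal.
bt <- bartlett.test(value ~ name, data = sim)

# Shapiro-Wilk test on the residuals: H0 is that they are normally distributed.
sw <- shapiro.test(residuals(fit))

print(bt)
print(sw)

# Show the four diagnostic plots in one 2 x 2 grid instead of one at a time.
op <- par(mfrow = c(2, 2))
plot(fit)
par(op)
```

A large p-value in either test means there is no evidence against the corresponding assumption; it does not prove the assumption holds, so the plots remain the primary check.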

Answer to question no 6(d)

Using Tukey’s HSD, we find the pairs that significantly differ from each other:

plot(TukeyHSD(anovazombies))

From the plot of the 95% family-wise confidence intervals, we see that none of the intervals contains 0. Hence, all three comparisons, (1) Buckethead-Basic, (2) Conehead-Basic and (3) Conehead-Buckethead, are significantly different. In other words, every pair of groups differs significantly according to Tukey’s HSD post-hoc test.

(TukeyHSD(anovazombies))

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = value ~ name, data = zombieslong)
## 
## $name
##                     diff       lwr        upr     p adj
## Buckethead-Basic    -1.8 -3.253835 -0.3461652 0.0120723
## Conehead-Basic       5.2  3.746165  6.6538348 0.0000000
## Conehead-Buckethead  7.0  5.546165  8.4538348 0.0000000

The Tukey HSD summary agrees: the adjusted p-value is below 0.05 for each pair (0.012 for Buckethead-Basic and effectively 0 for the other two). Hence all three zombie types differ from each other.
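To see where the lwr and upr columns come from, the half-width of a Tukey interval can be rebuilt from the ANOVA table. This is a minimal sketch, assuming a balanced design with 15 kills recorded per zombie type (inferred from 45 observations in 3 groups, not stated in the data description); the variable names are illustrative.

```r
# Values read off the ANOVA table above.
ss_resid <- 112.8   # residual sum of squares
df_resid <- 42      # residual degrees of freedom
k        <- 3       # number of zombie types
n        <- 15      # kills recorded per type (assumed balanced: 45 / 3)

# Mean squared error (the Residuals "Mean Sq").
mse <- ss_resid / df_resid

# Critical value of the studentized range at the 95% family-wise level.
q_crit <- qtukey(0.95, nmeans = k, df = df_resid)

# Half-width of each Tukey interval for a balanced design.
half_width <- q_crit * sqrt(mse / n)

# Interval for Conehead-Basic, using the reported difference of 5.2.
c(lwr = 5.2 - half_width, upr = 5.2 + half_width)
```

Because the design is balanced, the same half-width applies to all three comparisons, which is why the three intervals in the output have equal length.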

Complete code chunk:

# Load libraries first: %>% and select() come from dplyr, pivot_longer() from tidyr
library(dplyr)
library(tidyr)

zombies <- read.csv(file.choose())
View(zombies)
names(zombies)

# 6(a): boxplot of kills by zombie type
zombies_nogame <- zombies %>% select(Basic, Conehead, Buckethead)
boxplot(zombies_nogame, main = "Zombie Kills", xlab = "zombie type", ylab = "zombie kills")

# 6(b): reshape to long format and run the one-way ANOVA
zombieslong <- zombies %>% select(Basic, Conehead, Buckethead)
View(zombieslong)
zombieslong <- pivot_longer(data = zombieslong, c("Basic", "Conehead", "Buckethead"))
str(zombieslong)
zombieslong$name <- as.factor(zombieslong$name)

anovazombies <- aov(value~name, data=zombieslong)
summary(anovazombies)

# 6(c): residual diagnostics
plot(anovazombies)

# 6(d): Tukey's HSD, plotted and printed
plot(TukeyHSD(anovazombies))
TukeyHSD(anovazombies)