zombies <- read.csv(file.choose())
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.1.3
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyr)
## Warning: package 'tidyr' was built under R version 4.1.3
Boxplot:
zombies_nogame <- zombies %>% select(Basic, Conehead, Buckethead)
boxplot(zombies_nogame, main = "Zombie Kills", xlab = "zombie type", ylab = "zombie kills")
From the boxplot, the spread of kills looks fairly similar across the three zombie types, and the Conehead kills are centered highest of the three.
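The visual impression from a boxplot can be backed up with per-group summaries. The following is a minimal sketch, not the analysis itself: the `demo` data frame is simulated (hypothetical) stand-in data with 15 rows per type, because the real CSV is not reproduced here, and `group_summary` is just an illustrative name.

```r
# Hypothetical stand-in for the zombies data; values are simulated,
# NOT the real kill counts from the CSV.
set.seed(1)
demo <- data.frame(
  Basic      = rnorm(15, mean = 10, sd = 1.6),
  Conehead   = rnorm(15, mean = 15, sd = 1.6),
  Buckethead = rnorm(15, mean = 8,  sd = 1.6)
)

# One row per zombie type: centre (mean, median) and spread (variance)
group_summary <- data.frame(
  mean     = sapply(demo, mean),
  median   = sapply(demo, median),
  variance = sapply(demo, var)
)
print(round(group_summary, 2))
```

With the real data, replacing `demo` by `zombies_nogame` would give the same table for the actual kills.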
Let u1 = mean kills of Basic zombies
u2 = mean kills of Conehead zombies
u3 = mean kills of Buckethead zombies
Null hypothesis, H0: u1 = u2 = u3
Alternative hypothesis, Ha: At least one of the means differs from the others
ANOVA test:
zombieslong <- zombies %>% select(Basic, Conehead, Buckethead)
zombieslong <- pivot_longer(data = zombieslong, c("Basic", "Conehead", "Buckethead"))
zombieslong$name <- as.factor(zombieslong$name)
anovazombies <- aov(value~name, data=zombieslong)
summary(anovazombies)
##             Df Sum Sq Mean Sq F value   Pr(>F)
## name         2  396.4  198.20    73.8 1.79e-14 ***
## Residuals   42  112.8    2.69
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
From the results of the ANOVA, we see that our p-value (1.79*10^-14) is far smaller than our threshold alpha (= 0.05). Hence, we reject the null hypothesis that the mean kills of all the zombie types are the same.
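To see where the F value and p-value in the table come from, they can be recomputed from the sums of squares and degrees of freedom printed above. This is only a sketch using the rounded figures from the summary, so the results should land close to, but not exactly on, the reported values; the variable names are illustrative.

```r
# Figures copied from the ANOVA summary table (rounded as printed)
ss_between <- 396.4   # Sum Sq for name
df_between <- 2       # Df for name
ss_within  <- 112.8   # Sum Sq for Residuals
df_within  <- 42      # Df for Residuals

# Mean squares are sums of squares divided by their degrees of freedom
ms_between <- ss_between / df_between
ms_within  <- ss_within / df_within

# F statistic is the ratio of between-group to within-group mean squares
f_stat <- ms_between / ms_within

# Upper-tail probability of the F distribution gives the p-value
p_value <- pf(f_stat, df1 = df_between, df2 = df_within, lower.tail = FALSE)

print(f_stat)
print(p_value)
```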
plot(anovazombies)
In the Residuals vs Fitted plot, the residuals show a similar vertical spread at each fitted value, forming a roughly rectangular band. This suggests that the residual variance is approximately equal across groups, so the equal-variance assumption of ANOVA is reasonably satisfied.
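A formal test can complement the visual check of equal variance. The sketch below uses Bartlett's test from base R, which was not part of the original analysis. The `demo_long` data are simulated (hypothetical) stand-ins for `zombieslong`, so the printed result says nothing about the real kills.

```r
# Hypothetical long-format data mimicking the shape of zombieslong:
# 15 simulated values per zombie type (NOT the real data).
set.seed(42)
demo_long <- data.frame(
  name  = factor(rep(c("Basic", "Conehead", "Buckethead"), each = 15)),
  value = c(rnorm(15, 10, 1.6), rnorm(15, 15, 1.6), rnorm(15, 8, 1.6))
)

# Bartlett's test: the null hypothesis is that all group variances are equal
bt <- bartlett.test(value ~ name, data = demo_long)
print(bt)

# A p-value above alpha means no evidence against equal variances
equal_var_plausible <- bt$p.value > 0.05
print(equal_var_plausible)
```

Bartlett's test is itself sensitive to non-normality, so it is best read together with the residual plots rather than in place of them.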
The normal Q-Q plot of the residuals is fairly linear, which suggests that the residuals are approximately normally distributed. The fit is not perfectly normal, but ANOVA is robust to minor violations of the normality assumption.
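The Q-Q plot reading can likewise be paired with a numerical check on the residuals. The following sketch applies the Shapiro-Wilk test, which the original analysis did not use, to the residuals of a model fitted on simulated (hypothetical) data; `demo_long` and `demo_fit` are illustrative names.

```r
# Hypothetical long-format data (simulated, NOT the real zombie kills)
set.seed(42)
demo_long <- data.frame(
  name  = factor(rep(c("Basic", "Conehead", "Buckethead"), each = 15)),
  value = c(rnorm(15, 10, 1.6), rnorm(15, 15, 1.6), rnorm(15, 8, 1.6))
)

# Same model form as in the analysis: one-way ANOVA of value on name
demo_fit <- aov(value ~ name, data = demo_long)

# Shapiro-Wilk test: the null hypothesis is that the residuals are normal
sw <- shapiro.test(residuals(demo_fit))
print(sw)
```

With the real model, `shapiro.test(residuals(anovazombies))` would be the corresponding call.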
Thus, we conclude that the assumptions are adequately met and the ANOVA is appropriate for these data.
Using Tukey’s HSD, we find the pairs that significantly differ from each other:
plot(TukeyHSD(anovazombies))
From the plot of the 95% family-wise confidence intervals, we see that none of the intervals crosses the zero line. Hence, all three pairwise differences, (1) Buckethead-Basic, (2) Conehead-Basic and (3) Conehead-Buckethead, are significant at the family-wise 5% level. In other words, every group differs significantly from every other group according to the Tukey’s HSD post-hoc test.
(TukeyHSD(anovazombies))
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = value ~ name, data = zombieslong)
##
## $name
##                     diff       lwr        upr     p adj
## Buckethead-Basic    -1.8 -3.253835 -0.3461652 0.0120723
## Conehead-Basic       5.2  3.746165  6.6538348 0.0000000
## Conehead-Buckethead  7.0  5.546165  8.4538348 0.0000000
The Tukey HSD summary agrees: the adjusted p-value for every pair is below 0.05 (about 0.012 for Buckethead-Basic and effectively 0 for the two Conehead comparisons). Hence, all three zombie types differ from one another.
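The two ways of reading the Tukey result (interval excludes zero, adjusted p below alpha) can be checked side by side. The sketch below simply retypes the numbers from the output above into a data frame. The column names `p_adj`, `excludes_zero` and `significant` are illustrative, not part of the `TukeyHSD` output.

```r
# Values copied from the TukeyHSD output above
tukey <- data.frame(
  comparison = c("Buckethead-Basic", "Conehead-Basic", "Conehead-Buckethead"),
  diff  = c(-1.8, 5.2, 7.0),
  lwr   = c(-3.253835, 3.746165, 5.546165),
  upr   = c(-0.3461652, 6.6538348, 8.4538348),
  p_adj = c(0.0120723, 0.0000000, 0.0000000)
)

# An interval excludes zero when both of its bounds have the same sign
tukey$excludes_zero <- tukey$lwr > 0 | tukey$upr < 0

# Equivalent reading through the adjusted p-values at alpha = 0.05
tukey$significant <- tukey$p_adj < 0.05

print(tukey)
```

On a live fit, the same matrix is available as `TukeyHSD(anovazombies)$name`, with the adjusted p-values in the `"p adj"` column.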
library(dplyr)
library(tidyr)
zombies <- read.csv(file.choose())
View(zombies)
zombies_nogame <- zombies %>% select(Basic, Conehead, Buckethead)
boxplot(zombies_nogame, main = "Zombie Kills", xlab = "zombie type", ylab = "zombie kills")
names(zombies)
zombieslong <- zombies %>% select(Basic, Conehead, Buckethead)
View(zombieslong)
zombieslong <- pivot_longer(data = zombieslong, c("Basic", "Conehead", "Buckethead"))
str(zombieslong)
zombieslong$name <- as.factor(zombieslong$name)
anovazombies <- aov(value~name, data=zombieslong)
summary(anovazombies)
plot(anovazombies)
plot(TukeyHSD(anovazombies))