ANOVA (analysis of variance) is a statistical method for analysing the variance in a study. It is used to examine differences in the mean of a dependent variable that are associated with the independent variables. ANOVA is a method for comparing the means of two or more groups.
ANOVA can be divided into two types: one-way ANOVA and two-way ANOVA.
The one-way ANOVA compares the means of the groups of interest to see whether any of them differ significantly from one another. It tests the null hypothesis that all group means are equal.
A two-way ANOVA estimates how the mean of a quantitative variable changes according to the levels of two categorical variables.
$$
\begin{aligned}
H_0 &: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_k \\
H_A &: \mu_i \neq \mu_j \ \text{for some } i \text{ and } j
\end{aligned}
$$
Here k is the number of groups and μ is a group mean. If the one-way ANOVA returns a statistically significant result, we reject the null hypothesis in favour of the alternative hypothesis (HA), which states that at least two group means differ significantly from each other.
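For reference, the test statistic behind these hypotheses can be written out. This is the standard textbook form of the one-way ANOVA F statistic. The symbols N (total number of observations), n_i (size of group i), \bar{y}_i (mean of group i) and \bar{y} (grand mean) are notation introduced here for illustration and do not appear in the text above.

$$
F = \frac{\mathrm{SSB}/(k-1)}{\mathrm{SSW}/(N-k)}, \qquad
\mathrm{SSB} = \sum_{i=1}^{k} n_i\,(\bar{y}_i - \bar{y})^2, \qquad
\mathrm{SSW} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2
$$

Under H0 the statistic follows an F distribution with k − 1 and N − k degrees of freedom, so a large F value (small p-value) is evidence against equal group means.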
I used a different data set for the one-way ANOVA.
head(data)
## Name Fish Doll Toy Others
## 1 John 29 47 16 25
## 2 Duke 29 46 16 26
## 3 Chris 29 46 16 24
## 4 Charles 28 46 16 27
## 5 Narin 28 46 16 27
## 6 David 28 46 16 21
Let’s enter the above values into R.
data <- data.frame(A = c(29, 47, 16, 25), B = c(29, 46, 16, 26), C = c(29, 46, 16, 24), D = c(28, 46, 16, 25), E = c(28, 46, 16, 27), Gift = 1:4)
data
## A B C D E Gift
## 1 29 29 29 28 28 1
## 2 47 46 46 46 46 2
## 3 16 16 16 16 16 3
## 4 25 26 24 25 27 4
I then reorganized this data into long format. It took me a long time to figure out the right code, but I finally got it working here. By the way, I don’t believe I was required to include the gift number, but I did.
library(tidyr)  # pivot_longer() and %>% come from tidyr, not readr
test <- data %>%
  pivot_longer(c("A", "B", "C", "D", "E"), names_to = "Doll", values_to = "Others")
test
## # A tibble: 20 x 3
## Gift Doll Others
## <int> <chr> <dbl>
## 1 1 A 29
## 2 1 B 29
## 3 1 C 29
## 4 1 D 28
## 5 1 E 28
## 6 2 A 47
## 7 2 B 46
## 8 2 C 46
## 9 2 D 46
## 10 2 E 46
## 11 3 A 16
## 12 3 B 16
## 13 3 C 16
## 14 3 D 16
## 15 3 E 16
## 16 4 A 25
## 17 4 B 26
## 18 4 C 24
## 19 4 D 25
## 20 4 E 27
Fit the one-way ANOVA and compute the Df, Sum Sq, Mean Sq, F value and Pr(>F):
fm <- aov(Others ~ Doll, test)
summary(fm)
## Df Sum Sq Mean Sq F value Pr(>F)
## Doll 4 1.2 0.3 0.002 1
## Residuals 15 2395.7 159.7
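To see where the numbers in this table come from, here is a minimal base R sketch that recomputes the sums of squares, F value and p-value by hand. It rebuilds the data with `stack()` instead of `pivot_longer()` so that it runs on its own. The names `wide`, `long`, `ss_between` and `ss_within` are introduced here for illustration.

```r
# Rebuild the values entered above (base R only, no tidyr needed).
wide <- data.frame(A = c(29, 47, 16, 25), B = c(29, 46, 16, 26),
                   C = c(29, 46, 16, 24), D = c(28, 46, 16, 25),
                   E = c(28, 46, 16, 27))
long <- stack(wide)   # columns: values (numeric) and ind (factor A..E)

grand_mean  <- mean(long$values)
group_means <- tapply(long$values, long$ind, mean)
group_sizes <- tapply(long$values, long$ind, length)

# Between-group and within-group sums of squares
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)
ss_within  <- sum((long$values - ave(long$values, long$ind))^2)

df_between <- nlevels(long$ind) - 1            # k - 1
df_within  <- nrow(long) - nlevels(long$ind)   # N - k

f_value <- (ss_between / df_between) / (ss_within / df_within)
p_value <- pf(f_value, df_between, df_within, lower.tail = FALSE)

round(c(SSB = ss_between, SSW = ss_within, F = f_value, p = p_value), 3)
```

The F value is close to zero and the p-value is close to 1, so these data give no evidence that the five group means differ.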
Chi-squared test using one categorical variable
chisq.test(data$A)
##
## Chi-squared test for given probabilities
##
## data: data$A
## X-squared = 17.393, df = 3, p-value = 0.0005866
Here the p-value (0.0005866) is less than alpha = 0.05, so we reject the null hypothesis that the four values in column A are equally distributed.
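The statistic reported by `chisq.test()` can be reproduced by hand. This is a minimal sketch, assuming the default null hypothesis of equal expected counts. The variable names are introduced here for illustration.

```r
observed <- c(29, 47, 16, 25)   # column A from the data above
# Under H0 every category has the same expected count.
expected <- rep(sum(observed) / length(observed), length(observed))

x_squared <- sum((observed - expected)^2 / expected)
deg_free  <- length(observed) - 1
p_value   <- pchisq(x_squared, df = deg_free, lower.tail = FALSE)

round(x_squared, 3)
signif(p_value, 4)
```

The decision rule compares the p-value with the significance level (commonly 0.05): a p-value below it leads to rejecting the null hypothesis.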
Visualize the data with ggplot
library(ggmosaic)
library(ggplot2)
ggplot(test , aes(x = Doll, y = Others)) +
geom_boxplot()
Check the homogeneity of variance assumption. The residuals versus fits plot can be used to check the homogeneity of variances.
plot(fm, 1:2)
The plots above show the residuals against the fitted values and the normal Q-Q plot. Some points fall on the reference line, and the rest are close enough to it for us to continue with our results.
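The visual check can be complemented by a formal test. Here is a minimal sketch using Bartlett's test from base R (`bartlett.test()`), which is not part of the original analysis. The data frame is rebuilt here so that the block runs on its own, and `long` and `bt` are names introduced for illustration.

```r
wide <- data.frame(A = c(29, 47, 16, 25), B = c(29, 46, 16, 26),
                   C = c(29, 46, 16, 24), D = c(28, 46, 16, 25),
                   E = c(28, 46, 16, 27))
long <- stack(wide)
names(long) <- c("Others", "Doll")   # same column names as the long table above

# H0: all groups share the same variance
bt <- bartlett.test(Others ~ Doll, data = long)
bt$statistic
bt$p.value
```

A large p-value here means there is no evidence against equal variances. Bartlett's test is sensitive to departures from normality, so it is best read alongside the plots rather than instead of them.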
For a further visualization, I plotted the differences between these means.
y1 <- mean(test$Others, na.rm = TRUE)  # grand mean; data has no Doll column, so use the long table
y2 <- mean(data$C, na.rm = TRUE)       # mean of column C, drawn as the dashed reference line
ggplot(test , aes(x = Doll, y = Others)) +
geom_point() + geom_jitter(color = 'grey') +
stat_summary(fun.data = 'mean_se',color = "magenta") +
geom_hline(yintercept = y2, color ="blue", linetype = "dashed")
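The differences between group means can also be expressed numerically. One option, not used in the original analysis, is Tukey's honest significant difference via `TukeyHSD()` from base R. This is a sketch under the assumption that pairwise comparisons are of interest. The data are rebuilt so that the block is self-contained.

```r
wide <- data.frame(A = c(29, 47, 16, 25), B = c(29, 46, 16, 26),
                   C = c(29, 46, 16, 24), D = c(28, 46, 16, 25),
                   E = c(28, 46, 16, 27))
long <- stack(wide)
names(long) <- c("Others", "Doll")   # Doll is already a factor after stack()

fit   <- aov(Others ~ Doll, data = long)
tukey <- TukeyHSD(fit)

# One row per pair of groups; columns: diff, lwr, upr, p adj
tukey$Doll
```

Each row gives the difference between two group means with a confidence interval and an adjusted p-value. With five groups there are ten pairs.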
I used the data from a previous assignment. Using two categorical variables, perform a test for independence.
summary(data1)
## date_GMT referee total_goal_count
## Length:380 Length:380 Min. :0.000
## Class :character Class :character 1st Qu.:2.000
## Mode :character Mode :character Median :3.000
## Mean :2.821
## 3rd Qu.:4.000
## Max. :8.000
## total_goals_at_half_time total_minute stadium_name
## Min. :0.000 Min. :90 Length:380
## 1st Qu.:0.000 1st Qu.:90 Class :character
## Median :1.000 Median :90 Mode :character
## Mean :1.253 Mean :90
## 3rd Qu.:2.000 3rd Qu.:90
## Max. :6.000 Max. :90
Display the data in a table (referee and total minutes).
table(data1$referee, data1$total_minute)
##
## 90
## Andre Marriner 27
## Andy Madley 2
## Anthony Taylor 32
## Chris Kavanagh 24
## Craig Pawson 26
## David Coote 11
## Graham Scott 17
## Jonathan Moss 27
## Kevin Friend 27
## Lee Mason 19
## Lee Probert 18
## Martin Atkinson 29
## Michael Oliver 30
## Mike Dean 29
## Paul Tierney 24
## Roger East 10
## Simon Hooper 8
## Stuart Attwell 20
Chi-squared test using two categorical variables
chisq.test(table(data1$referee, data1$total_minute))
##
## Chi-squared test for given probabilities
##
## data: table(data1$referee, data1$total_minute)
## X-squared = 59.768, df = 17, p-value = 1.148e-06
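Here the p-value (1.148e-06) is less than alpha = 0.05, so we reject the null hypothesis. Note that total_minute takes only one value (90), so the table has a single column and R runs a goodness-of-fit test on the referee counts ("given probabilities") rather than a test of independence.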
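A test of independence needs a table in which both variables have at least two levels. The sketch below shows the shape of such a test. The counts are hypothetical and made up purely for illustration. They are not taken from data1, and the referee labels and goal bands are placeholders.

```r
# Hypothetical counts for illustration only; these are NOT from data1.
counts <- matrix(
  c(12,  8,
    10, 10,
     7, 13),
  nrow = 3, byrow = TRUE,
  dimnames = list(referee = c("Ref A", "Ref B", "Ref C"),
                  goals   = c("0-2", "3+"))
)

res <- chisq.test(counts)
res$method      # Pearson's test, not "given probabilities"
res$parameter   # degrees of freedom: (3 - 1) * (2 - 1)
res$expected    # expected counts under independence
res$p.value
```

With a real two-way table, the method line in the output reads "Pearson's Chi-squared test", which is a quick way to confirm that independence is what was actually tested.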
Visualize the table data with ggplot.
y1 <- mean(data1$total_goal_count, na.rm = TRUE)
ggplot(data1, aes(x = referee, y = total_minute)) +
  geom_jitter(color = "grey") +
  stat_summary(fun.data = "mean_se", color = "red") +  # the argument is fun.data, not fun.data1
  geom_hline(yintercept = y1, color = "blue", linetype = "dashed")
Time to run the ANOVA
twoWayAnova <- aov(total_goal_count ~ total_goals_at_half_time*date_GMT, data = data1)
summary(twoWayAnova)
## Df Sum Sq Mean Sq F value Pr(>F)
## total_goals_at_half_time 1 429.0 429.0 255.154 <2e-16 ***
## date_GMT 211 261.0 1.2 0.736 0.970
## total_goals_at_half_time:date_GMT 57 96.8 1.7 1.010 0.473
## Residuals 110 185.0 1.7
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
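So here we examined whether total_goals_at_half_time, date_GMT and the interaction between the two have an effect on total_goal_count. Only total_goals_at_half_time is significant (p < 2e-16); date_GMT (p = 0.970) and the interaction (p = 0.473) are not.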
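For comparison, here is a minimal sketch of a two-way ANOVA in which both predictors are categorical factors with only a few levels. It uses `ToothGrowth`, a data set that ships with R, rather than the football data, so the variable names below are unrelated to data1.

```r
# ToothGrowth: len (numeric), supp (factor with 2 levels), dose (numeric, 3 values)
tg <- ToothGrowth
tg$dose <- factor(tg$dose)   # treat dose as a categorical variable

# Main effects of supp and dose plus their interaction
fit <- aov(len ~ supp * dose, data = tg)
tab <- summary(fit)[[1]]
tab
```

Each factor uses (number of levels − 1) degrees of freedom. A character column with many distinct values, such as a date, uses up a large share of the degrees of freedom in the same way.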
plot(twoWayAnova, 1:5)
## Warning: not plotting observations with leverage one:
## 2, 5, 7, 8, 9, 10, 11, 16, 17, 18, 19, 20, 21, 23, 26, 27, 28, 29, 30, 31, 37, 38, 39, 40, 41, 47, 48, 49, 50, 51, 58, 59, 60, 69, 70, 71, 77, 78, 79, 80, 81, 88, 89, 90, 96, 99, 100, 101, 106, 107, 108, 109, 110, 111, 116, 117, 118, 127, 128, 129, 130, 131, 137, 138, 139, 140, 143, 144, 149, 150, 151, 157, 158, 159, 160, 161, 167, 168, 169, 170, 171, 172, 179, 180, 181, 188, 189, 190, 196, 197, 198, 199, 200, 201, 202, 203, 205, 208, 209, 210, 211, 217, 218, 219, 220, 221, 228, 229, 230, 235, 236, 237, 238, 239, 240, 241, 247, 248, 249, 250, 251, 252, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 271, 273, 274, 275, 280, 286, 287, 288, 289, 290, 294, 296, 297, 298, 299, 303, 304, 305, 311, 312, 313, 314, 320, 324, 325, 326, 327, 332, 333, 334, 335, 336, 337, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370
## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced
## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced
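The "leverage one" warning means that some observations are fitted exactly by the model. This typically happens when a factor level contains very few observations, which is plausible for date_GMT given its 211 degrees of freedom. A toy sketch with made-up numbers (not from data1) shows the effect.

```r
# Made-up data: level "c" of the factor has a single observation.
toy <- data.frame(
  y = c(1, 2, 3, 4, 5),
  g = factor(c("a", "a", "b", "b", "c"))
)

fit <- lm(y ~ g, data = toy)

# In a one-factor model the leverage of an observation is 1 / (size of its group).
lev <- unname(hatvalues(fit))
round(lev, 3)
```

The single observation in level "c" has leverage 1, so its residual is exactly zero and the diagnostic plots cannot display it.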