ANOVA

ANOVA (analysis of variance) is a statistical method for analysing the variance in a study. It is used to examine differences in the mean of a dependent variable that are associated with the levels of one or more independent variables. In practice, ANOVA is a method for comparing the means of two or more groups.

ANOVA can be divided into two types: one-way ANOVA and two-way ANOVA.

The one-way ANOVA compares the means of the groups we are interested in to determine whether any of them are statistically significantly different from one another. It tests the null hypothesis that all group means are equal.

A two-way ANOVA estimates how the mean of a quantitative variable changes according to the levels of two categorical variables.

$$
\begin{aligned}
H_0 &: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_k \\
H_A &: \mu_i \neq \mu_j \ \text{for some } i \text{ and } j
\end{aligned}
$$

where k is the number of groups and μ is a group mean. If the one-way ANOVA returns a statistically significant result, we reject the null hypothesis (H0) in favour of the alternative hypothesis (HA), which states that at least two group means differ significantly from each other.
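For reference, the test statistic behind these hypotheses can be written out as below. This is the standard textbook form of the one-way ANOVA F statistic; the symbols $n_i$ (size of group $i$), $N$ (total number of observations), $\bar{y}_i$ (mean of group $i$) and $\bar{y}$ (overall mean) are notation introduced here for illustration and do not appear in the text above.

$$
\begin{aligned}
SS_{between} &= \sum_{i=1}^{k} n_i \left(\bar{y}_i - \bar{y}\right)^2 \\
SS_{within} &= \sum_{i=1}^{k} \sum_{j=1}^{n_i} \left(y_{ij} - \bar{y}_i\right)^2 \\
F &= \frac{SS_{between} / (k - 1)}{SS_{within} / (N - k)}
\end{aligned}
$$

When H0 holds and the usual assumptions (independent observations, normal residuals, equal variances) are met, F follows an F distribution with k - 1 and N - k degrees of freedom. A large F, and hence a small p-value, is evidence against H0.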

One-Way ANOVA

head(data)
##      Name Fish Doll Toy Others
## 1    John   29   47  16     25
## 2    Duke   29   46  16     26
## 3   Chris   29   46  16     24
## 4 Charles   28   46  16     25
## 5   Narin   28   46  16     27
## 6   David   28   46  16     21

Let's enter the values above into R, with one column per person (A to E for the first five rows) and one row per gift category.

data <- data.frame(
  "A" = c(29, 47, 16, 25),
  "B" = c(29, 46, 16, 26),
  "C" = c(29, 46, 16, 24),
  "D" = c(28, 46, 16, 25),
  "E" = c(28, 46, 16, 27),
  "Gift" = 1:4
)
data
##    A  B  C  D  E Gift
## 1 29 29 29 28 28    1
## 2 47 46 46 46 46    2
## 3 16 16 16 16 16    3
## 4 25 26 24 25 27    4

Next, I reshaped this information into long format, with one row per observation. It took me a long time to figure out the right code, but I finally got it working here. By the way, I don't believe I was required to include the gift number (the Gift column), but I kept it.

library(tidyr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(readr)
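# Reshape from wide to long: one row per (Gift, person) pair.
# The column names A-E go into "Doll" and their values into "Others".
test <-
  data %>% 
  pivot_longer(c('A','B','C','D','E'), names_to = "Doll", values_to = "Others")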
test
## # A tibble: 20 x 3
##     Gift Doll  Others
##    <int> <chr>  <dbl>
##  1     1 A         29
##  2     1 B         29
##  3     1 C         29
##  4     1 D         28
##  5     1 E         28
##  6     2 A         47
##  7     2 B         46
##  8     2 C         46
##  9     2 D         46
## 10     2 E         46
## 11     3 A         16
## 12     3 B         16
## 13     3 C         16
## 14     3 D         16
## 15     3 E         16
## 16     4 A         25
## 17     4 B         26
## 18     4 C         24
## 19     4 D         25
## 20     4 E         27

Fit the one-way ANOVA and print its summary table (degrees of freedom, sums of squares, mean squares, F value and p-value):

fm <- aov(Others ~ Doll, test)
summary(fm)
##             Df Sum Sq Mean Sq F value Pr(>F)
## Doll         4    1.2     0.3   0.002      1
## Residuals   15 2395.7   159.7
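To see where the numbers in this table come from, the sums of squares and the F value can be recomputed by hand. The sketch below rebuilds the same 20 values in base R so that it runs on its own; the names `wide`, `long`, `ss_between` and `ss_within` are illustrative and not part of the original analysis.

```r
# Same values as the data frame above, without the Gift column.
wide <- data.frame(
  A = c(29, 47, 16, 25),
  B = c(29, 46, 16, 26),
  C = c(29, 46, 16, 24),
  D = c(28, 46, 16, 25),
  E = c(28, 46, 16, 27)
)

# stack() gives a long data frame: `values` (numeric) and `ind` (factor A..E).
long <- stack(wide)
y <- long$values
g <- long$ind

k <- nlevels(g)   # number of groups
N <- length(y)    # total number of observations

grand_mean  <- mean(y)
group_means <- tapply(y, g, mean)
group_sizes <- tapply(y, g, length)

# Between-group and within-group sums of squares.
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)
ss_within  <- sum((y - ave(y, g))^2)   # ave() returns each row's group mean

df_between <- k - 1
df_within  <- N - k

f_value <- (ss_between / df_between) / (ss_within / df_within)
p_value <- pf(f_value, df_between, df_within, lower.tail = FALSE)

round(c(ss_between = ss_between, ss_within = ss_within,
        F = f_value, p = p_value), 3)
```

The rounded values should agree with the `Sum Sq`, `F value` and `Pr(>F)` columns printed by `summary(fm)`. Because the five group means are almost identical, nearly all of the variation is within groups, which is why F is so close to zero.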

Visualize the data with ggplot

library(ggpubr)
## Loading required package: ggplot2
library(ggmosaic)
library(ggplot2)
ggplot(test , aes(x = Doll, y = Others)) +
  geom_boxplot()

Check the model assumptions. The residuals-versus-fitted plot can be used to check the homogeneity of variances, and the normal Q-Q plot to check whether the residuals are approximately normal.

plot(fm, 1:2)

The plots above show the relationship: some points fall on the reference line, and the rest are close enough to it for us to continue with our results.
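The graphical checks can be complemented with formal tests. The sketch below uses Bartlett's test for equal variances and the Shapiro-Wilk test for normality of the residuals, both from base R's stats package. These tests are not part of the original analysis; they are offered as one possible alternative, and the object names `long`, `fit`, `bt` and `sw` are illustrative.

```r
# Rebuild the long data in base R so the block is self-contained.
wide <- data.frame(
  A = c(29, 47, 16, 25),
  B = c(29, 46, 16, 26),
  C = c(29, 46, 16, 24),
  D = c(28, 46, 16, 25),
  E = c(28, 46, 16, 27)
)
long <- stack(wide)
names(long) <- c("Others", "Doll")   # match the column names used in the text

fit <- aov(Others ~ Doll, data = long)

# Bartlett's test: the null hypothesis is that all groups share one variance.
bt <- bartlett.test(Others ~ Doll, data = long)

# Shapiro-Wilk test: the null hypothesis is that the residuals are normal.
sw <- shapiro.test(residuals(fit))

bt$p.value
sw$p.value
```

A small p-value in either test would suggest that the corresponding assumption is questionable. With only four observations per group, both tests have little power, so they are best read alongside the plots rather than instead of them.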

As a further visualization, I plotted each group's mean with its standard error against the overall mean (dashed line).

# Overall mean of all observations, drawn as a dashed reference line
y1 <- mean(test$Others, na.rm = TRUE)
ggplot(test , aes(x = Doll, y = Others)) +
  geom_point() + 
  stat_summary(fun.data = 'mean_se',color = "magenta") +
  geom_hline(yintercept = y1, color ="blue", linetype = "dashed") 

Two-Way ANOVA

summary(data1)
##    date_GMT           referee          total_goal_count
##  Length:380         Length:380         Min.   :0.000   
##  Class :character   Class :character   1st Qu.:2.000   
##  Mode  :character   Mode  :character   Median :3.000   
##                                        Mean   :2.821   
##                                        3rd Qu.:4.000   
##                                        Max.   :8.000   
##  total_goals_at_half_time stadium_name      
##  Min.   :0.000            Length:380        
##  1st Qu.:0.000            Class :character  
##  Median :1.000            Mode  :character  
##  Mean   :1.253                              
##  3rd Qu.:2.000                              
##  Max.   :6.000

Cross-tabulate the data: stadium name (rows) against total goal count (columns).

table(data1$stadium_name,data1$total_goal_count)
##                                                               
##                                                                 0  1  2  3  4
##   Anfield (Liverpool)                                           1  2  4  3  3
##   Cardiff City Stadium (Cardiff (Caerdydd))                     2  2  3  6  0
##   Craven Cottage (London)                                       0  2  7  5  1
##   Emirates Stadium (London)                                     0  1 10  1  3
##   Etihad Stadium (Manchester)                                   0  3  2  4  5
##   Goodison Park (Liverpool)                                     1  3  7  2  5
##   John Smith's Stadium (Huddersfield- West Yorkshire)           1  6  4  6  1
##   King Power Stadium (Leicester- Leicestershire)                2  3  5  6  2
##   London Stadium (London)                                       1  3  4  2  6
##   Molineux Stadium (Wolverhampton- West Midlands)               1  2  9  3  2
##   Old Trafford (Manchester)                                     2  0  4  6  3
##   Selhurst Park (London)                                        3  4  6  2  3
##   St. James' Park (Newcastle upon Tyne)                         1  3  4  8  1
##   St. Mary's Stadium (Southampton- Hampshire)                   2  0  4  6  5
##   Stamford Bridge (London)                                      2  1  7  3  3
##   The American Express Community Stadium (Falmer- East Sussex)  1  6  4  2  3
##   Tottenham Hotspur Stadium (London)                            0  2  1  0  2
##   Turf Moor (Burnley)                                           0  2  6  4  6
##   Vicarage Road (Watford)                                       1  2  3  9  1
##   Vitality Stadium (Bournemouth- Dorset)                        1  3  4  4  5
##   Wembley Stadium (London)                                      0  5  1  2  5
##                                                               
##                                                                 5  6  7  8
##   Anfield (Liverpool)                                           3  2  1  0
##   Cardiff City Stadium (Cardiff (Caerdydd))                     3  3  0  0
##   Craven Cottage (London)                                       1  3  0  0
##   Emirates Stadium (London)                                     2  2  0  0
##   Etihad Stadium (Manchester)                                   2  1  2  0
##   Goodison Park (Liverpool)                                     0  0  0  1
##   John Smith's Stadium (Huddersfield- West Yorkshire)           1  0  0  0
##   King Power Stadium (Leicester- Leicestershire)                1  0  0  0
##   London Stadium (London)                                       1  1  1  0
##   Molineux Stadium (Wolverhampton- West Midlands)               1  0  1  0
##   Old Trafford (Manchester)                                     4  0  0  0
##   Selhurst Park (London)                                        0  0  0  1
##   St. James' Park (Newcastle upon Tyne)                         2  0  0  0
##   St. Mary's Stadium (Southampton- Hampshire)                   1  1  0  0
##   Stamford Bridge (London)                                      3  0  0  0
##   The American Express Community Stadium (Falmer- East Sussex)  3  0  0  0
##   Tottenham Hotspur Stadium (London)                            0  0  0  0
##   Turf Moor (Burnley)                                           0  1  0  0
##   Vicarage Road (Watford)                                       3  0  0  0
##   Vitality Stadium (Bournemouth- Dorset)                        0  2  0  0
##   Wembley Stadium (London)                                      1  0  0  0

Visualize the data with ggplot.

ggplot(data1, aes(x = stadium_name, y = total_goal_count, color = total_goals_at_half_time))+
  geom_boxplot()

Time to run the ANOVA

twoWayAnova <- aov(total_goal_count ~ total_goals_at_half_time*date_GMT, data = data1)
summary(twoWayAnova)
##                                    Df Sum Sq Mean Sq F value Pr(>F)    
## total_goals_at_half_time            1  429.0   429.0 255.154 <2e-16 ***
## date_GMT                          211  261.0     1.2   0.736  0.970    
## total_goals_at_half_time:date_GMT  57   96.8     1.7   1.010  0.473    
## Residuals                         110  185.0     1.7                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

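Here we examined whether total_goals_at_half_time, date_GMT and the interaction between the two have an effect on total_goal_count. Only the half-time goal count is statistically significant (p < 2e-16); neither the match date (p = 0.970) nor the interaction (p = 0.473) is.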
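Since data1 is not reproduced here, the same kind of model can be sketched on a dataset that ships with R. The example below is an illustration only, under the assumption that the built-in ToothGrowth data is an acceptable stand-in: it has one quantitative response (`len`) and two categorical predictors (`supp` and `dose`). The names `tg`, `fit2` and `tab` are made up for this sketch.

```r
# ToothGrowth comes with base R (datasets package): 60 observations,
# len = tooth length, supp = supplement type, dose = dose level.
tg <- ToothGrowth
tg$dose <- factor(tg$dose)   # treat dose as categorical, not numeric

# `supp * dose` expands to both main effects plus their interaction.
fit2 <- aov(len ~ supp * dose, data = tg)

# The ANOVA table as a data frame: one row per term plus the residuals.
tab <- summary(fit2)[[1]]
tab
```

Each factor here has only a few levels, so most of the degrees of freedom are left for the residuals. In the model above, date_GMT is a character column, so `aov` treats every distinct date as its own level (hence Df = 211), which leaves comparatively few residual degrees of freedom.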

plot(twoWayAnova, 1:5)
## Warning: not plotting observations with leverage one:
##   2, 5, 7, 8, 9, 10, 11, 16, 17, 18, 19, 20, 21, 23, 26, 27, 28, 29, 30, 31, 37, 38, 39, 40, 41, 47, 48, 49, 50, 51, 58, 59, 60, 69, 70, 71, 77, 78, 79, 80, 81, 88, 89, 90, 96, 99, 100, 101, 106, 107, 108, 109, 110, 111, 116, 117, 118, 127, 128, 129, 130, 131, 137, 138, 139, 140, 143, 144, 149, 150, 151, 157, 158, 159, 160, 161, 167, 168, 169, 170, 171, 172, 179, 180, 181, 188, 189, 190, 196, 197, 198, 199, 200, 201, 202, 203, 205, 208, 209, 210, 211, 217, 218, 219, 220, 221, 228, 229, 230, 235, 236, 237, 238, 239, 240, 241, 247, 248, 249, 250, 251, 252, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 271, 273, 274, 275, 280, 286, 287, 288, 289, 290, 294, 296, 297, 298, 299, 303, 304, 305, 311, 312, 313, 314, 320, 324, 325, 326, 327, 332, 333, 334, 335, 336, 337, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370

## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced
## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced