head(data)
##                date_GMT         referee total_goal_count
## 1  Aug 10 2018 - 7:00pm  Andre Marriner                3
## 2 Aug 11 2018 - 11:30am Martin Atkinson                3
## 3  Aug 11 2018 - 2:00pm    Kevin Friend                2
## 4  Aug 11 2018 - 2:00pm       Mike Dean                2
## 5  Aug 11 2018 - 2:00pm  Chris Kavanagh                3
## 6  Aug 11 2018 - 2:00pm   Jonathan Moss                2
##   total_goals_at_half_time                                        stadium_name
## 1                        1                           Old Trafford (Manchester)
## 2                        3               St. James' Park (Newcastle upon Tyne)
## 3                        1              Vitality Stadium (Bournemouth- Dorset)
## 4                        1                             Craven Cottage (London)
## 5                        2 John Smith's Stadium (Huddersfield- West Yorkshire)
## 6                        1                             Vicarage Road (Watford)

Goodness of Fit

Using the Football data, I will apply χ² goodness-of-fit tests. First, a contingency table of the number of matches played at each stadium:

table(data$stadium_name)
## 
##                                          Anfield (Liverpool) 
##                                                           19 
##                    Cardiff City Stadium (Cardiff (Caerdydd)) 
##                                                           19 
##                                      Craven Cottage (London) 
##                                                           19 
##                                    Emirates Stadium (London) 
##                                                           19 
##                                  Etihad Stadium (Manchester) 
##                                                           19 
##                                    Goodison Park (Liverpool) 
##                                                           19 
##          John Smith's Stadium (Huddersfield- West Yorkshire) 
##                                                           19 
##               King Power Stadium (Leicester- Leicestershire) 
##                                                           19 
##                                      London Stadium (London) 
##                                                           19 
##              Molineux Stadium (Wolverhampton- West Midlands) 
##                                                           19 
##                                    Old Trafford (Manchester) 
##                                                           19 
##                                       Selhurst Park (London) 
##                                                           19 
##                        St. James' Park (Newcastle upon Tyne) 
##                                                           19 
##                  St. Mary's Stadium (Southampton- Hampshire) 
##                                                           19 
##                                     Stamford Bridge (London) 
##                                                           19 
## The American Express Community Stadium (Falmer- East Sussex) 
##                                                           19 
##                           Tottenham Hotspur Stadium (London) 
##                                                            5 
##                                          Turf Moor (Burnley) 
##                                                           19 
##                                      Vicarage Road (Watford) 
##                                                           19 
##                       Vitality Stadium (Bournemouth- Dorset) 
##                                                           19 
##                                     Wembley Stadium (London) 
##                                                           14

The number of matches played differs between stadiums, although it would be equal if matches were spread evenly. Nineteen stadiums hosted 19 matches each, but Tottenham Hotspur Stadium hosted only 5 and Wembley Stadium 14 (together 19).
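The claim that every stadium should host the same number of matches can itself be checked with a goodness-of-fit test. Below is a minimal sketch that works from the counts printed above rather than from `data`; with the full data frame, `chisq.test(table(data$stadium_name))` should perform the same test directly.

```r
# Match counts per stadium, copied from table(data$stadium_name) above:
# 19 stadiums hosted 19 matches each, Tottenham Hotspur Stadium hosted 5
# and Wembley Stadium hosted 14.
stadium_counts <- c(rep(19, 19), 5, 14)

# Goodness-of-fit test against equal probabilities for all 21 stadiums.
# With 380 matches, each stadium's expected count is 380 / 21, about 18.1,
# so no small-expected-count warning is raised here.
stadium_test <- chisq.test(stadium_counts)
stadium_test$expected   # every entry equals sum(stadium_counts) / 21
stadium_test$statistic  # X-squared
stadium_test$parameter  # df = 21 - 1 = 20
stadium_test$p.value
```

Note that this test treats Tottenham Hotspur Stadium and Wembley Stadium as separate venues, even though their counts add up to 19, the same as every other stadium.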

The goodness-of-fit test below is applied to the total number of goals in each match. The null and alternative hypotheses are:

H0: Every match has the same expected number of goals, so each of the 380 observed totals has an expected value equal to the overall mean (1072 goals / 380 matches ≈ 2.82).

HA: The expected number of goals differs between at least some matches.

Equivalently, H0 states that the observed goal totals match their expected values, apart from random variation.

Chi-squared test using one categorical variable

chisq.test(data$total_goal_count)
## Warning in chisq.test(data$total_goal_count): Chi-squared approximation may be
## incorrect
## 
##  Chi-squared test for given probabilities
## 
## data:  data$total_goal_count
## X-squared = 344.49, df = 379, p-value = 0.8978

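The p-value (0.8978) is well above the 0.05 significance level, so I fail to reject the null hypothesis: the goal totals are consistent with every match having the same expected number of goals. The warning appears because every expected count (2.82) is below 5, the usual rule of thumb for the chi-squared approximation.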
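Because `chisq.test()` receives a single numeric vector here, it treats each of the 380 matches as its own category with equal probability. Every expected count is therefore the mean number of goals per match, and X² reduces to Σ(xᵢ − x̄)²/x̄, the Poisson index-of-dispersion statistic. In other words, the test asks whether goals per match vary more than a Poisson model with a common mean would predict. The sketch below rebuilds the 380 goal totals from the column totals of the referee-by-goals table later in this document (22 matches with 0 goals, 55 with 1, and so on); the order of the matches does not affect the statistic.

```r
# Rebuild the 380 total_goal_count values from the column totals of
# table(data$referee, data$total_goal_count): 22 matches with 0 goals,
# 55 with 1, ..., 2 with 8. Order does not matter for this statistic.
goals <- rep(0:8, times = c(22, 55, 99, 84, 65, 32, 16, 5, 2))

mean_goals <- mean(goals)                               # 1072 / 380
dispersion <- sum((goals - mean_goals)^2) / mean_goals  # index of dispersion

# chisq.test() on the raw vector gives the same statistic; it warns because
# every expected count (2.82) is below 5, so the warning is suppressed here.
builtin <- suppressWarnings(chisq.test(goals))

p_value <- pchisq(dispersion, df = length(goals) - 1, lower.tail = FALSE)

dispersion
unname(builtin$statistic)
p_value
```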

The expected number of goals for each match is:

chisq.test(data$total_goal_count)$expected
## Warning in chisq.test(data$total_goal_count): Chi-squared approximation may be
## incorrect
##   [1] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
##   [9] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
##  [17] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
##  [25] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
##  [33] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
##  [41] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
##  [49] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
##  [57] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
##  [65] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
##  [73] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
##  [81] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
##  [89] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
##  [97] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [105] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [113] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [121] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [129] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [137] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [145] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [153] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [161] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [169] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [177] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [185] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [193] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [201] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [209] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [217] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [225] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [233] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [241] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [249] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [257] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [265] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [273] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [281] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [289] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [297] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [305] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [313] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [321] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [329] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [337] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [345] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [353] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [361] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [369] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [377] 2.821053 2.821053 2.821053 2.821053
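Every expected count above equals 2.821053 (1072 goals / 380 matches), below the rule-of-thumb minimum of 5, which is why R warns that the approximation may be incorrect. One way to check the conclusion without relying on the large-sample approximation is a Monte Carlo p-value through the `simulate.p.value` and `B` arguments of `chisq.test()`. A sketch, again rebuilding the goal totals from the referee table's column totals; the seed and `B = 2000` are arbitrary choices, not part of the original analysis:

```r
# Goal totals rebuilt from the referee-by-goals column totals.
goals <- rep(0:8, times = c(22, 55, 99, 84, 65, 32, 16, 5, 2))

set.seed(2018)  # arbitrary seed so the simulated p-value is reproducible
mc <- chisq.test(goals, simulate.p.value = TRUE, B = 2000)

mc$expected[1]  # the mean goals per match, about 2.821053
mc$statistic    # same X-squared as the asymptotic test
mc$p.value      # Monte Carlo p-value based on 2000 simulated samples
```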
The same test applied to the number of goals at half time:

chisq.test(data$total_goals_at_half_time)
## Warning in chisq.test(data$total_goals_at_half_time): Chi-squared approximation
## may be incorrect
## 
##  Chi-squared test for given probabilities
## 
## data:  data$total_goals_at_half_time
## X-squared = 395.76, df = 379, p-value = 0.2662

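The p-value (0.2662) is also greater than the 0.05 significance level, so again I fail to reject the null hypothesis: the half-time goal totals are consistent with every match having the same expected number of first-half goals.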

The expected number of half-time goals for each match (476 goals / 380 matches) is:

chisq.test(data$total_goals_at_half_time)$expected
## Warning in chisq.test(data$total_goals_at_half_time): Chi-squared approximation
## may be incorrect
##   [1] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
##   [9] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
##  [17] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
##  [25] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
##  [33] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
##  [41] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
##  [49] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
##  [57] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
##  [65] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
##  [73] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
##  [81] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
##  [89] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
##  [97] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [105] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [113] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [121] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [129] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [137] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [145] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [153] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [161] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [169] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [177] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [185] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [193] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [201] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [209] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [217] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [225] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [233] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [241] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [249] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [257] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [265] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [273] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [281] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [289] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [297] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [305] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [313] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [321] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [329] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [337] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [345] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [353] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [361] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [369] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [377] 1.252632 1.252632 1.252632 1.252632

Barplot of Stadium Name

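par(mar = c(12, 4, 2, 1))  # widen the bottom margin for the long stadium names
barplot(table(data$stadium_name), las = 2, cex.names = 0.6)  # vertical, smaller labels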

Test for Independence

Using two categorical variables, the referee and the total goal count, I perform a chi-squared test for independence.

table(data$referee, data$total_goal_count)
##                  
##                    0  1  2  3  4  5  6  7  8
##   Andre Marriner   1  3 10  6  3  2  1  1  0
##   Andy Madley      0  1  0  0  0  1  0  0  0
##   Anthony Taylor   1  4 12  4  7  4  0  0  0
##   Chris Kavanagh   1  6  7  3  2  3  1  1  0
##   Craig Pawson     2  4  7  9  3  0  1  0  0
##   David Coote      0  2  2  1  3  1  2  0  0
##   Graham Scott     1  1  4  5  3  2  1  0  0
##   Jonathan Moss    1  4  7  6  4  3  0  2  0
##   Kevin Friend     1  6  6  4  6  3  1  0  0
##   Lee Mason        3  1  7  1  4  2  0  1  0
##   Lee Probert      2  3  4  3  4  1  1  0  0
##   Martin Atkinson  3  3  7  7  8  1  0  0  0
##   Michael Oliver   3  5  7  5  3  4  3  0  0
##   Mike Dean        2  5  7  9  3  1  2  0  0
##   Paul Tierney     0  3  2 11  5  1  1  0  1
##   Roger East       0  0  3  3  1  1  1  0  1
##   Simon Hooper     1  1  4  1  0  0  1  0  0
##   Stuart Attwell   0  3  3  6  6  2  0  0  0

Let’s set up the hypothesis test.

H0: The total number of goals in a match is independent of the referee.

HA: The total number of goals in a match depends on the referee.

Chi-squared test using two categorical variables

chisq.test(table(data$referee, data$total_goal_count))
## Warning in chisq.test(table(data$referee, data$total_goal_count)): Chi-squared
## approximation may be incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  table(data$referee, data$total_goal_count)
## X-squared = 130.92, df = 136, p-value = 0.6069

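The p-value (0.6069) is greater than the 0.05 significance level, so I fail to reject the null hypothesis: the data give no evidence that the number of goals in a match depends on the referee.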
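Many expected counts in this table are below 5, which triggers the same warning as before. The sketch below rebuilds the referee-by-goals counts printed above (copied by hand rather than computed from `data`), reproduces the asymptotic test, and adds a Monte Carlo p-value that conditions on the row and column totals. The seed and `B = 2000` are arbitrary choices, not part of the original analysis.

```r
# Referee-by-total-goals counts, copied from table(data$referee, data$total_goal_count).
referees <- c("Andre Marriner", "Andy Madley", "Anthony Taylor", "Chris Kavanagh",
              "Craig Pawson", "David Coote", "Graham Scott", "Jonathan Moss",
              "Kevin Friend", "Lee Mason", "Lee Probert", "Martin Atkinson",
              "Michael Oliver", "Mike Dean", "Paul Tierney", "Roger East",
              "Simon Hooper", "Stuart Attwell")
counts <- c(1, 3, 10,  6, 3, 2, 1, 1, 0,
            0, 1,  0,  0, 0, 1, 0, 0, 0,
            1, 4, 12,  4, 7, 4, 0, 0, 0,
            1, 6,  7,  3, 2, 3, 1, 1, 0,
            2, 4,  7,  9, 3, 0, 1, 0, 0,
            0, 2,  2,  1, 3, 1, 2, 0, 0,
            1, 1,  4,  5, 3, 2, 1, 0, 0,
            1, 4,  7,  6, 4, 3, 0, 2, 0,
            1, 6,  6,  4, 6, 3, 1, 0, 0,
            3, 1,  7,  1, 4, 2, 0, 1, 0,
            2, 3,  4,  3, 4, 1, 1, 0, 0,
            3, 3,  7,  7, 8, 1, 0, 0, 0,
            3, 5,  7,  5, 3, 4, 3, 0, 0,
            2, 5,  7,  9, 3, 1, 2, 0, 0,
            0, 3,  2, 11, 5, 1, 1, 0, 1,
            0, 0,  3,  3, 1, 1, 1, 0, 1,
            1, 1,  4,  1, 0, 0, 1, 0, 0,
            0, 3,  3,  6, 6, 2, 0, 0, 0)
ref_goals <- matrix(counts, nrow = 18, byrow = TRUE,
                    dimnames = list(referee = referees, goals = as.character(0:8)))

# Asymptotic test; the small-expected-count warning is suppressed.
asym <- suppressWarnings(chisq.test(ref_goals))
asym$statistic
asym$parameter
asym$p.value

# Monte Carlo p-value from 2000 tables simulated with the same margins.
set.seed(2018)
mc <- chisq.test(ref_goals, simulate.p.value = TRUE, B = 2000)
mc$p.value
```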

Expected Values

chisq.test(table(data$referee, data$total_goal_count))$expected
## Warning in chisq.test(table(data$referee, data$total_goal_count)): Chi-squared
## approximation may be incorrect
##                  
##                           0         1         2         3         4         5
##   Andre Marriner  1.5631579 3.9078947 7.0342105 5.9684211 4.6184211 2.2736842
##   Andy Madley     0.1157895 0.2894737 0.5210526 0.4421053 0.3421053 0.1684211
##   Anthony Taylor  1.8526316 4.6315789 8.3368421 7.0736842 5.4736842 2.6947368
##   Chris Kavanagh  1.3894737 3.4736842 6.2526316 5.3052632 4.1052632 2.0210526
##   Craig Pawson    1.5052632 3.7631579 6.7736842 5.7473684 4.4473684 2.1894737
##   David Coote     0.6368421 1.5921053 2.8657895 2.4315789 1.8815789 0.9263158
##   Graham Scott    0.9842105 2.4605263 4.4289474 3.7578947 2.9078947 1.4315789
##   Jonathan Moss   1.5631579 3.9078947 7.0342105 5.9684211 4.6184211 2.2736842
##   Kevin Friend    1.5631579 3.9078947 7.0342105 5.9684211 4.6184211 2.2736842
##   Lee Mason       1.1000000 2.7500000 4.9500000 4.2000000 3.2500000 1.6000000
##   Lee Probert     1.0421053 2.6052632 4.6894737 3.9789474 3.0789474 1.5157895
##   Martin Atkinson 1.6789474 4.1973684 7.5552632 6.4105263 4.9605263 2.4421053
##   Michael Oliver  1.7368421 4.3421053 7.8157895 6.6315789 5.1315789 2.5263158
##   Mike Dean       1.6789474 4.1973684 7.5552632 6.4105263 4.9605263 2.4421053
##   Paul Tierney    1.3894737 3.4736842 6.2526316 5.3052632 4.1052632 2.0210526
##   Roger East      0.5789474 1.4473684 2.6052632 2.2105263 1.7105263 0.8421053
##   Simon Hooper    0.4631579 1.1578947 2.0842105 1.7684211 1.3684211 0.6736842
##   Stuart Attwell  1.1578947 2.8947368 5.2105263 4.4210526 3.4210526 1.6842105
##                  
##                            6          7          8
##   Andre Marriner  1.13684211 0.35526316 0.14210526
##   Andy Madley     0.08421053 0.02631579 0.01052632
##   Anthony Taylor  1.34736842 0.42105263 0.16842105
##   Chris Kavanagh  1.01052632 0.31578947 0.12631579
##   Craig Pawson    1.09473684 0.34210526 0.13684211
##   David Coote     0.46315789 0.14473684 0.05789474
##   Graham Scott    0.71578947 0.22368421 0.08947368
##   Jonathan Moss   1.13684211 0.35526316 0.14210526
##   Kevin Friend    1.13684211 0.35526316 0.14210526
##   Lee Mason       0.80000000 0.25000000 0.10000000
##   Lee Probert     0.75789474 0.23684211 0.09473684
##   Martin Atkinson 1.22105263 0.38157895 0.15263158
##   Michael Oliver  1.26315789 0.39473684 0.15789474
##   Mike Dean       1.22105263 0.38157895 0.15263158
##   Paul Tierney    1.01052632 0.31578947 0.12631579
##   Roger East      0.42105263 0.13157895 0.05263158
##   Simon Hooper    0.33684211 0.10526316 0.04210526
##   Stuart Attwell  0.84210526 0.26315789 0.10526316
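Under independence, each expected count is (row total × column total) / grand total. For example, Andre Marriner refereed 27 matches and 22 of the 380 matches had no goals, so the expected count in that cell is 27 × 22 / 380 ≈ 1.563, matching the first entry above. The sketch below uses a hypothetical helper, `expected_counts()` (not a base R function), and checks it against `chisq.test()$expected` on a small slice of the table (Andre Marriner and Anthony Taylor, 0 to 2 goals).

```r
# Hypothetical helper (not part of base R): expected counts under independence,
# E[i, j] = row_total[i] * col_total[j] / grand_total.
expected_counts <- function(tab) {
  outer(rowSums(tab), colSums(tab)) / sum(tab)
}

# Worked cell from the full table: Andre Marriner (27 matches) x 0 goals (22 matches).
27 * 22 / 380

# Check the helper against chisq.test() on a small slice of the table.
slice <- matrix(c(1, 3, 10,
                  1, 4, 12), nrow = 2, byrow = TRUE)
all.equal(expected_counts(slice),
          suppressWarnings(chisq.test(slice))$expected,
          check.attributes = FALSE)
```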

Mosaic plot of a contingency table

mosaicplot(table(data$stadium_name, data$date_GMT))

The ggmosaic package creates mosaic plots of categorical data within the ggplot2 framework.

library(ggmosaic)
## Loading required package: ggplot2
ggplot(data) +
  # product() tells geom_mosaic() which variable(s) define the tiles
  geom_mosaic(aes(x = product(stadium_name), fill = stadium_name), na.rm = TRUE) +
  labs(x = "Stadium name", title = "Matches played in each stadium")