head(data)
## date_GMT referee total_goal_count
## 1 Aug 10 2018 - 7:00pm Andre Marriner 3
## 2 Aug 11 2018 - 11:30am Martin Atkinson 3
## 3 Aug 11 2018 - 2:00pm Kevin Friend 2
## 4 Aug 11 2018 - 2:00pm Mike Dean 2
## 5 Aug 11 2018 - 2:00pm Chris Kavanagh 3
## 6 Aug 11 2018 - 2:00pm Jonathan Moss 2
## total_goals_at_half_time stadium_name
## 1 1 Old Trafford (Manchester)
## 2 3 St. James' Park (Newcastle upon Tyne)
## 3 1 Vitality Stadium (Bournemouth- Dorset)
## 4 1 Craven Cottage (London)
## 5 2 John Smith's Stadium (Huddersfield- West Yorkshire)
## 6 1 Vicarage Road (Watford)
Using the football data, I will apply the methods of the χ² distribution. Let’s find a nice contingency table to look at.
table(data$stadium_name)
##
## Anfield (Liverpool)
## 19
## Cardiff City Stadium (Cardiff (Caerdydd))
## 19
## Craven Cottage (London)
## 19
## Emirates Stadium (London)
## 19
## Etihad Stadium (Manchester)
## 19
## Goodison Park (Liverpool)
## 19
## John Smith's Stadium (Huddersfield- West Yorkshire)
## 19
## King Power Stadium (Leicester- Leicestershire)
## 19
## London Stadium (London)
## 19
## Molineux Stadium (Wolverhampton- West Midlands)
## 19
## Old Trafford (Manchester)
## 19
## Selhurst Park (London)
## 19
## St. James' Park (Newcastle upon Tyne)
## 19
## St. Mary's Stadium (Southampton- Hampshire)
## 19
## Stamford Bridge (London)
## 19
## The American Express Community Stadium (Falmer- East Sussex)
## 19
## Tottenham Hotspur Stadium (London)
## 5
## Turf Moor (Burnley)
## 19
## Vicarage Road (Watford)
## 19
## Vitality Stadium (Bournemouth- Dorset)
## 19
## Wembley Stadium (London)
## 14
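The stadiums did not all host the same number of football matches: Tottenham Hotspur Stadium hosted 5 and Wembley Stadium hosted 14, while every other stadium hosted 19. If the matches were spread evenly, the counts would be equal. Here are my null and alternative hypotheses: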
$$
\begin{aligned}
H_0 &: \text{the total goal count does not differ between categories beyond chance, i.e. the difference between observed and expected is } 0 \\
H_A &: \text{the total goal count contradicts } H_0 \text{, i.e. at least one observed count differs from its expected count}
\end{aligned}
$$
I could also have expressed this in terms of observed and expected counts: under the null hypothesis, the observed count in each category equals its expected count.
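As a minimal sketch of the observed-versus-expected idea, the stadium counts printed above can be tested against an even spread of matches. The counts are typed in by hand from the `table(data$stadium_name)` output, and the names `observed`, `expected`, and `statistic` are my own choices rather than part of the original analysis.

```r
# Matches per stadium, copied from the table(data$stadium_name) output:
# 19 stadiums hosted 19 matches, Tottenham Hotspur Stadium 5, Wembley Stadium 14.
observed <- c(rep(19, 19), 5, 14)

# Under H0 every stadium is expected to host the same number of matches.
expected <- rep(sum(observed) / length(observed), length(observed))

# Pearson statistic: sum over categories of (observed - expected)^2 / expected.
statistic <- sum((observed - expected)^2 / expected)
df <- length(observed) - 1

# chisq.test() on a vector of counts defaults to equal probabilities,
# so it should agree with the manual calculation.
result <- chisq.test(observed)

print(c(manual = statistic, builtin = unname(result$statistic), df = df))
```

Working from the tabulated counts keeps the number of categories at 21 stadiums, so the degrees of freedom are 20.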
chisq.test(data$total_goal_count)
## Warning in chisq.test(data$total_goal_count): Chi-squared approximation may be
## incorrect
##
## Chi-squared test for given probabilities
##
## data: data$total_goal_count
## X-squared = 344.49, df = 379, p-value = 0.8978
Here the p-value (0.8978) is greater than the significance level α = 0.05, so I fail to reject my null hypothesis.
The expected values for the number of goals are as follows:
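The decision rule can be sketched directly from the reported statistic. The values 344.49 and 379 come from the output above; α = 0.05 is an assumed conventional significance level, not something the output fixes.

```r
statistic <- 344.49  # X-squared from the chisq.test() output
df <- 379            # degrees of freedom from the same output
alpha <- 0.05        # assumed significance level

# Upper-tail probability of the chi-squared distribution at the statistic.
p_value <- pchisq(statistic, df = df, lower.tail = FALSE)

# Reject H0 only when the p-value falls below alpha.
decision <- if (p_value < alpha) "reject H0" else "fail to reject H0"

cat(sprintf("p-value = %.4f -> %s\n", p_value, decision))
```

A large p-value means the data are compatible with the null hypothesis; it does not prove the null hypothesis is true.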
chisq.test(data$total_goal_count)$expected
## Warning in chisq.test(data$total_goal_count): Chi-squared approximation may be
## incorrect
## [1] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [9] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [17] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [25] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [33] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [41] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [49] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [57] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [65] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [73] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [81] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [89] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [97] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [105] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [113] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [121] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [129] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [137] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [145] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [153] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [161] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [169] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [177] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [185] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [193] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [201] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [209] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [217] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [225] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [233] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [241] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [249] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [257] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [265] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [273] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [281] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [289] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [297] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [305] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [313] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [321] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [329] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [337] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [345] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [353] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [361] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [369] 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053 2.821053
## [377] 2.821053 2.821053 2.821053 2.821053
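Every expected value is identical because `chisq.test()` applied to a plain numeric vector treats each of the 380 matches as its own category and, by default, gives all categories equal probability. The sketch below rebuilds that calculation by hand. The vector `goals` is reconstructed from the column totals of the referee-by-goals table that appears later in this document; I am assuming it holds the same values as `data$total_goal_count`, although not in the original order, which does not change the statistic.

```r
# Number of matches that ended with 0, 1, ..., 8 goals
# (column totals of the referee-by-goals table).
matches_per_goal_count <- c(22, 55, 99, 84, 65, 32, 16, 5, 2)
goals <- rep(0:8, times = matches_per_goal_count)

# With equal probabilities, the expected value per match is total goals / matches.
expected <- sum(goals) / length(goals)

# Pearson statistic with one category per match.
statistic <- sum((goals - expected)^2 / expected)
df <- length(goals) - 1

print(c(expected = expected, statistic = statistic, df = df))
```

Read this way, the test asks whether goals are spread evenly across matches, which explains the 379 degrees of freedom.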
chisq.test(data$total_goals_at_half_time)
## Warning in chisq.test(data$total_goals_at_half_time): Chi-squared approximation
## may be incorrect
##
## Chi-squared test for given probabilities
##
## data: data$total_goals_at_half_time
## X-squared = 395.76, df = 379, p-value = 0.2662
Here the p-value (0.2662) is also greater than α = 0.05, so I again fail to reject my null hypothesis.
The expected values for the half-time goals are as follows:
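The warning that accompanies both tests is R's signal that some expected counts are small. A common rule of thumb asks for at least 5 per category, while here they are about 2.8 and 1.25. One hedged option is a Monte Carlo p-value through the `simulate.p.value` argument of `chisq.test()`. The sketch below uses a vector `goals` rebuilt from the column totals of the referee-by-goals table as a stand-in for `data$total_goal_count`; the seed and the number of replicates `B` are arbitrary choices of mine.

```r
# Stand-in for data$total_goal_count: matches with 0..8 goals,
# taken from the column totals of the referee-by-goals table.
goals <- rep(0:8, times = c(22, 55, 99, 84, 65, 32, 16, 5, 2))

# The expected count per category is far below the rule-of-thumb threshold of 5.
expected <- sum(goals) / length(goals)
small_expected <- expected < 5

# Monte Carlo p-value: no reliance on the chi-squared approximation.
set.seed(1)
sim <- chisq.test(goals, simulate.p.value = TRUE, B = 2000)

print(sim$p.value)
```

The simulated p-value does not depend on the large-sample approximation, so the warning does not apply to it.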
chisq.test(data$total_goals_at_half_time)$expected
## Warning in chisq.test(data$total_goals_at_half_time): Chi-squared approximation
## may be incorrect
## [1] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [9] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [17] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [25] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [33] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [41] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [49] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [57] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [65] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [73] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [81] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [89] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [97] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [105] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [113] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [121] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [129] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [137] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [145] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [153] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [161] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [169] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [177] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [185] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [193] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [201] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [209] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [217] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [225] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [233] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [241] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [249] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [257] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [265] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [273] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [281] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [289] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [297] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [305] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [313] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [321] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [329] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [337] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [345] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [353] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [361] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [369] 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632 1.252632
## [377] 1.252632 1.252632 1.252632 1.252632
barplot(table(data$stadium_name))
Using two categorical variables, perform a test for independence.
table(data$referee, data$total_goal_count)
##
## 0 1 2 3 4 5 6 7 8
## Andre Marriner 1 3 10 6 3 2 1 1 0
## Andy Madley 0 1 0 0 0 1 0 0 0
## Anthony Taylor 1 4 12 4 7 4 0 0 0
## Chris Kavanagh 1 6 7 3 2 3 1 1 0
## Craig Pawson 2 4 7 9 3 0 1 0 0
## David Coote 0 2 2 1 3 1 2 0 0
## Graham Scott 1 1 4 5 3 2 1 0 0
## Jonathan Moss 1 4 7 6 4 3 0 2 0
## Kevin Friend 1 6 6 4 6 3 1 0 0
## Lee Mason 3 1 7 1 4 2 0 1 0
## Lee Probert 2 3 4 3 4 1 1 0 0
## Martin Atkinson 3 3 7 7 8 1 0 0 0
## Michael Oliver 3 5 7 5 3 4 3 0 0
## Mike Dean 2 5 7 9 3 1 2 0 0
## Paul Tierney 0 3 2 11 5 1 1 0 1
## Roger East 0 0 3 3 1 1 1 0 1
## Simon Hooper 1 1 4 1 0 0 1 0 0
## Stuart Attwell 0 3 3 6 6 2 0 0 0
Let’s set up the hypothesis test.
H0: The referee and the total goal count of a match are independent.
HA: The referee and the total goal count of a match are dependent.
chisq.test(table(data$referee, data$total_goal_count))
## Warning in chisq.test(table(data$referee, data$total_goal_count)): Chi-squared
## approximation may be incorrect
##
## Pearson's Chi-squared test
##
## data: table(data$referee, data$total_goal_count)
## X-squared = 130.92, df = 136, p-value = 0.6069
Here I again fail to reject my null hypothesis, because the p-value (0.6069) is greater than α = 0.05.
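The same conclusion can be reached through a critical value. This sketch takes the table dimensions (18 referees, 9 goal totals) and the statistic 130.92 from the output above; α = 0.05 is an assumed significance level.

```r
n_rows <- 18         # referees
n_cols <- 9          # distinct goal totals, 0 through 8
statistic <- 130.92  # X-squared from the chisq.test() output
alpha <- 0.05        # assumed significance level

# Degrees of freedom for a test of independence: (rows - 1) * (columns - 1).
df <- (n_rows - 1) * (n_cols - 1)

# Reject H0 only if the statistic exceeds the upper-alpha critical value.
critical_value <- qchisq(1 - alpha, df = df)
reject <- statistic > critical_value

# Equivalent p-value for comparison with the printed output.
p_value <- pchisq(statistic, df = df, lower.tail = FALSE)

print(c(df = df, critical = critical_value, p_value = p_value))
```

Comparing the statistic with the critical value and comparing the p-value with α always lead to the same decision.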
chisq.test(table(data$referee, data$total_goal_count))$expected
## Warning in chisq.test(table(data$referee, data$total_goal_count)): Chi-squared
## approximation may be incorrect
##
## 0 1 2 3 4 5
## Andre Marriner 1.5631579 3.9078947 7.0342105 5.9684211 4.6184211 2.2736842
## Andy Madley 0.1157895 0.2894737 0.5210526 0.4421053 0.3421053 0.1684211
## Anthony Taylor 1.8526316 4.6315789 8.3368421 7.0736842 5.4736842 2.6947368
## Chris Kavanagh 1.3894737 3.4736842 6.2526316 5.3052632 4.1052632 2.0210526
## Craig Pawson 1.5052632 3.7631579 6.7736842 5.7473684 4.4473684 2.1894737
## David Coote 0.6368421 1.5921053 2.8657895 2.4315789 1.8815789 0.9263158
## Graham Scott 0.9842105 2.4605263 4.4289474 3.7578947 2.9078947 1.4315789
## Jonathan Moss 1.5631579 3.9078947 7.0342105 5.9684211 4.6184211 2.2736842
## Kevin Friend 1.5631579 3.9078947 7.0342105 5.9684211 4.6184211 2.2736842
## Lee Mason 1.1000000 2.7500000 4.9500000 4.2000000 3.2500000 1.6000000
## Lee Probert 1.0421053 2.6052632 4.6894737 3.9789474 3.0789474 1.5157895
## Martin Atkinson 1.6789474 4.1973684 7.5552632 6.4105263 4.9605263 2.4421053
## Michael Oliver 1.7368421 4.3421053 7.8157895 6.6315789 5.1315789 2.5263158
## Mike Dean 1.6789474 4.1973684 7.5552632 6.4105263 4.9605263 2.4421053
## Paul Tierney 1.3894737 3.4736842 6.2526316 5.3052632 4.1052632 2.0210526
## Roger East 0.5789474 1.4473684 2.6052632 2.2105263 1.7105263 0.8421053
## Simon Hooper 0.4631579 1.1578947 2.0842105 1.7684211 1.3684211 0.6736842
## Stuart Attwell 1.1578947 2.8947368 5.2105263 4.4210526 3.4210526 1.6842105
##
## 6 7 8
## Andre Marriner 1.13684211 0.35526316 0.14210526
## Andy Madley 0.08421053 0.02631579 0.01052632
## Anthony Taylor 1.34736842 0.42105263 0.16842105
## Chris Kavanagh 1.01052632 0.31578947 0.12631579
## Craig Pawson 1.09473684 0.34210526 0.13684211
## David Coote 0.46315789 0.14473684 0.05789474
## Graham Scott 0.71578947 0.22368421 0.08947368
## Jonathan Moss 1.13684211 0.35526316 0.14210526
## Kevin Friend 1.13684211 0.35526316 0.14210526
## Lee Mason 0.80000000 0.25000000 0.10000000
## Lee Probert 0.75789474 0.23684211 0.09473684
## Martin Atkinson 1.22105263 0.38157895 0.15263158
## Michael Oliver 1.26315789 0.39473684 0.15789474
## Mike Dean 1.22105263 0.38157895 0.15263158
## Paul Tierney 1.01052632 0.31578947 0.12631579
## Roger East 0.42105263 0.13157895 0.05263158
## Simon Hooper 0.33684211 0.10526316 0.04210526
## Stuart Attwell 0.84210526 0.26315789 0.10526316
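Each expected count above is (row total × column total) / grand total. The sketch below checks that formula against counts transcribed by hand from the observed referee-by-goals table; the object names and the manual calculation are illustrative and not part of the original analysis.

```r
referees <- c("Andre Marriner", "Andy Madley", "Anthony Taylor", "Chris Kavanagh",
              "Craig Pawson", "David Coote", "Graham Scott", "Jonathan Moss",
              "Kevin Friend", "Lee Mason", "Lee Probert", "Martin Atkinson",
              "Michael Oliver", "Mike Dean", "Paul Tierney", "Roger East",
              "Simon Hooper", "Stuart Attwell")

# Observed counts, one row per referee, columns are 0..8 total goals.
observed <- matrix(c(
  1, 3, 10,  6, 3, 2, 1, 1, 0,
  0, 1,  0,  0, 0, 1, 0, 0, 0,
  1, 4, 12,  4, 7, 4, 0, 0, 0,
  1, 6,  7,  3, 2, 3, 1, 1, 0,
  2, 4,  7,  9, 3, 0, 1, 0, 0,
  0, 2,  2,  1, 3, 1, 2, 0, 0,
  1, 1,  4,  5, 3, 2, 1, 0, 0,
  1, 4,  7,  6, 4, 3, 0, 2, 0,
  1, 6,  6,  4, 6, 3, 1, 0, 0,
  3, 1,  7,  1, 4, 2, 0, 1, 0,
  2, 3,  4,  3, 4, 1, 1, 0, 0,
  3, 3,  7,  7, 8, 1, 0, 0, 0,
  3, 5,  7,  5, 3, 4, 3, 0, 0,
  2, 5,  7,  9, 3, 1, 2, 0, 0,
  0, 3,  2, 11, 5, 1, 1, 0, 1,
  0, 0,  3,  3, 1, 1, 1, 0, 1,
  1, 1,  4,  1, 0, 0, 1, 0, 0,
  0, 3,  3,  6, 6, 2, 0, 0, 0
), nrow = 18, byrow = TRUE,
dimnames = list(referees, as.character(0:8)))

# Expected count under independence: row total * column total / grand total.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Compare with what chisq.test() computes (the small-count warning is expected).
builtin <- suppressWarnings(chisq.test(observed))
print(all.equal(unname(expected), unname(builtin$expected)))

# Share of cells with an expected count below 5, the reason for the warning.
print(mean(expected < 5))
```

Most cells fall below the rule-of-thumb threshold of 5, so the chi-squared approximation for this table should be treated with caution.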
mosaicplot(table(data$stadium_name, data$date_GMT))
library(ggmosaic)
## Loading required package: ggplot2
# geom_mosaic() expects the variables wrapped in product(), not a table() object
ggplot(data) +
  geom_mosaic(aes(x = product(date_GMT, stadium_name), fill = stadium_name), na.rm = TRUE) +
  labs(x = "Stadium Name", y = "Date", title = "Matches Played in Each Stadium")