Football

I want to finish my football statistics project with some analysis using non-parametric methods. I am going to try to repeat many of the analyses I have already made (except for Matched Pairs).

The hypotheses are

$$
\begin{aligned}
H_0 &: m_M = m_W \\
H_a &: m_M \neq m_W
\end{aligned}
$$

Visualize the data

library(ggplot2)
# Refer to columns by bare name inside aes(); factor() gives one box per goal count
ggplot(data = data, aes(x = factor(total_goal_count), y = total_minute)) +
  geom_boxplot()

wilcox.test(data$total_goal_count, data$total_minute, paired = TRUE)
## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  data$total_goal_count and data$total_minute
## V = 0, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0

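The p-value is far below 0.05, so I am able to reject the null hypothesis. V = 0 means that none of the paired differences total_goal_count - total_minute is positive, which is why the result is so extreme.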
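To see where V comes from, here is a minimal sketch on made-up numbers (the vectors `x` and `y` are hypothetical, not taken from the football data). V is the sum of the ranks of the absolute paired differences whose sign is positive, so it should match what `wilcox.test` reports.

```r
# Hypothetical paired observations (not from the football data)
x <- c(1, 3, 2, 5, 4)
y <- c(4, 8, 3, 11, 8)

d <- x - y                  # paired differences
d <- d[d != 0]              # zero differences are dropped
r <- rank(abs(d))           # rank the absolute differences
V_manual <- sum(r[d > 0])   # sum the ranks that belong to positive differences

# Same statistic as computed by the built-in test
V_test <- unname(wilcox.test(x, y, paired = TRUE)$statistic)
c(manual = V_manual, wilcox = V_test)
```

In this toy example every `x` is smaller than its paired `y`, so there are no positive differences and both values come out as 0, the same situation as in the test above.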

Use a categorical variable and a quantitative variable to compare two medians using the Wilcoxon rank sum test.

# Treat missing values as 0
data$total_goal_count[is.na(data$total_goal_count)] <- 0
data$total_goals_at_half_time[is.na(data$total_goals_at_half_time)] <- 0
data$total_minute[is.na(data$total_minute)] <- 0
# Label each game by comparing half-time goals with total_minute
data$Games <- ifelse(data$total_goals_at_half_time <= data$total_minute, "Less", "More")

Let’s summarize the data.

Median Value

by(data$total_goal_count,data$Games, median)
## data$Games: Less
## [1] 3
table(data$Games)
## 
## Less 
##  380
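A rank sum test needs a grouping variable with exactly two levels, which is what `table()` lets us check. A minimal sketch with simulated data (the data frame `toy` and its columns `goals` and `group` are hypothetical) shows the check and the formula call one would use when two groups are present:

```r
set.seed(1)
# Hypothetical data: goals per game for two made-up groups
toy <- data.frame(
  goals = c(rpois(20, 2), rpois(20, 3)),
  group = rep(c("Less", "More"), each = 20)
)

# The rank sum test is only defined when there are exactly two groups
n_groups <- length(unique(toy$group))
if (n_groups == 2) {
  # exact = FALSE because count data contain ties
  res <- wilcox.test(goals ~ group, data = toy, exact = FALSE)
  print(res)
} else {
  message("Need exactly two groups, found ", n_groups)
}
```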

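All 380 games fall in the "Less" group, so Games does not split the data into two samples for a rank sum test. Instead, let’s run a paired Wilcoxon signed rank test comparing total goals with goals at half time.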

wilcox.test(data$total_goal_count, data$total_goals_at_half_time, paired = TRUE)
## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  data$total_goal_count and data$total_goals_at_half_time
## V = 46665, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0

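Again the p-value is below 2.2e-16, so I am able to reject the null hypothesis that the location shift between total goals and goals at half time is zero.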

Visualize the data with a boxplot

boxplot(total_goal_count ~ total_minute, data = data)

Visualize the data with ggplot

ggplot(data = data, aes(x = total_goal_count, y = stadium_name)) +
  geom_boxplot()

Spearman Rank Correlation

With two quantitative variables, perform a Spearman rank correlation test.

cor.test(data$total_goal_count,data$total_goals_at_half_time, method = "spearman")
## Warning in cor.test.default(data$total_goal_count,
## data$total_goals_at_half_time, : Cannot compute exact p-value with ties
## 
##  Spearman's rank correlation rho
## 
## data:  data$total_goal_count and data$total_goals_at_half_time
## S = 3410149, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.6271134
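As a reminder of what this estimate measures, Spearman's rho is the ordinary Pearson correlation computed on the ranks of the two variables, with tied values sharing their average rank. A minimal sketch on made-up numbers (the vectors `a` and `b` are hypothetical, not the football data):

```r
# Hypothetical values with ties, similar in spirit to goal counts
a <- c(0, 1, 1, 2, 3, 3, 4, 5)
b <- c(0, 0, 1, 1, 1, 2, 3, 2)

rho_spearman <- cor(a, b, method = "spearman")
rho_ranks    <- cor(rank(a), rank(b))   # Pearson correlation of the average ranks

# The two numbers should agree
c(spearman = rho_spearman, pearson_on_ranks = rho_ranks)
```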

Visualize the data

ggplot(data = data, aes(total_goal_count, total_goals_at_half_time, color = total_minute)) +
  geom_point()

The hypotheses are about ρS, the Spearman rank correlation.

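$$
\begin{aligned}
H_0 &: \rho_S = 0 \\
H_a &: \rho_S \neq 0
\end{aligned}
$$

There is a fairly strong positive correlation (ρS ≈ 0.63)! With a p-value below 2.2e-16, we reject the null hypothesis that the Spearman correlation is zero. However, Spearman's correlation only measures monotonic association, so we do not know whether the relationship is linear.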
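The gap between a monotonic and a linear relationship can be illustrated with a small sketch on made-up numbers (the vectors `x` and `y` are hypothetical). Here `y` always increases with `x`, but along a curve rather than a straight line:

```r
# Hypothetical data: strictly increasing but strongly curved
x <- 1:20
y <- exp(x / 2)

rho_s <- cor(x, y, method = "spearman")  # ranks agree perfectly, so this is 1
r_p   <- cor(x, y)                       # Pearson, which measures linear association

c(spearman = rho_s, pearson = r_p)
```

Spearman's correlation is 1 because the ordering of `x` and `y` is identical, while Pearson's correlation is below 1 because the points do not lie on a line.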