I want to finish my football statistics project with some analysis using non-parametric methods. I am going to try to repeat many of the analyses I have already done (except for matched pairs).
The hypotheses are:
$$
\begin{aligned}
H_0 &: m_M = m_W \\
H_a &: m_M \neq m_W
\end{aligned}
$$
Visualize the data
# Refer to columns by bare name inside aes(); ggplot looks them up in `data`
ggplot(data = data, aes(x = total_goal_count, y = total_minute)) +
  geom_boxplot()
wilcox.test(data$total_goal_count, data$total_minute, paired = TRUE)
##
## Wilcoxon signed rank test with continuity correction
##
## data: data$total_goal_count and data$total_minute
## V = 0, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0
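Since the p-value is below 2.2e-16, I am able to reject the null hypothesis of no location shift between total_goal_count and total_minute. A statistic of V = 0 also shows that total_goal_count never exceeds total_minute in any game.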
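To make the V statistic less of a black box, here is a minimal sketch of how it can be computed by hand: V is the sum of the ranks of the positive paired differences. The vectors x and y are made-up toy values, not columns from my data.

```r
# Hypothetical toy data, not taken from the football dataset
x <- c(1, 3, 2, 5, 4)
y <- c(2, 7, 5, 3, 9)

d <- x - y                   # paired differences
d <- d[d != 0]               # zero differences are dropped
r <- rank(abs(d))            # rank the absolute differences
V_by_hand <- sum(r[d > 0])   # sum of the ranks of the positive differences

# The same statistic as reported by wilcox.test()
V_from_test <- unname(wilcox.test(x, y, paired = TRUE)$statistic)
```

In this toy example only one difference is positive, so V is just the rank of that single difference. When no difference is positive, the sum is empty and V is 0.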
# Replace missing values with 0 before comparing the columns
data$total_goal_count[is.na(data$total_goal_count)] <- 0
data$total_goals_at_half_time[is.na(data$total_goals_at_half_time)] <- 0
data$total_minute[is.na(data$total_minute)] <- 0
# Label each game by whether half-time goals exceed total_minute
data$Games <- ifelse(data$total_goals_at_half_time > data$total_minute, "More", "Less")
Let’s summarize the data.
Median Value
by(data$total_goal_count,data$Games, median)
## data$Games: Less
## [1] 3
table(data$Games)
##
## Less
## 380
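All 380 games fall in the "Less" group, so there is no "More" group to compare against. Let’s run the Wilcoxon signed-rank test on total_goal_count and total_goals_at_half_time.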
wilcox.test(data$total_goal_count, data$total_goals_at_half_time, paired = TRUE)
##
## Wilcoxon signed rank test with continuity correction
##
## data: data$total_goal_count and data$total_goals_at_half_time
## V = 46665, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0
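Again the p-value is below 2.2e-16, so I am able to reject the null hypothesis of no location shift between total_goal_count and total_goals_at_half_time.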
Visualize the data with a base R boxplot
boxplot(total_goal_count ~ total_minute, data = data)
Visualize the data with ggplot
# One horizontal box per stadium; columns are named directly inside aes()
ggplot(data = data, aes(x = total_goal_count, y = stadium_name)) +
  geom_boxplot()
With two quantitative variables, perform a Spearman rank correlation test.
cor.test(data$total_goal_count,data$total_goals_at_half_time, method = "spearman")
## Warning in cor.test.default(data$total_goal_count,
## data$total_goals_at_half_time, : Cannot compute exact p-value with ties
##
## Spearman's rank correlation rho
##
## data: data$total_goal_count and data$total_goals_at_half_time
## S = 3410149, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.6271134
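As a sanity check on the S statistic, and assuming (as I understand R's implementation) that cor.test reports $S = (n^3 - n)(1 - \rho_S)/6$, the printed values fit together when I take n = 380 from the table of games:

$$
S = \frac{(n^3 - n)(1 - \rho_S)}{6} = \frac{(380^3 - 380)(1 - 0.6271134)}{6} \approx 3410149
$$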
Visualize the data
# Scatter plot of full-time goals against half-time goals, colored by total_minute
ggplot(data = data, aes(x = total_goal_count, y = total_goals_at_half_time, color = total_minute)) +
  geom_point()
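The hypotheses are about $\rho_S$, the Spearman rank correlation.
$$
\begin{aligned}
H_0 &: \rho_S = 0 \\
H_a &: \rho_S \neq 0
\end{aligned}
$$
With an estimate of about 0.63 and a p-value below 2.2e-16, there is a fairly strong positive correlation! We’ll reject the null hypothesis that the Spearman correlation is zero, but because the test works on ranks it only tells us the relationship is monotonic; we do not know if we have a linear relationship.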
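To illustrate why a Spearman correlation says nothing about linearity, here is a small sketch with made-up toy values (x and y are hypothetical, not from my data). Spearman's rho is the Pearson correlation of the ranks, so any strictly increasing relationship gives a rho of 1, even a curved one.

```r
# Hypothetical toy data: y is a monotonic but non-linear function of x
x <- 1:10
y <- x^3

rho_spearman <- cor(x, y, method = "spearman")             # rank-based correlation
rho_on_ranks <- cor(rank(x), rank(y), method = "pearson")  # Pearson on the ranks
r_pearson    <- cor(x, y, method = "pearson")              # linear correlation on raw values
```

Here rho_spearman and rho_on_ranks agree, while r_pearson comes out lower because the raw relationship is curved. Comparing the two correlations on the real columns would be one rough way to see how far the relationship is from linear.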