This is a continuation of this project: <https://rpubs.com/nurfnick/720194>
Most of these methods use ranks! Let’s go ahead and show the ranks!
rank(data$PTS, ties.method = "average")
## [1] 28.5 28.5 28.5 28.5 25.0 25.0 25.0 18.0 18.0 18.0 18.0 18.0 18.0 18.0 18.0
## [16] 18.0 18.0 18.0 10.0 10.0 10.0 10.0 10.0 4.5 4.5 4.5 4.5 4.5 4.5 1.0
## [31] 31.0
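Some things to notice: tied values share the average of the ranks they occupy, and the one NA (the team with no games) is ranked last at 31. The NA can be placed first instead: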
rank(data$PTS, na.last = FALSE)
## [1] 29.5 29.5 29.5 29.5 26.0 26.0 26.0 19.0 19.0 19.0 19.0 19.0 19.0 19.0 19.0
## [16] 19.0 19.0 19.0 11.0 11.0 11.0 11.0 11.0 5.5 5.5 5.5 5.5 5.5 5.5 2.0
## [31] 1.0
Here the NA is ranked first (the lowest rank). Notice, though, that this shifts every other rank up by one!
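To make the tie and NA handling concrete, here is a minimal sketch on a made-up vector (`x` is hypothetical, not part of the hockey data). `na.last` is set explicitly in every call so the result does not rely on the default.

```r
# Toy vector (hypothetical): one tie at 20 and one missing value
x <- c(10, 20, 20, 30, NA)

# Tied values share the average of the ranks they occupy (2 and 3 -> 2.5);
# na.last = TRUE gives the NA the highest rank
r_last <- rank(x, ties.method = "average", na.last = TRUE)

# na.last = FALSE gives the NA rank 1 and pushes every other rank up by one
r_first <- rank(x, ties.method = "average", na.last = FALSE)

# na.last = "keep" leaves the NA unranked
r_keep <- rank(x, ties.method = "average", na.last = "keep")

print(rbind(r_last, r_first, r_keep))
```

Using `na.last = "keep"` is one way to stop a missing value from shifting the ranks of the observed values.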
Let me do a test against a value first. I am going to ask: is the median number of games played 4?
What was the median?
median(data$GP, na.rm = TRUE)
## [1] 3
Okay well this will at least be an interesting test then!
wilcox.test(data$GP, mu = 4, na.rm = TRUE)
## Warning in wilcox.test.default(data$GP, mu = 4, na.rm = TRUE): cannot compute
## exact p-value with ties
## Warning in wilcox.test.default(data$GP, mu = 4, na.rm = TRUE): cannot compute
## exact p-value with zeroes
##
## Wilcoxon signed rank test with continuity correction
##
## data: data$GP
## V = 0, p-value = 1.872e-05
## alternative hypothesis: true location is not equal to 4
With p = 1.872e-05, I do reject the null hypothesis that the true location (median) of games played is 4!
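The statistic `V` is the sum of the ranks of the positive differences from `mu`, after zero differences are dropped, so V = 0 means no team played more than 4 games. Here is a small sketch of that calculation on a made-up sample (`x` is hypothetical), checked against `wilcox.test`.

```r
# Toy sample (hypothetical), tested against mu = 4
x  <- c(3, 2, 3, 4, 1, 3, 2, 5)
mu <- 4

d <- x - mu            # signed differences from the hypothesized value
d <- d[d != 0]         # zero differences carry no sign, so they are dropped
r <- rank(abs(d))      # rank the absolute differences (ties get average ranks)
V_manual <- sum(r[d > 0])

# exact = FALSE asks for the normal approximation directly, which avoids
# the "cannot compute exact p-value" warnings for ties and zeroes
res <- wilcox.test(x, mu = mu, exact = FALSE)

print(c(manual = V_manual, wilcox = unname(res$statistic)))
```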
Next, I’ll examine the variable describing whether a team is staying out of the penalty box or not.
This code has been used before.
data$PTS[is.na(data$PTS)] <- 0
data$PPO[is.na(data$PPO)] <- 0
data$PPOA[is.na(data$PPOA)] <- 0
data[which(data$PPO <= data$PPOA), "PowerPlays"] <- "Less"
data[which(data$PPO > data$PPOA), "PowerPlays"] <- "More"
I went ahead and assigned the team with no games zero points too.
Let’s summarize the data we’ll be comparing before we run the test.
by(data$PTS,data$PowerPlays, median)
## data$PowerPlays: Less
## [1] 4
## ------------------------------------------------------------
## data$PowerPlays: More
## [1] 4
table(data$PowerPlays)
##
## Less More
## 17 14
Let’s run the test!
wilcox.test(PTS ~ PowerPlays, data = data)
## Warning in wilcox.test.default(x = c(6, 6, 6, 6, 5, 5, 4, 4, 4, 3, 3, 3, :
## cannot compute exact p-value with ties
##
## Wilcoxon rank sum test with continuity correction
##
## data: PTS by PowerPlays
## W = 132, p-value = 0.6091
## alternative hypothesis: true location shift is not equal to 0
We fail to reject the null hypothesis (p = 0.6091): there is no evidence that points differ between the two groups. Take as many penalties as you want! This was the previous result too!
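For reference, `W` is the rank sum of the first group in the pooled sample minus the smallest value that rank sum could take. With the formula interface, the first group is the first factor level ("Less" here). A minimal sketch on made-up numbers (`less` and `more` are hypothetical vectors, not the real data):

```r
# Two toy groups (hypothetical values)
less <- c(6, 5, 4, 4, 3, 1)
more <- c(6, 4, 3, 2, 2)

r <- rank(c(less, more))       # rank the pooled sample, ties averaged
n_less <- length(less)

# W = (rank sum of the first group) - its smallest possible rank sum
W_manual <- sum(r[seq_len(n_less)]) - n_less * (n_less + 1) / 2

# exact = FALSE requests the normal approximation used when ties are present
res <- wilcox.test(less, more, exact = FALSE)

print(c(manual = W_manual, wilcox = unname(res$statistic)))
```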
I need matched data! I am going to look at the same idea, power play opportunities versus your opponents’ opportunities, but rather than set up a categorical variable, I’ll compare those two values directly. My assumption is that if you are a mean team, taking lots of penalties, your opponent will too.
This is different from above! I could subtract the two columns or just specify that they are paired. I’ll show that both give the same result.
wilcox.test(data$PPO, data$PPOA, paired = TRUE)
## Warning in wilcox.test.default(data$PPO, data$PPOA, paired = TRUE): cannot
## compute exact p-value with ties
## Warning in wilcox.test.default(data$PPO, data$PPOA, paired = TRUE): cannot
## compute exact p-value with zeroes
##
## Wilcoxon signed rank test with continuity correction
##
## data: data$PPO and data$PPOA
## V = 212, p-value = 0.9134
## alternative hypothesis: true location shift is not equal to 0
wilcox.test(data$PPO - data$PPOA)
## Warning in wilcox.test.default(data$PPO - data$PPOA): cannot compute exact p-
## value with ties
## Warning in wilcox.test.default(data$PPO - data$PPOA): cannot compute exact p-
## value with zeroes
##
## Wilcoxon signed rank test with continuity correction
##
## data: data$PPO - data$PPOA
## V = 212, p-value = 0.9134
## alternative hypothesis: true location is not equal to 0
Easy enough! We fail to reject the null hypothesis: there is no evidence to suggest that the number of power play opportunities differs from the number of opponent opportunities.
Just like with ANOVA, we need more than two categories to compare, so I’ll add an "Equal" group for teams whose two counts match.
data$PPO[is.na(data$PPO)] <- 0
data$PPOA[is.na(data$PPOA)] <- 0
data[which(data$PPO < data$PPOA), "PowerPlays"] <- "Less"
data[which(data$PPO > data$PPOA), "PowerPlays"] <- "More"
data[which(data$PPO == data$PPOA), "PowerPlays"] <- "Equal"
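As an alternative sketch (not the code used above), the same three-way label can be built in one step from the sign of the difference. The data frame `toy` and its values are hypothetical. Only the column names mirror the real data.

```r
# Toy data frame (hypothetical values); the last row mimics a team with no games
toy <- data.frame(PPO = c(3, 5, 2, 4, NA), PPOA = c(4, 5, 1, 6, NA))

# Replace missing counts with zero, as in the code above
toy[is.na(toy)] <- 0

# sign() returns -1, 0 or 1, which maps directly onto the three labels
toy$PowerPlays <- factor(sign(toy$PPO - toy$PPOA),
                         levels = c(-1, 0, 1),
                         labels = c("Less", "Equal", "More"))

print(table(toy$PowerPlays))
```

Storing the result as a factor with explicit levels keeps the groups in a fixed order, even when one of them happens to be empty.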
Now I’ll test whether the PTS stay the same across the three groups.
model <- kruskal.test(PTS ~ PowerPlays, data = data)
model
##
## Kruskal-Wallis rank sum test
##
## data: PTS by PowerPlays
## Kruskal-Wallis chi-squared = 1.6151, df = 2, p-value = 0.4459
We do not have evidence to suggest that the number of points depends on having more, fewer, or the same number of power plays. We fail to reject the null hypothesis.
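The Kruskal-Wallis statistic can be reproduced from the pooled ranks, which shows where the chi-squared value comes from. This is a sketch on made-up numbers (`pts` and `grp` are hypothetical), including the usual correction for ties.

```r
# Toy data (hypothetical): points for three groups
pts <- c(6, 5, 4, 4, 3, 2, 6, 4, 1, 0, 5, 3)
grp <- factor(rep(c("Less", "Equal", "More"), times = c(5, 3, 4)))

N <- length(pts)
r <- rank(pts)                        # pooled ranks, ties averaged
rank_sums <- tapply(r, grp, sum)      # R_i for each group
n_i <- tapply(r, grp, length)         # group sizes

# Uncorrected statistic: 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1)
H <- 12 / (N * (N + 1)) * sum(rank_sums^2 / n_i) - 3 * (N + 1)

# Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N),
# where t is the number of observations sharing each distinct value
t_sizes <- table(pts)
H_corrected <- H / (1 - sum(t_sizes^3 - t_sizes) / (N^3 - N))

# p-value from the chi-squared approximation with (groups - 1) degrees of freedom
p_manual <- pchisq(H_corrected, df = nlevels(grp) - 1, lower.tail = FALSE)

res <- kruskal.test(pts ~ grp)
print(c(manual = H_corrected, kruskal = unname(res$statistic)))
```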
I’ll again look at Wins and Points.
cor.test(data$W,data$PTS, method = "spearman")
## Warning in cor.test.default(data$W, data$PTS, method = "spearman"): Cannot
## compute exact p-value with ties
##
## Spearman's rank correlation rho
##
## data: data$W and data$PTS
## S = 306.56, p-value = 7.576e-14
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.9317998
There is a strong correlation! We’ll reject the null hypothesis that the Spearman correlation is zero. Spearman’s rho only measures how monotonic the relationship is, though, so we do not know if the relationship is linear.
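To illustrate why a high rho says nothing about linearity, here is a small sketch on made-up data (`x` and `y` are hypothetical). Spearman’s rho is the Pearson correlation of the ranks, so any strictly increasing relationship gives rho = 1 even when it is curved.

```r
# Toy data (hypothetical): y is a monotonic but nonlinear function of x
x <- 1:10
y <- x^3

rho_builtin <- cor(x, y, method = "spearman")
rho_manual  <- cor(rank(x), rank(y))   # Pearson correlation of the ranks
pearson     <- cor(x, y)               # Pearson on raw values, lowered by the curvature

print(c(spearman = rho_builtin, by_ranks = rho_manual, pearson = pearson))
```

Comparing the Spearman and Pearson values, or simply plotting the two variables, is one way to judge whether a linear relationship is plausible.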