This is a continuation of this project: <https://rpubs.com/nurfnick/720194>.

Ranks

Most of these methods work on ranks rather than on the raw values, so let’s go ahead and look at the ranks first!

rank(data$PTS, ties.method = "average")
##  [1] 28.5 28.5 28.5 28.5 25.0 25.0 25.0 18.0 18.0 18.0 18.0 18.0 18.0 18.0 18.0
## [16] 18.0 18.0 18.0 10.0 10.0 10.0 10.0 10.0  4.5  4.5  4.5  4.5  4.5  4.5  1.0
## [31] 31.0

Some things to notice:

  1. R averaged the tied ranks automatically. The option ties.method is not required ("average" is the default), but it can be changed.
  2. The last score here is an NA, BUT it was returned as the highest rank! We’ll need to fix that!

rank(data$PTS, na.last = FALSE)
##  [1] 29.5 29.5 29.5 29.5 26.0 26.0 26.0 19.0 19.0 19.0 19.0 19.0 19.0 19.0 19.0
## [16] 19.0 19.0 19.0 11.0 11.0 11.0 11.0 11.0  5.5  5.5  5.5  5.5  5.5  5.5  2.0
## [31]  1.0

I put the NA first (the lowest rank). Notice, though, that this shifts every other rank up by one!
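
As a small, self-contained sketch of the two options discussed above, here is how ties.method and na.last change what rank() returns. The vector `pts` is made up for illustration and is not the hockey data. Note that na.last = "keep" leaves the NA unranked, which avoids shifting the other ranks.

```r
# Hypothetical toy scores, not the project's data
pts <- c(3, 1, 3, NA, 2)

# Default: ties are averaged and the NA is ranked last (highest)
avg_rank <- rank(pts)

# ties.method = "min": tied values share the smallest of their ranks
min_rank <- rank(pts, ties.method = "min")

# na.last = FALSE: the NA gets rank 1 and every other rank moves up by one
na_first <- rank(pts, na.last = FALSE)

# na.last = "keep": the NA stays NA and the other ranks are unaffected
na_kept <- rank(pts, na.last = "keep")

print(rbind(avg_rank, min_rank, na_first, na_kept))
```

If the NA should not take part in the ranking at all, "keep" is probably the safest choice of the three.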

Wilcoxon Rank Sum

Fixed Value

Let me do a test against a fixed value first. I am going to ask whether the median number of games played is 4.

What was the median?

median(data$GP, na.rm = TRUE)
## [1] 3

Okay, well, this will at least be an interesting test then!

wilcox.test(data$GP, mu = 4, na.rm = TRUE)
## Warning in wilcox.test.default(data$GP, mu = 4, na.rm = TRUE): cannot compute
## exact p-value with ties
## Warning in wilcox.test.default(data$GP, mu = 4, na.rm = TRUE): cannot compute
## exact p-value with zeroes
## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  data$GP
## V = 0, p-value = 1.872e-05
## alternative hypothesis: true location is not equal to 4

With a p-value of about 1.9e-05, I do reject the null hypothesis that the median number of games played is 4!
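
Here is a minimal sketch of the same kind of one-sample test on made-up numbers (the vector `gp` is hypothetical). It illustrates two things: wilcox.test drops NA values by itself, so na.rm is not actually needed, and passing exact = FALSE asks directly for the normal approximation that R falls back to anyway when it warns about ties and zeroes.

```r
# Hypothetical games-played counts, including one missing value
gp <- c(1, 2, 3, 3, 5, 2, 4, NA)

# Test H0: the location (median) equals 4, using the normal approximation
with_na <- wilcox.test(gp, mu = 4, exact = FALSE)

# Removing the NA by hand first gives the same answer
without_na <- wilcox.test(gp[!is.na(gp)], mu = 4, exact = FALSE)

print(with_na$p.value)
print(all.equal(with_na$p.value, without_na$p.value))
```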

Compare Two Medians

I’ll examine the variable describing whether a team is staying out of the penalty box or not.

This code has been used before.

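# Treat missing values (the team with no games) as zero
data$PTS[is.na(data$PTS)] <- 0
data$PPO[is.na(data$PPO)] <- 0
data$PPOA[is.na(data$PPOA)] <- 0
# Label each team by comparing PPO with PPOA; "Less" also covers ties here
data[which(data$PPO <= data$PPOA), "PowerPlays"] <- "Less"
data[which(data$PPO > data$PPOA), "PowerPlays"] <- "More"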

I went ahead and assigned the team with no games zero points too.

Let’s summarize the data we’ll be comparing before we run the test.

by(data$PTS,data$PowerPlays, median)
## data$PowerPlays: Less
## [1] 4
## ------------------------------------------------------------ 
## data$PowerPlays: More
## [1] 4
table(data$PowerPlays)
## 
## Less More 
##   17   14

Let’s run the test!

wilcox.test(PTS ~ PowerPlays, data = data)
## Warning in wilcox.test.default(x = c(6, 6, 6, 6, 5, 5, 4, 4, 4, 3, 3, 3, :
## cannot compute exact p-value with ties
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  PTS by PowerPlays
## W = 132, p-value = 0.6091
## alternative hypothesis: true location shift is not equal to 0

We fail to reject the null hypothesis. Take as many penalties as you want! This matches the previous result too!
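
To connect this back to the ranks, here is a rough sketch of where the W statistic comes from, using made-up numbers (`pts` and `group` below are hypothetical, not the hockey data). It assumes the usual definition: rank everything together, add up the ranks of the first group, and subtract the smallest value that sum could take.

```r
# Hypothetical points for two groups of teams
pts   <- c(6, 5, 4, 4, 3, 2, 2, 1, 5, 3)
group <- factor(c("Less", "Less", "Less", "Less", "Less",
                  "More", "More", "More", "More", "More"))

# Rank all observations together, averaging ties
r <- rank(pts)

# W = (sum of ranks in the first group) - n1 * (n1 + 1) / 2
n1 <- sum(group == "Less")
w_manual <- sum(r[group == "Less"]) - n1 * (n1 + 1) / 2

# Same statistic from wilcox.test (normal approximation because of ties)
res <- wilcox.test(pts ~ group, exact = FALSE)

print(c(manual = w_manual, wilcox = unname(res$statistic)))
```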

Wilcoxon Signed Rank Test

I need matched data! I am actually going to look at the same idea, power-play opportunities versus your opponents’, but rather than set up a categorical variable, I’ll compare those two values directly. My assumption is that if you are a mean team that takes lots of penalties, your opponent will too.

This is different from above! I could subtract the two columns or just tell the test that they are paired. I’ll show that both give the same result.

wilcox.test(data$PPO, data$PPOA, paired = TRUE)
## Warning in wilcox.test.default(data$PPO, data$PPOA, paired = TRUE): cannot
## compute exact p-value with ties
## Warning in wilcox.test.default(data$PPO, data$PPOA, paired = TRUE): cannot
## compute exact p-value with zeroes
## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  data$PPO and data$PPOA
## V = 212, p-value = 0.9134
## alternative hypothesis: true location shift is not equal to 0
wilcox.test(data$PPO - data$PPOA)
## Warning in wilcox.test.default(data$PPO - data$PPOA): cannot compute exact p-
## value with ties
## Warning in wilcox.test.default(data$PPO - data$PPOA): cannot compute exact p-
## value with zeroes
## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  data$PPO - data$PPOA
## V = 212, p-value = 0.9134
## alternative hypothesis: true location is not equal to 0

Easy enough! We fail to reject the null hypothesis: there is no evidence to suggest that the number of power-play opportunities and the number of opponent opportunities differ.
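
As a sketch of what the V statistic measures, the following computes it by hand on made-up paired counts (`ppo` and `ppoa` here are hypothetical vectors, not the columns from the data). It assumes the usual signed rank recipe: drop zero differences, rank the absolute differences, and add up the ranks that belong to positive differences.

```r
# Hypothetical paired counts for six teams
ppo  <- c(10, 12, 9, 15, 11, 8)
ppoa <- c(11, 12, 7, 13, 14, 8)

# Differences, with zeroes dropped as wilcox.test does
d <- ppo - ppoa
d <- d[d != 0]

# V = sum of the ranks of |d| that belong to positive differences
v_manual <- sum(rank(abs(d))[d > 0])

# Same statistic from the paired test (normal approximation)
res <- wilcox.test(ppo, ppoa, paired = TRUE, exact = FALSE)

print(c(manual = v_manual, wilcox = unname(res$statistic)))
```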

Kruskal-Wallis

Just like with ANOVA, we need more than two categories to compare.

data$PPO[is.na(data$PPO)] <- 0
data$PPOA[is.na(data$PPOA)] <- 0
data[which(data$PPO < data$PPOA),"PowerPlays"] = "Less"
data[which(data$PPO > data$PPOA),"PowerPlays"] = "More"
data[which(data$PPO == data$PPOA),"PowerPlays"] = "Equal"

Now let’s see whether the median PTS stays the same across the three groups.

model = kruskal.test(PTS ~ PowerPlays, data = data)
model
## 
##  Kruskal-Wallis rank sum test
## 
## data:  PTS by PowerPlays
## Kruskal-Wallis chi-squared = 1.6151, df = 2, p-value = 0.4459

We do not have evidence to suggest that the number of points depends on having more, fewer, or an equal number of power plays. We fail to reject the null hypothesis.
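
For the curious, here is a rough sketch of how the Kruskal-Wallis chi-squared value can be rebuilt from ranks. The data are made up (`pts` and `group` are hypothetical), and the formula is the standard H statistic with the usual correction for ties, which I am assuming is what kruskal.test reports.

```r
# Hypothetical points for three groups of teams
pts   <- c(6, 5, 4, 4, 3, 2, 2, 1, 5, 3, 4, 1)
group <- factor(rep(c("Less", "Equal", "More"), each = 4))

n <- length(pts)
r <- rank(pts)                      # joint ranks, ties averaged
rank_sums <- tapply(r, group, sum)  # rank sum per group
sizes     <- tapply(r, group, length)

# Uncorrected H statistic
h <- 12 / (n * (n + 1)) * sum(rank_sums^2 / sizes) - 3 * (n + 1)

# Divide by the tie correction factor
ties <- table(pts)
h_manual <- h / (1 - sum(ties^3 - ties) / (n^3 - n))

res <- kruskal.test(pts ~ group)

print(c(manual = h_manual, kruskal = unname(res$statistic)))
```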

Spearman Rank Correlation

I’ll again look at Wins and Points.

cor.test(data$W,data$PTS, method = "spearman")
## Warning in cor.test.default(data$W, data$PTS, method = "spearman"): Cannot
## compute exact p-value with ties
## 
##  Spearman's rank correlation rho
## 
## data:  data$W and data$PTS
## S = 306.56, p-value = 7.576e-14
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.9317998

There is a strong correlation! We’ll reject the null hypothesis that the Spearman correlation is zero, but because Spearman’s rho only measures monotonic association, we do not know if we have a linear relationship.
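
A small sketch of that last point, on made-up numbers rather than the hockey data: for a relationship that is monotonic but far from linear, the Spearman correlation is perfect while the Pearson correlation is not. It also shows that Spearman’s rho is just the Pearson correlation of the ranks.

```r
# Hypothetical monotonic but clearly nonlinear relationship
x <- 1:10
y <- exp(x)

rho_spearman <- cor(x, y, method = "spearman")
rho_pearson  <- cor(x, y)               # Pearson is the default method
rho_on_ranks <- cor(rank(x), rank(y))   # Pearson applied to the ranks

print(c(spearman = rho_spearman, pearson = rho_pearson, on_ranks = rho_on_ranks))
```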