The Wall Street Journal ran a story headlined: “Stephen Curry Shoots Better When Chewing His Mouthguard” (see http://www.wsj.com/articles/stephen-curry-shoots-better-when-chewing-his-mouthguard-14311106520.
The relevant data was included in the article, showing free throws made (FTM), attempted (FTA), and percentage made (%):
Mouthguard Out: FTM=198, FTA=214, 92.5% VERSUS Mouthguard In: FTM=110, FTA=123, 89.4%
Does the data support the claim? Does it merit an article?
rm(list=ls())
n_mgOut <- 214
makes_mgOut <- 198
pct_mgOut <- makes_mgOut / n_mgOut
pct_mgOut
## [1] 0.9252336
n_mgIn <- 123
makes_mgIn <- 110
pct_mgIn <- makes_mgIn / n_mgIn
pct_mgIn
## [1] 0.8943089
ratio_mgOut2In <- pct_mgOut / pct_mgIn
ratio_mgOut2In
## [1] 1.034579
HYPOTHESIS OF INTEREST One-Tail Test? Null Hypothesis H0: ratio_mgOut2In <= 1.00
Alternative Hypothesis H1: ratio_mgOut2In > 1.00 Perhaps, BUT story would have run if results reversed SO, Two-Tail Test. Null Hypothesis H0: ratio_mgOut2In = 1.00
Alternative Hypothesis H1: ratio_mgOut2In != 1.00 Question: is the deviation from 1.00 “Significant”?
“PRACTICAL SIGNIFICANCE”: What if makes_mgIn were a bit higher? Keep incrementing makes_mgIn (110, 111, 112, …) until ratio_mgOut2In < 1.0
makes_mgIn <- makes_mgIn + 1
pct_mgIn <- makes_mgIn / n_mgIn
ratio_mgOut2In <- pct_mgOut / pct_mgIn
makes_mgIn
## [1] 111
ratio_mgOut2In
## [1] 1.025259
makes_mgIn <- makes_mgIn + 1
pct_mgIn <- makes_mgIn / n_mgIn
ratio_mgOut2In <- pct_mgOut / pct_mgIn
makes_mgIn
## [1] 112
ratio_mgOut2In # re-run until < 1.0
## [1] 1.016105
makes_mgIn <- makes_mgIn + 1
pct_mgIn <- makes_mgIn / n_mgIn
ratio_mgOut2In <- pct_mgOut / pct_mgIn
makes_mgIn
## [1] 113
ratio_mgOut2In # re-run until < 1.0
## [1] 1.007113
makes_mgIn <- makes_mgIn + 1
pct_mgIn <- makes_mgIn / n_mgIn
ratio_mgOut2In <- pct_mgOut / pct_mgIn
makes_mgIn
## [1] 114
ratio_mgOut2In # re-run until < 1.0
## [1] 0.9982784
It is tedious to re-run; let’s automate:
n_mgIn <- 123
makes_mgIn <- 110
pct_mgIn <- makes_mgIn / n_mgIn
ratio_mgOut2In <- pct_mgOut / pct_mgIn
while (ratio_mgOut2In > 1.0) {
makes_mgIn <- makes_mgIn + 1
pct_mgIn <- (makes_mgIn) / n_mgIn
ratio_mgOut2In <- pct_mgOut / pct_mgIn
}
makes_mgIn
## [1] 114
ratio_mgOut2In
## [1] 0.9982784
n_mgOut + n_mgIn
## [1] 337
So, if made freethrows with mouthguard is increased from 110 to 114, the apparent differential in percentage made with mouthguard in versus out disappears. Is this a small differential, since Curry took 337 total freethrows during the season? Or is it a large differential, because missing 13 of 123 freethrows with mouthguard in is significantly more than missing 9 of 123?
“STATISTICAL SIGNIFICANCE”: Attempt 1 of 3 Google Search: “Test difference between two proportions”, yields http://stattrek.com/hypothesis-test/difference-in-proportions.aspx?Tutorial=AP
Hypothesis “Set 2”: p1 - p2 > 0 ==> p1 = %Made|mgOUT and p2 = %Made|mgIN.
Null H0: p1 = p2 Alt H1: p1 != p2 alpha = 0.005, 0.025, 0.050, etc test = Two-tailed, 2-proportion z-test p = (p1 * n1 + p2 * n2) / (n1 + n2) from website SE = sqrt{ p * ( 1 - p ) * [ (1/n1) + (1/n2) ] } from website z = (p1 - p2) / SE # Here we rely on a Normal distribution assumption
n1 <- 214
n2 <- 123
p1 <- 198 / 214
p1
## [1] 0.9252336
p2 <- 110 / 123
p2
## [1] 0.8943089
p <- ((p1 * n1) + (p2 * n2)) / (n1 + n2)
p
## [1] 0.9139466
se = sqrt (p * ( 1 - p ) * ( (1/n1) + (1/n2) ) )
se
## [1] 0.03173218
z = (p1 - p2) / se
z
## [1] 0.9745532
p_value <- pnorm(-z) + (1 - pnorm(z))
p_value
## [1] 0.3297819
So, z = 0.97 ~ 1.0; and 2/3rds of area under ~N within 1 s.e. of mean = 0. The p-value = 0.33, or 1/3rd of area under ~N > 1 s.e. from mean. Since 0.33 >> 0.05 or any other reasonable choice of alpha; DO NOT REJECT H0.
“STATISTICAL SIGNIFICANCE”: Attempt 2 of 3 Google Search: “Test difference between two proportions IN R”, yields http://www.r-tutor.com/elementary-statistics/inference-about-two-populations/comparison-two-population-proportions
makes <- c(198, 110)
shots <- c(214, 123)
prop.test(makes, shots, correct = FALSE)
##
## 2-sample test for equality of proportions without continuity
## correction
##
## data: makes out of shots
## X-squared = 0.94975, df = 1, p-value = 0.3298
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## -0.03383462 0.09568403
## sample estimates:
## prop 1 prop 2
## 0.9252336 0.8943089
Note the same p-value = 0.3298 as before, with the same reliance on ~N. Also, note that the 95% Confidence Interval on p1 - p2 INCLUDES H0: (p1 - p2) = 0.0, in which case the observed difference in proportions is NOT significant.
“STATISTICAL SIGNIFICANCE”: Attempt 3 of 3 A Permutation Test: envision a rectangular block of data in which each row represents a freethrow by Steph Curry. Now let: Column 1 = Free throw number, in chronological order Column 2 = Mouthguard in (mgIn) or out (mgOut) Column 3 = Free throw made or missed.
To test whether the observed relationship between freethrow percentage made with mouthguard IN versus OUT, we first break any but a random relationship and then observe the rarity of out observed test score.
First, our setup:
n_mgOut <- 214
makes_mgOut <- 198
n_mgIn <- 123
makes_mgIn <- 110
n <- n_mgOut + n_mgIn
shot <- 1:n
mgOut <- c(rep(1, n_mgOut), rep(0, n_mgIn))
made <- c(rep(1, makes_mgOut), rep(0, n_mgOut - makes_mgOut),
rep(1, makes_mgIn), rep(0, n_mgIn - makes_mgIn) )
d <- data.frame(shot, mgOut, made)
head(d)
## shot mgOut made
## 1 1 1 1
## 2 2 1 1
## 3 3 1 1
## 4 4 1 1
## 5 5 1 1
## 6 6 1 1
tail(d)
## shot mgOut made
## 332 332 0 0
## 333 333 0 0
## 334 334 0 0
## 335 335 0 0
## 336 336 0 0
## 337 337 0 0
#
dOut <- subset(d, mgOut==1)
mgOutPct <- sum(dOut$made)/length(dOut$made)
mgOutPct
## [1] 0.9252336
dIn <- subset(d, mgOut==0)
mgInPct <- sum(dIn$made)/length(dIn$made)
mgInPct
## [1] 0.8943089
testStat <- mgOutPct / mgInPct
testStat
## [1] 1.034579
Now, a few permutation runs:
random.seed <- runif(1, 0, 10^10)
dPerm <- data.frame(shot, mgOut, sample(made))
sum(dPerm[1:214, 3])
## [1] 197
dPerm <- data.frame(shot, mgOut, sample(made))
sum(dPerm[1:214, 3])
## [1] 193
dPerm <- data.frame(shot, mgOut, sample(made))
sum(dPerm[1:214, 3])
## [1] 192
dPerm <- data.frame(shot, mgOut, sample(made))
sum(dPerm[1:214, 3])
## [1] 198
Alright, now let’s run it 10,000 times:
random.seed <- runif(1, 0, 10^10)
iterations <- 10000
permStats <- c()
system.time( {
for (i in 1:iterations) {
dPerm <- data.frame(shot, mgOut, sample(made))
dOutPerm <- subset(dPerm, mgOut==1)
mgOutPctPerm <- sum(dOutPerm$sample.made.)/length(dOutPerm$sample.made.)
dInPerm <- subset(dPerm, mgOut==0)
mgInPctPerm <- sum(dInPerm$sample.made.)/length(dInPerm$sample.made.)
permStats[i] <- mgOutPctPerm / mgInPctPerm
}
} )
## user system elapsed
## 5.222 0.014 5.240
Note that 10,000 runs took mere seconds.
Let’s estimate the p-value for our test of the hypothesis:
rareGT <- length(subset(permStats, permStats > testStat)) / length(permStats)
pseudo_p_value <- 2 * (length(subset(permStats, permStats > testStat)) / length(permStats))
pseudo_p_value
## [1] 0.244
Finally, perhaps re-run after increasing iterations from 10,000 to 100,000. It makes little difference. The lower p-values (~0.24 for permutation tests versus ~0.33 for classical proportions test) may be attributable to the assumption of a Normal distribution for the classical tests. Permutation tests make no distributional assumptions. Either approach yields a p-value >> 0.05 or any other reasonable alpha, so all roads lead to REJECTING the null hypothesis, and therefore the validity of running such a story.
Finally, let’s visualize the results of our 10,000 runs of the permutation test:
plot(density(permStats), type ="h")
plot.ecdf(permStats)