The Wall Street Journal ran a story headlined: “Stephen Curry Shoots Better When Chewing His Mouthguard” (see http://www.wsj.com/articles/stephen-curry-shoots-better-when-chewing-his-mouthguard-14311106520.

The relevant data was included in the article, showing free throws made (FTM), attempted (FTA), and percentage made (%):

Mouthguard Out: FTM=198, FTA=214, 92.5% VERSUS Mouthguard In: FTM=110, FTA=123, 89.4%

Does the data support the claim? Does it merit an article?

rm(list=ls())
n_mgOut <- 214
makes_mgOut <- 198
pct_mgOut <- makes_mgOut / n_mgOut
pct_mgOut
## [1] 0.9252336
n_mgIn <- 123
makes_mgIn <- 110
pct_mgIn <- makes_mgIn / n_mgIn
pct_mgIn
## [1] 0.8943089
ratio_mgOut2In <- pct_mgOut / pct_mgIn
ratio_mgOut2In 
## [1] 1.034579

HYPOTHESIS OF INTEREST One-Tail Test? Null Hypothesis H0: ratio_mgOut2In <= 1.00
Alternative Hypothesis H1: ratio_mgOut2In > 1.00 Perhaps, BUT story would have run if results reversed SO, Two-Tail Test. Null Hypothesis H0: ratio_mgOut2In = 1.00
Alternative Hypothesis H1: ratio_mgOut2In != 1.00 Question: is the deviation from 1.00 “Significant”?

“PRACTICAL SIGNIFICANCE”: What if makes_mgIn were a bit higher? Keep incrementing makes_mgIn (110, 111, 112, …) until ratio_mgOut2In < 1.0

makes_mgIn <- makes_mgIn + 1
pct_mgIn <- makes_mgIn / n_mgIn
ratio_mgOut2In <- pct_mgOut / pct_mgIn
makes_mgIn
## [1] 111
ratio_mgOut2In 
## [1] 1.025259
makes_mgIn <- makes_mgIn + 1
pct_mgIn <- makes_mgIn / n_mgIn
ratio_mgOut2In <- pct_mgOut / pct_mgIn
makes_mgIn
## [1] 112
ratio_mgOut2In # re-run until < 1.0
## [1] 1.016105
makes_mgIn <- makes_mgIn + 1
pct_mgIn <- makes_mgIn / n_mgIn
ratio_mgOut2In <- pct_mgOut / pct_mgIn
makes_mgIn
## [1] 113
ratio_mgOut2In # re-run until < 1.0
## [1] 1.007113
makes_mgIn <- makes_mgIn + 1
pct_mgIn <- makes_mgIn / n_mgIn
ratio_mgOut2In <- pct_mgOut / pct_mgIn
makes_mgIn
## [1] 114
ratio_mgOut2In # re-run until < 1.0
## [1] 0.9982784

It is tedious to re-run; let’s automate:

n_mgIn <- 123
makes_mgIn <- 110
pct_mgIn <- makes_mgIn / n_mgIn
ratio_mgOut2In <- pct_mgOut / pct_mgIn
while (ratio_mgOut2In > 1.0) {
  makes_mgIn <- makes_mgIn + 1
  pct_mgIn <- (makes_mgIn) / n_mgIn
  ratio_mgOut2In <- pct_mgOut / pct_mgIn
}
makes_mgIn 
## [1] 114
ratio_mgOut2In 
## [1] 0.9982784
n_mgOut + n_mgIn 
## [1] 337

So, if made freethrows with mouthguard is increased from 110 to 114, the apparent differential in percentage made with mouthguard in versus out disappears. Is this a small differential, since Curry took 337 total freethrows during the season? Or is it a large differential, because missing 13 of 123 freethrows with mouthguard in is significantly more than missing 9 of 123?

“STATISTICAL SIGNIFICANCE”: Attempt 1 of 3 Google Search: “Test difference between two proportions”, yields http://stattrek.com/hypothesis-test/difference-in-proportions.aspx?Tutorial=AP

Hypothesis “Set 2”: p1 - p2 > 0 ==> p1 = %Made|mgOUT and p2 = %Made|mgIN.

Null H0: p1 = p2 Alt H1: p1 != p2 alpha = 0.005, 0.025, 0.050, etc test = Two-tailed, 2-proportion z-test p = (p1 * n1 + p2 * n2) / (n1 + n2) from website SE = sqrt{ p * ( 1 - p ) * [ (1/n1) + (1/n2) ] } from website z = (p1 - p2) / SE # Here we rely on a Normal distribution assumption

n1 <- 214
n2 <- 123
p1 <- 198 / 214
p1
## [1] 0.9252336
p2 <- 110 / 123
p2
## [1] 0.8943089
p <- ((p1 * n1) + (p2 * n2)) / (n1 + n2)
p
## [1] 0.9139466
se = sqrt (p * ( 1 - p ) * ( (1/n1) + (1/n2) ) ) 
se
## [1] 0.03173218
z = (p1 - p2) / se 
z 
## [1] 0.9745532
p_value <- pnorm(-z) + (1 - pnorm(z))
p_value
## [1] 0.3297819

So, z = 0.97 ~ 1.0; and 2/3rds of area under ~N within 1 s.e. of mean = 0. The p-value = 0.33, or 1/3rd of area under ~N > 1 s.e. from mean. Since 0.33 >> 0.05 or any other reasonable choice of alpha; DO NOT REJECT H0.

“STATISTICAL SIGNIFICANCE”: Attempt 2 of 3 Google Search: “Test difference between two proportions IN R”, yields http://www.r-tutor.com/elementary-statistics/inference-about-two-populations/comparison-two-population-proportions

makes <- c(198, 110)
shots <- c(214, 123)
prop.test(makes, shots, correct = FALSE)
## 
##  2-sample test for equality of proportions without continuity
##  correction
## 
## data:  makes out of shots
## X-squared = 0.94975, df = 1, p-value = 0.3298
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  -0.03383462  0.09568403
## sample estimates:
##    prop 1    prop 2 
## 0.9252336 0.8943089

Note the same p-value = 0.3298 as before, with the same reliance on ~N. Also, note that the 95% Confidence Interval on p1 - p2 INCLUDES H0: (p1 - p2) = 0.0, in which case the observed difference in proportions is NOT significant.

“STATISTICAL SIGNIFICANCE”: Attempt 3 of 3 A Permutation Test: envision a rectangular block of data in which each row represents a freethrow by Steph Curry. Now let: Column 1 = Free throw number, in chronological order Column 2 = Mouthguard in (mgIn) or out (mgOut) Column 3 = Free throw made or missed.

To test whether the observed relationship between freethrow percentage made with mouthguard IN versus OUT, we first break any but a random relationship and then observe the rarity of out observed test score.

First, our setup:

n_mgOut <- 214
makes_mgOut <- 198
n_mgIn <- 123
makes_mgIn <- 110
n <- n_mgOut + n_mgIn
shot <- 1:n
mgOut <- c(rep(1, n_mgOut), rep(0, n_mgIn))
made <- c(rep(1, makes_mgOut), rep(0, n_mgOut - makes_mgOut), 
          rep(1, makes_mgIn), rep(0, n_mgIn - makes_mgIn) )
d <- data.frame(shot, mgOut, made)
head(d)
##   shot mgOut made
## 1    1     1    1
## 2    2     1    1
## 3    3     1    1
## 4    4     1    1
## 5    5     1    1
## 6    6     1    1
tail(d)
##     shot mgOut made
## 332  332     0    0
## 333  333     0    0
## 334  334     0    0
## 335  335     0    0
## 336  336     0    0
## 337  337     0    0
#
dOut <- subset(d, mgOut==1)
mgOutPct <- sum(dOut$made)/length(dOut$made)
mgOutPct
## [1] 0.9252336
dIn <- subset(d, mgOut==0)
mgInPct <- sum(dIn$made)/length(dIn$made)
mgInPct
## [1] 0.8943089
testStat <- mgOutPct / mgInPct
testStat 
## [1] 1.034579

Now, a few permutation runs:

random.seed <- runif(1, 0, 10^10)
dPerm <- data.frame(shot, mgOut, sample(made)) 
sum(dPerm[1:214, 3])
## [1] 197
dPerm <- data.frame(shot, mgOut, sample(made)) 
sum(dPerm[1:214, 3])
## [1] 193
dPerm <- data.frame(shot, mgOut, sample(made)) 
sum(dPerm[1:214, 3])
## [1] 192
dPerm <- data.frame(shot, mgOut, sample(made)) 
sum(dPerm[1:214, 3])
## [1] 198

Alright, now let’s run it 10,000 times:

random.seed <- runif(1, 0, 10^10)
iterations <- 10000 
permStats <- c() 
system.time( {                   
  for (i in 1:iterations) {
  dPerm <- data.frame(shot, mgOut, sample(made)) 
  dOutPerm <- subset(dPerm, mgOut==1)
  mgOutPctPerm <- sum(dOutPerm$sample.made.)/length(dOutPerm$sample.made.)
  dInPerm <- subset(dPerm, mgOut==0)
  mgInPctPerm <- sum(dInPerm$sample.made.)/length(dInPerm$sample.made.)
  permStats[i] <- mgOutPctPerm / mgInPctPerm
  }
} ) 
##    user  system elapsed 
##   5.222   0.014   5.240

Note that 10,000 runs took mere seconds.

Let’s estimate the p-value for our test of the hypothesis:

rareGT <- length(subset(permStats, permStats > testStat)) / length(permStats)
pseudo_p_value <- 2 * (length(subset(permStats, permStats > testStat)) / length(permStats))
pseudo_p_value 
## [1] 0.244

Finally, perhaps re-run after increasing iterations from 10,000 to 100,000. It makes little difference. The lower p-values (~0.24 for permutation tests versus ~0.33 for classical proportions test) may be attributable to the assumption of a Normal distribution for the classical tests. Permutation tests make no distributional assumptions. Either approach yields a p-value >> 0.05 or any other reasonable alpha, so all roads lead to REJECTING the null hypothesis, and therefore the validity of running such a story.

Finally, let’s visualize the results of our 10,000 runs of the permutation test:

plot(density(permStats), type ="h")

plot.ecdf(permStats)