15.11 Exercises
1. A famous athlete has an impressive career, winning 70% of her 500 career matches. However, this athlete gets criticized because in important events, such as the Olympics, she has a losing record of 8 wins and 9 losses. Perform a Chi-square test to determine if this losing record can be simply due to chance as opposed to not performing well under pressure.
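# rows: career, Olympics; columns: losses, wins
mat <- matrix(c(.3*500, .7*500, 9, 8), 2, 2, byrow = TRUE)
# Pearson chi-square test without the continuity correction
chisq.test(mat, correct = FALSE)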
##
## Pearson's Chi-squared test
##
## data: mat
## X-squared = 4.0631, df = 1, p-value = 0.04383
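To see where this statistic comes from, the sketch below recomputes it by hand from the expected counts under independence. The names `tab`, `expected`, `stat`, and `p_value` are introduced here for illustration only; the counts are the same as in `mat` (rows: career, Olympics; columns: losses, wins).

```r
# Same table as mat, written with explicit counts
# rows: career, Olympics; columns: losses, wins
tab <- matrix(c(150, 350, 9, 8), 2, 2, byrow = TRUE)

# Expected counts if winning were independent of the type of event
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Pearson statistic and its p-value from a chi-square with 1 degree of freedom
stat <- sum((tab - expected)^2 / expected)
p_value <- pchisq(stat, df = 1, lower.tail = FALSE)

c(statistic = stat, p_value = p_value)
```

The smallest expected count here is about 5.2 (Olympic losses), which is close to the usual rule-of-thumb minimum of 5 for the chi-square approximation.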
2. Why did we use the Chi-square test instead of Fisher’s exact test in the previous exercise? B. Because the row and column sums of the two-by-two table are not fixed, so the hypergeometric distribution is not an appropriate assumption for the null hypothesis. For this reason, Fisher’s exact test is rarely applicable with observational data.
3. Compute the odds ratio of “losing under pressure” along with a confidence interval.
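# ratio of career to Olympic counts within the losses column
odds_win <- mat[1, 1]/mat[2, 1]
# ratio of career to Olympic counts within the wins column
odds_loss <- mat[1, 2]/mat[2, 2]
# log odds ratio: odds of losing in the career row relative to the Olympics row
loseoddr <- log(odds_win/odds_loss)
# standard error of the log odds ratio
se <- sqrt(sum(1/mat))
# 95% confidence interval, transformed back to the odds ratio scale
exp(loseoddr + c(-1, 1)*1.96*se)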
## [1] 0.1442096 1.0063459
2*pnorm(-abs(loseoddr/se))
## [1] 0.05150641
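4. Notice that the Chi-square test in exercise 1 gives a p-value below 0.05 (0.044), yet the 95% confidence interval for the odds ratio includes 1 (its upper bound is 1.006). What explains this? Different approximations are used for the p-value and the confidence interval calculation: the test approximates the distribution of the Chi-square statistic, while the interval relies on a normal approximation for the log odds ratio. If we had a larger sample size the match would be better.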
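As a rough check on this explanation, the sketch below puts the two approximations side by side for the same table: the Pearson Chi-square p-value and the p-value and interval from the normal approximation to the log odds ratio. The names `tab`, `log_or`, `se_log_or`, `p_chisq`, `p_wald`, and `ci_wald` are chosen here for illustration.

```r
# rows: career, Olympics; columns: losses, wins
tab <- matrix(c(150, 350, 9, 8), 2, 2, byrow = TRUE)

# Approximation 1: Pearson chi-square test (no continuity correction)
p_chisq <- chisq.test(tab, correct = FALSE)$p.value

# Approximation 2: normal approximation for the log odds ratio
log_or <- log((tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1]))
se_log_or <- sqrt(sum(1 / tab))
p_wald <- 2 * pnorm(-abs(log_or / se_log_or))
ci_wald <- exp(log_or + c(-1, 1) * qnorm(0.975) * se_log_or)

c(p_chisq = p_chisq, p_wald = p_wald)
ci_wald
```

The normal-approximation p-value and interval are built from the same quantities, so they always agree with each other. The disagreement is with the Chi-square p-value, which falls on the other side of 0.05 for this table.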
5. Multiply the two-by-two table by 2 and see if the p-value and confidence interval are a better match. Yes, the match improves when all cells are doubled: the p-value decreases and the 95% confidence interval no longer includes 1.
mat1<-mat*2
odds_win1<-mat1[1, 1]/mat1[2, 1]
odds_loss1<-mat1[1, 2]/mat1[2, 2]
loseoddr1<-log(odds_win1/odds_loss1)
se1<-sqrt(sum(1/mat1))
exp(loseoddr1+c(-1, 1)*1.96*se1)
## [1] 0.1916721 0.7571510
# use se1, the standard error of the doubled table
2*pnorm(-abs(loseoddr1/se1))
## approximately 0.0059
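To explore how the agreement changes with sample size, the calculation can be wrapped in a small helper and applied to scaled copies of the table. This is only a sketch: `wald_summary` is a hypothetical helper name, not a function from any package, and the scaling factors 1, 2, and 4 are arbitrary choices.

```r
# Hypothetical helper: normal-approximation CI and p-value for the odds ratio
# of a 2x2 table, alongside the Pearson chi-square p-value
wald_summary <- function(tab) {
  log_or <- log((tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1]))
  se <- sqrt(sum(1 / tab))
  ci <- exp(log_or + c(-1, 1) * 1.96 * se)
  c(lower = ci[1], upper = ci[2],
    p_wald = 2 * pnorm(-abs(log_or / se)),
    p_chisq = chisq.test(tab, correct = FALSE)$p.value)
}

# rows: career, Olympics; columns: losses, wins
tab <- matrix(c(150, 350, 9, 8), 2, 2, byrow = TRUE)

# One row of results per scaling factor
res <- t(sapply(c(1, 2, 4), function(k) wald_summary(k * tab)))
rownames(res) <- paste0("x", c(1, 2, 4))
res
```

Multiplying every cell by k leaves the estimated odds ratio unchanged but shrinks the standard error by a factor of sqrt(k), so the interval narrows and both p-values decrease as k grows.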