1. [All Students] Suppose that Y has a binomial distribution with parameters n = 15 and π (unknown) = the probability of a success, so Y ~ Bin(15, π). If we observe y = 1 success in these 15 independent trials, give four 95% confidence intervals for the true π, namely those based on:

  a. the usual Wald interval (in class and Shafer Lecture 2, p.12). For the binomial, \(\hat{\pi} = y/n\), so \(\hat{\pi} = 1/15\).
pi_hat=1/15   # sample proportion y/n (renamed to avoid masking R's built-in pi)
u=pi_hat+1.96*sqrt((pi_hat*(1-pi_hat))/15)   # upper Wald limit
l=pi_hat-1.96*sqrt((pi_hat*(1-pi_hat))/15)   # lower Wald limit
c(l,u)
## [1] -0.05956933  0.19290266
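
As a cross-check (assuming the binom package loaded in part (d) is available; its "asymptotic" method is the Wald interval):

library(binom)
binom.confint(1, 15, conf.level = 0.95, methods = "asymptotic")   # should reproduce c(l, u)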
  b. the modified Wald interval (in class and Shafer Lecture 2, p.14)
pi_b=(1+.5)/(15+1)   # adjusted estimate (y + 1/2)/(n + 1)
u=pi_b+1.96*sqrt((pi_b*(1-pi_b))/16)   # divide by 16 because we add half a success and half a failure (n + 1 = 16)
l=pi_b-1.96*sqrt((pi_b*(1-pi_b))/16)
c(l,u)
## [1] -0.04907549  0.23657549
  c. the logit transformation (in class and Shafer Lecture 2, p.17 ff.)
phi=log(pi_hat/(1-pi_hat))   # MLE on the logit scale
iphi=15*pi_hat*(1-pi_hat)    # Fisher information for phi
l=phi-1.96/sqrt(iphi)        # Wald limits on the logit scale
u=phi+1.96/sqrt(iphi)
plow=exp(l)/(1+exp(l))   # back-transform lower limit to the probability scale
pup=exp(u)/(1+exp(u))    # back-transform upper limit
c(plow,pup)
## [1] 0.009305044 0.351998845
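
As another cross-check (again assuming the binom package; its "logit" method uses the same delta-method standard error, so it should agree with the hand computation above):

binom.confint(1, 15, conf.level = 0.95, methods = "logit")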
  d. the log-likelihood (in class and Shafer Lecture 2, p.25 ff.)
library(binom)
y <- 1
n <- 15
binom.confint(y, n, conf.level = 0.95, methods = "lrt")   # likelihood-ratio interval
##   method x  n       mean       lower     upper
## 1    lrt 1 15 0.06666667 0.003926124 0.2621293

Also, which one of these intervals is preferred, and why? We choose the LR method (part d) because:

1. it produces the narrowest interval that stays within the valid range of 0 to 1, and
2. Shafer's notes say that statisticians tend to prefer the LR method.
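
As a quick numerical check on the first point, compare the four interval widths using the endpoints computed above; the two Wald-type intervals dip below 0, and among the intervals that stay inside [0, 1] the LR interval is the narrower:

wald     = c(-0.05956933, 0.19290266)
modified = c(-0.04907549, 0.23657549)
logit    = c(0.009305044, 0.351998845)
lrt      = c(0.003926124, 0.2621293)
sapply(list(wald = wald, modified = modified, logit = logit, lrt = lrt), diff)   # interval widths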

2. [All Students] Suppose that you stand at the main entrance of Loyola from 9am to 10am on any Monday morning during the school year, and that

• the number of Loyola students you see with orange hair has a Poisson distribution with mean 8, and that independently,
• the number of Loyola students you see with green hair has a Poisson distribution with mean 7, and that independently,
• the number of Loyola students you see with purple hair has a Poisson distribution with mean 5.

  a. Remembering that the sum of independent Poisson RVs has a Poisson distribution, describe the distribution of Loyola students you see at the main entrance from 9am till 10am with either orange or green or purple hair, and remember to give the mean. Justify your answer.

The three counts are independent Poisson random variables, so their sum is Poisson with mean equal to the sum of the means: 8 + 7 + 5 = 20 (and for a Poisson, variance = mean = 20).

The number of Loyola students you see at the main entrance of Loyola from 9am to 10am has a Poisson distribution with \(\lambda=20\).
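
A quick simulation sketch (illustrative only; the seed is arbitrary) agrees:

set.seed(1)   # arbitrary seed, for reproducibility
total = rpois(1e5, 8) + rpois(1e5, 7) + rpois(1e5, 5)   # totals over 100,000 simulated hours
c(mean(total), var(total))   # both should be close to 20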

  b. Find the exact probability that the number of such students in part (a) is exactly 15. Justify your answer.
dpois(15, 20)   # exact Poisson probability P(Y = 15)
## [1] 0.05164885
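
Equivalently, straight from the Poisson pmf \(P(Y = 15) = e^{-20}\,20^{15}/15!\):

exp(-20) * 20^15 / factorial(15)   # same value as dpois(15, 20)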
  c. Find the probability that the number of such students in part (a) is at most 15. Find both the exact answer and the normal approximation (remembering to use the continuity correction). Justify your answer.
ppois(15, 20)   # exact: P(Y <= 15)
## [1] 0.1565131
# normal approximation with continuity correction
zstar <- (15.5 - 20) / sqrt(20)   # continuity-corrected z-score
pnorm(zstar)
## [1] 0.1571523
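
For comparison, skipping the continuity correction gives roughly 0.13, noticeably farther from the exact 0.1565 than the corrected value 0.1572:

pnorm((15 - 20) / sqrt(20))   # normal approximation without the continuity correction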
  d. Suppose that of all the orange or green or purple-haired students who passed through the main entrance between 9am and 10am on Monday morning, you randomly choose 10 of these. What is the probability that 5 have orange hair, 2 have green hair and 3 have purple hair? Justify your answer.

Conditional on the total, the three counts are multinomial with probabilities proportional to the Poisson means, i.e. (8/20, 7/20, 5/20), so we use the multinomial pmf (a hand computation follows the output below).
dmultinom(c(5,2,3), size = 10, prob = c(8/20, 7/20, 5/20))   # multinomial pmf at (5, 2, 3)
## [1] 0.049392
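
By hand, from the multinomial pmf:

factorial(10)/(factorial(5)*factorial(2)*factorial(3)) * (8/20)^5 * (7/20)^2 * (5/20)^3   # matches dmultinom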
3. [STAT-410 only] You wish to test the Hardy-Weinberg theory (Shafer Lecture 4, p.21 ff.) on your favorite trait, and you collect data on 96 offspring, and observe 15 AA, 33 Aa and 48 aa.
  a. Estimate π (the proportion of dominant genes in the population). Show all work.

Since genes come in pairs, there are \(2 \times 96 = 192\) genes in total; each AA offspring carries two dominant genes and each Aa carries one, so \(\hat{\pi} = (2 \cdot 15 + 33)/192\):
p=(2*15+33)/192;p   # estimated proportion of dominant genes: 63/192
## [1] 0.328125
  b. Perform both of the goodness-of-fit tests to determine whether you believe the Hardy-Weinberg theory applies here; "both" here refers to both the Pearson and the likelihood-ratio tests. Be clear with regard to the null and alternative hypotheses and your findings for both tests. Use α = 5% for these tests.

Here \(H_0\): the genotype probabilities follow the Hardy-Weinberg proportions \((\pi^2,\ 2\pi(1-\pi),\ (1-\pi)^2)\) for some \(\pi\); \(H_a\): they do not.
pi1=p^2;pi1   # Hardy-Weinberg probability of AA
## [1] 0.107666
pi2=2*p*(1-p);pi2   # Hardy-Weinberg probability of Aa
## [1] 0.440918
pi3=(1-p)^2;pi3   # Hardy-Weinberg probability of aa
## [1] 0.451416
obs.data <- c(15, 33, 48)
# Pearson goodness-of-fit test
gof.test <- chisq.test(obs.data, p = c(pi1,pi2,pi3));gof.test
## 
##  Chi-squared test for given probabilities
## 
## data:  obs.data
## X-squared = 4.6623, df = 2, p-value = 0.09718
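
Note that chisq.test does not know that \(\hat{\pi}\) was estimated from the data, so it reports df = k - 1 = 2. With one estimated parameter, the correct degrees of freedom are k - 1 - 1 = 1 (the same df used in the LR test below), so we recompute the Pearson p-value:

1 - pchisq(unname(gof.test$statistic), df = 1)   # corrected p-value, approximately 0.031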

With the corrected df = 1, the Pearson p-value of about 0.031 < .05, so we reject the null hypothesis and conclude that the data do not follow the Hardy-Weinberg theory. (Taking the printed df = 2 p-value of 0.0972 at face value would wrongly suggest failing to reject.)

# Likelihood-ratio test: G^2 = 2 * sum(observed * log(observed/expected))
gsq <- 2*(15*log(15/(pi1*96)) + 33*log(33/(pi2*96)) + 48*log(48/(pi3*96)));gsq
## [1] 4.555382
pv=1-pchisq(gsq,1);pv   # df = k - 1 - 1 = 1, since pi was estimated
## [1] 0.03281543

Since our p-value 0.0328 < .05, we reject the null hypothesis and conclude that the data do not follow the Hardy-Weinberg theory, consistent with the corrected Pearson result above.

  c. Do the residual analysis and comment on your findings – be specific. To find the residuals here, use the Pearson approach (this is denoted \(X^2\) in Shafer's notes).
gof.test$residuals   # Pearson residuals: (observed - expected)/sqrt(expected)
## [1]  1.4507395 -1.4337712  0.7085007
sqrt(2/3)   # typical residual size sqrt((k-1)/k) with k = 3
## [1] 0.8164966

The AA count was farthest from what we expected and contributed the most to the \(\chi^2\) statistic, while the aa count was closest to what we expected and contributed the least.

None of the residuals is extreme: the largest is about 1.45 in magnitude, under twice the typical size \(\sqrt{(k-1)/k} = \sqrt{2/3} \approx 0.816\) and well below 2. So no single category is grossly out of line; rather, the moderate deviations in the AA and Aa cells together drive the overall rejection.
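
As a final check, the residuals can be reproduced by hand from the expected counts in part (b):

expected = 96 * c(pi1, pi2, pi3)       # expected counts under Hardy-Weinberg
(obs.data - expected)/sqrt(expected)   # matches gof.test$residuals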