410_hw1.R

##### QUESTION #1 
#### [All Students] Suppose that 𝑌 has a binomial distribution with parameters 𝑛 = 15 and 𝜋 (unknown) = the probability of a success, so 𝑌 ~ 𝐵𝑖𝑛(15, 𝜋). If we observe 𝑦 = 1 success in these 15 independent trials, give four 95% confidence intervals for the true 𝜋 - namely, those based on: 
####  (a) the usual Wald interval (in class and Shafer Lecture 2, p.12) 
####  (b) the modified Wald interval (in class and Shafer Lecture 2, p.14) 
####  (c) the logit transformation (in class and Shafer Lecture 2, p.17 ff.) 
####  (d) the log-likelihood (in class and Shafer Lecture 2, p.25 ff.)

#### Also, of these, which one of these intervals is preferred and why? 

x<-1
n<-15
pi<-x/n

## Part A - Usual Wald Interval ##
pi_a <- pi
lower <- pi_a - 1.96*(sqrt(pi_a*(1-pi_a))/15)
upper <- pi_a + 1.96*(sqrt(pi_a*(1-pi_a))/15)
c(lower,upper)

## [1] 0.03407267 0.09926066

## Standard 95% CI: 0.03407267 - 0.09926066

## Part B - Modified CI ##
pi_b <- (x+.5)/(n+1)
lower <- pi_b - 1.96*sqrt((pi_b*(1-pi_b))/(n+1))
upper <- pi_b + 1.96*sqrt((pi_b*(1-pi_b))/(n+1))
c(lower,upper)

## [1] -0.04907549  0.23657549

## Modified 95% CI: -0.04907549 - 0.23657549

## Part C - Logit Transformation CI ##
pi_c<-log((pi/(1-pi)))
i_phi<-n*pi*(1-pi)
lower<-pi_c-1.96*sqrt(1/i_phi)
e_low<-exp(lower)
upper<-pi_c+1.96*sqrt(1/i_phi)
e_up<-exp(upper)

p_low<-e_low/(1+e_low)
p_up<-e_up/(1+e_up)
c(p_low,p_up)

## [1] 0.009305044 0.351998845

## Logit 95% CI: 0.009305044 - 0.351998845

## Part D - Log-Likelihood CI ##
library(binom)
y<-1
binom.confint(y, n, conf.level = 0.95, method = "lrt")

##   method x  n       mean       lower     upper
## 1    lrt 1 15 0.06666667 0.003926124 0.2621293

## LR 95% CI: 0.003926124 - 0.2621293
# output below
#  method   x   n    mean       lower         upper
#     lrt   1   15  0.06666667  0.003926124   0.2621293

## Preferred Interval
# Q1_a<-0.09926066-0.03407267
  # width=0.06518799

# Q1_b<-0.23657549+0.04907549
  # width=0.285651

# Q1_c<-0.351998845-0.009305044
  # width=0.3426938

# Q1_d<-0.2621293-0.003926124
  # width=0.2582032

# Log-likelihood 95% CI is preferred. The answers from part a and part d have the narrowest CIs but the LR method from part d is scale-invariant which is why it is preferentially to the other three methods used in question 1.


##### QUESTION #2
#### [All Students] Suppose that if you stand at the main entrance of Loyola from 9am to 10am on any Monday morning during the school year, and that
#### • the number of Loyola students you see with orange hair has a Poisson distribution with mean 𝟖, and that independently,
#### • the number of Loyola students you see with green hair has a Poisson distribution with mean 𝟕,and that independently,
#### • the number of Loyola students you see with purple hair has a Poisson distribution with mean 𝟓.

#### (a) Remembering that the sum of independent Poisson RVs has a Poisson distribution, describe the distribution of Loyola students you see at the main entrance from 9am till 10am with either orange or green or purple hair, and remember to give the mean. Justify your answer.
#### (b) Find the exact probability that the number of such students in part (a) is exactly 𝟏𝟓. Justify your answer.
#### (c) Find the probability that the number of such students in part (a) is at most 𝟏𝟓. Find both the exact answer and the normal approximation (remembering to use the continuity correction).Justify your answer.
#### (d) Suppose that of all the orange or green or purple-haired students which passed through the main entrance between 9am and 10am on Monday morning, you randomly choose 𝟏𝟎 of these. What is the probability that 𝟓 have orange hair, 𝟐 have green hair and 𝟑 have purple hair? Justify your answer.

oh<-8
gh<-7
ph<-5

### Part A ###
# Lambda=X1 + X2 + ... + Xn
lambda<-sum(oh,gh,ph)
#lambda=20

### Part B ###
dpois(15,lambda) # 0.05164885

## [1] 0.05164885

### Part C ###
# exact
ppois(15,lambda)

## [1] 0.1565131

# normal approximation
zstar <- (15.5 - 20) / sqrt(20)
pnorm(zstar) # 0.1571523

## [1] 0.1571523

### Part D ###
dmultinom(c(5,2,3), size = 10, prob = c((oh/lambda),(gh/lambda),(ph/lambda)), log = FALSE)

## [1] 0.049392

# 0.049392


##### QUESTION #3
####  [STAT-410 only] You wish to test the Hardy-Weinberg theory (Shafer Lecture 4, p.21 ff.) on your favorite trait, and you collect data on 96 offspring, and observe 15 AA, 33 Aa and 48 aa. 

#### (a) Estimate 𝜋 (the proportion of dominant genes in the population). Show all work. 
#### (b) Perform both of the goodness of fit tests to determine whether you believe the HardyWeinberg theory applies here; “both” here refers to both the Pearson and the likelihood ratio tests. Be clear with regard to the null and alternative hypotheses and your findings for both tests. Use 𝛼 = 5% for these tests. 
#### (c) Do the residual analysis and comment on your findings – be specific. To find the residuals here, use the Pearson approach (this is denoted 𝑋2 in Shafer’s notes).

n<-96
## Part A ##
pi<-(15+15+33)/(n*2)
#pi=0.328125

## Part B ##
## Pearson Chi-Sq

pi_1<-pi*pi
#pi_1=0.107666
pi_2<-2*pi*(1-pi)
#pi_2=0.440918
pi_3<-(1-pi)^2
#pi_3=0.451416

obs.data <- c(15, 33, 48)
gof.test <- chisq.test(obs.data, p = c(pi_1, pi_2, pi_3))
gof.test

## 
##  Chi-squared test for given probabilities
## 
## data:  obs.data
## X-squared = 4.6623, df = 2, p-value = 0.09718

# X-squared = 4.6623, df = 2, p-value = 0.09718
# With a p-value=.097, we fail to reject the null hypothesis and do not have sufficient evidence to suggest that the population does not follow the Hardy-Weinberg theory. In other words, we have statistically significant evidence which supports the presence of Harvey-Weinberg equilibrium in our population.

## Likelihood Ratio
gsq<-2*(15*log(15/(pi_1*n)) + 33*log(33/(pi_2*n)) + 48*log(48/(pi_3*n)));
gsq

## [1] 4.555382

1-pchisq(gsq,1)

## [1] 0.03281543

#LR p-value=0.03281543
# With a p-value=.0328, we reject the null hypothesis and have sufficient evidence to suggest that the population does not follow the Hardy-Weinberg theory.

## Part C ##
gof.test$residuals

## [1]  1.4507395 -1.4337712  0.7085007

r<-sqrt(2/3)
r

## [1] 0.8164966

# 0.8164966

410_hw1.R

sofia

Sat Jan 26 22:32:32 2019