6.6

a) False: we already know the proportion in the sample exactly. The confidence interval is a statement about the population proportion.

b) True: the population proportion is what our sample is trying to estimate, and we are 95% confident that it falls between 43% and 49%.

c) False: the interval is centered on this one sample’s proportion, not on the population proportion, which we don’t know. We therefore can’t say that 95% of sample proportions would fall between 43% and 49%.

d) False: the critical z value is smaller at 90% confidence (1.645 instead of 1.96), and margin of error = z × SE, so the margin would be smaller than 3%.
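The comparison in (d) can be checked numerically. The sketch below assumes the 3% margin implied by the 43% to 49% interval and backs the standard error out of it; the variable names are illustrative only.

```r
# Half-width of the reported 95% interval (43% to 49%)
margin_95 <- 0.03

# Back out the standard error from margin = z * SE
se <- margin_95 / qnorm(0.975)

# The 90% margin uses the smaller critical value with the same SE
margin_90 <- qnorm(0.95) * se

c(z_95 = qnorm(0.975), z_90 = qnorm(0.95), margin_90 = margin_90)
```

The 90% margin comes out at roughly 2.5%, below the 3% margin at 95% confidence.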

6.12

a) 48% is a sample statistic. We have not polled the entire population, which is why we need inference to generalize from the sample.

b) \[0.48 \pm 1.96 \sqrt{ \frac{ 0.48 (1-0.48) }{ 1259 } }\]

margin <- (sqrt(.48*(1-.48)/1259))*1.96

lower <- .48 - margin
upper <- .48 + margin
sprintf("%s to %s", lower,upper)
## [1] "0.452402769762903 to 0.507597230237097"

c) Yes. By the central limit theorem, the sample proportion of a Bernoulli variable is approximately normal when the sample is large enough. We expect over 600 observations in each of the "support" and "do not support" groups (0.48 × 1259 ≈ 604 and 0.52 × 1259 ≈ 655), easily passing the rule of thumb of at least 10 in each.

d) It wouldn’t be fair to say a majority supports legalization unless the entire confidence interval were above 50%. The interval runs from about 45.2% to 50.8%, so we can only say a majority might support legalization.
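Parts (c) and (d) can both be checked with a few lines of R. This is a minimal sketch using the sample proportion and sample size from (b); the variable names are illustrative, and `qnorm(0.975)` stands in for the rounded 1.96.

```r
p_hat <- 0.48
n <- 1259

# Success-failure condition: both expected counts should be at least 10
successes <- n * p_hat
failures <- n * (1 - p_hat)
condition_met <- successes >= 10 && failures >= 10

# 95% interval, same formula as in (b)
margin <- qnorm(0.975) * sqrt(p_hat * (1 - p_hat) / n)
interval <- c(lower = p_hat - margin, upper = p_hat + margin)

# A majority claim needs the whole interval above 0.5
majority_supported <- interval[["lower"]] > 0.5

c(successes = successes, failures = failures)
condition_met
majority_supported
```

Since the lower bound is below 0.5, `majority_supported` is `FALSE` even though the upper bound is above 0.5.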

6.20

\[(margin/1.96)^{ 2 }=\frac { (.48(1-.48)) }{ n }\]

\[(1.96/margin)^{ 2 }=\frac { n }{ (.48*(1-.48)) } \]

\[(1.96/margin)^{ 2 }\quad *\quad (1-.48)*.48\quad =\quad n\]

With margin = 0.02, n = 2397.1584. Sample sizes are rounded up, so at least 2398 respondents are needed.
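The rearranged formula can be evaluated directly. The sketch below assumes a target margin of 0.02, which is the value that reproduces n = 2397.1584; the variable names are illustrative.

```r
p_hat <- 0.48
margin <- 0.02   # assumed target margin of error
z <- 1.96        # critical value for 95% confidence

# n = (z / margin)^2 * p * (1 - p)
n_exact <- (z / margin)^2 * p_hat * (1 - p_hat)

# Round up: a fractional respondent is not possible, and rounding down
# would give a margin slightly larger than the target
n_required <- ceiling(n_exact)

c(n_exact = n_exact, n_required = n_required)
```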

6.28

# Sample proportions and sizes for the two groups
p1 <- .08;  n1 <- 11545
p2 <- .088; n2 <- 4691

# A confidence interval for a difference uses the unpooled standard error
se <- sqrt(p1*(1 - p1)/n1 + p2*(1 - p2)/n2)
margin <- 1.96 * se
lower <- (p2 - p1) - margin
upper <- (p2 - p1) + margin
sprintf("%.4f to %.4f", lower, upper)
## [1] "-0.0015 to 0.0175"

Because the interval includes 0, we can’t say with 95% confidence that there is a difference between the two proportions.

6.44

a) Ho: the deer forage in each habitat in proportion to that habitat’s share of the land
Ha: the deer have a preference, so foraging is not proportional to land share

b) A chi-square goodness-of-fit test should be used

c) The conditions are met: we assume the foraging observations are independent, and the smallest expected count, 426 × 0.048 = 20.448, is greater than the 5 necessary for each cell.

d)

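# Chi-square contributions: (observed - expected)^2 / expected for each bin
(4 - 426*.048)^2/(426*.048) + (16 - 426*.147)^2/(426*.147) + (67 - 426*.396)^2/(426*.396)
## [1] 109.2465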

Even without the “other” bin, the test statistic far exceeds 7.81, the 5% critical value at 3 degrees of freedom, so the result is easily significant. It appears that barking deer favor some sites over others.
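Each bin contributes (observed - expected)^2 / expected to a chi-square statistic. The sketch below sums those contributions for the three bins used above and compares the total with the 5% critical value at 3 degrees of freedom. The "other" bin is left out, as in the answer, so this is a lower bound on the full statistic; the variable names are illustrative.

```r
# Observed counts and expected counts (total of 426 times each land share)
observed <- c(4, 16, 67)
expected <- 426 * c(0.048, 0.147, 0.396)

# Per-bin contributions to the chi-square statistic
contributions <- (observed - expected)^2 / expected
partial_stat <- sum(contributions)

# 5% critical value for 4 bins, i.e. 3 degrees of freedom
critical <- qchisq(0.95, df = 3)

round(contributions, 2)
partial_stat > critical
```

Because every contribution is non-negative, adding the missing bin can only increase the statistic, so the conclusion does not depend on it.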

6.48

a) A chi-square test for independence on the two-way table would be most appropriate

b) Ho: the incidence of depression is the same at every level of coffee consumption
Ha: the incidence of depression differs between levels of coffee consumption

c)

Do suffer from depression: 0.0513806 (2607 of 50739)
Do not: 0.9486194

d)

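library(tidyverse)
# row1: women with depression, row2: women without,
# one entry per coffee-consumption level
row1 <- c(670, 373, 905, 564, 95)
row2 <- c(11545, 6244, 16329, 11726, 2288)
df <- data.frame(row1, row2)
# Expected count = column total * row total / grand total
df %>%
  mutate(
    col_tot = row1 + row2,
    expected1 = (col_tot*sum(row1))/sum(row1 + row2),
    expected2 = (col_tot*sum(row2))/sum(row1 + row2)
    )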
##   row1  row2 col_tot expected1 expected2
## 1  670 11545   12215  627.6140 11587.386
## 2  373  6244    6617  339.9854  6277.015
## 3  905 16329   17234  885.4932 16348.507
## 4  564 11726   12290  631.4675 11658.532
## 5   95  2288    2383  122.4400  2260.560

e) At (2-1) × (5-1) = 4 df, p = 3.2695073 × 10^{-4}
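The statistic, degrees of freedom, and p-value can also be obtained in one step with base R's `chisq.test()`. The sketch below rebuilds the table from (d); the names `depressed`, `not_depressed`, and `counts` are illustrative, and the p-value may differ from the one in (e) in later digits depending on rounding of the statistic.

```r
# Two-way table: rows are depression status, columns are coffee levels
depressed <- c(670, 373, 905, 564, 95)
not_depressed <- c(11545, 6244, 16329, 11726, 2288)
counts <- rbind(depressed, not_depressed)

# Chi-square test for independence (no continuity correction applies,
# since the table is larger than 2 x 2)
test <- chisq.test(counts)

test$statistic   # chi-square statistic
test$parameter   # degrees of freedom
test$p.value     # upper-tail p-value
round(test$expected, 1)   # expected counts, matching the table in (d)
```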

f) Because the p-value is far below 0.05, the null hypothesis can be rejected easily

g) I would agree that this analysis is not sufficient to say coffee helps depression. It was an observational study, so it cannot establish causation, and we only know that the proportions differ among the groups. We don’t know whether the difference between any particular pair of higher and lower coffee consumption levels is significant.