Questions

6.6

  • a.) False. A confidence interval estimates a population parameter, not a sample statistic. The sample proportion is already known exactly from the data, so there is no uncertainty about the sample for an interval to capture.
  • b.) True. This is the correct interpretation of a confidence interval. Assuming the conditions for inference are met, it is the conclusion we would draw from the information given.
  • c.) False. The confidence interval is a statement about the population proportion, not a prediction of where the sample proportions from other samples will fall.
  • d.) False. Decreasing the confidence level lowers the critical value, which makes the margin of error smaller and the interval narrower.
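A quick numeric check of part (d): the margin of error is the critical value times the standard error, so lowering the confidence level shrinks the margin in proportion to the critical value. The standard error below is a hypothetical placeholder chosen only for illustration; only the ratio of the two margins matters.

```r
# Two-sided critical values at 90% and 95% confidence
z90 <- qnorm(0.95)    # about 1.645
z95 <- qnorm(0.975)   # about 1.960

# Hypothetical standard error, used only for illustration
se <- 0.015

me90 <- z90 * se
me95 <- z95 * se

# The 90% margin of error is roughly 84% of the 95% margin, whatever se is
ratio <- me90 / me95
print(c(me90 = me90, me95 = me95, ratio = ratio))
```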

6.12

  • a.) 48% is a sample statistic: it was computed from the sample of 1,259 US residents, not from the entire population.
  • b.)
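library(dplyr)      # %>%, tibble(), mutate()
library(knitr)      # kable()
library(kableExtra) # kable_styling()

n <- 1259   # sample size
p <- .48    # sample proportion
ci <- .95   # confidence level

# standard error of a sample proportion
se <- ((p * (1 - p)) / n) %>%
  sqrt
# critical value; with 1258 df this is practically the z value of 1.96
t <- qt(ci + (1 - ci)/2, n - 1)
me <- t * se

ci_int <- tibble(lower = p - me, upper = p + me)
kable(ci_int, "html") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))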
lower upper
0.4523767 0.5076233
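  • c.) True. The sample observations are independent, and the success-failure condition holds: 1259 × 0.48 ≈ 604 and 1259 × 0.52 ≈ 655 are both at least 10. The normal model is therefore a good approximation for the sampling distribution of the sample proportion.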
  • d.) No. The confidence interval runs from about 45.2% to 50.8%: its upper limit is barely above 50%, and it contains values below 50%. We cannot say that a “majority” of Americans believe marijuana should be legal.
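The checks behind parts (c) and (d) can be written out directly. This is a minimal sketch reusing n = 1259, the sample proportion 0.48, and the interval bounds printed above; the threshold of 10 for the success-failure condition is the usual textbook rule.

```r
n <- 1259
p_hat <- 0.48

# Success-failure condition: expected successes and failures both at least 10
successes <- n * p_hat        # about 604
failures  <- n * (1 - p_hat)  # about 655
success_failure_ok <- successes >= 10 && failures >= 10

# Part (d): is 50% still a plausible value for the population proportion?
lower <- 0.4523767
upper <- 0.5076233
half_in_ci <- lower <= 0.5 && 0.5 <= upper

print(c(successes = successes, failures = failures))
print(c(success_failure_ok = success_failure_ok, half_in_ci = half_in_ci))
```

Because 50% lies inside the interval, values on both sides of 50% remain plausible, which is why the “majority” claim is not justified.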

6.20

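p <- .48           # sample proportion from 6.12
z <- qnorm(.975)   # critical value for 95% confidence
me <- .02          # desired margin of error

# required sample size: solve me = z * sqrt(p * (1 - p) / n) for n
n <- ((p * (1 - p) * z^2) / (me ^ 2)) %>%
  print
## [1] 2397.07
ceiling(n)   # round up to a whole number of respondents
## [1] 2398

We would need to survey at least 2,398 Americans.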
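Solving the margin-of-error formula for a sample proportion for n gives the sample-size expression used in 6.20. With the normal critical value z* ≈ 1.96, the arithmetic works out as follows, rounding up because n must be a whole number of respondents:

$$
ME = z^\star \sqrt{\frac{p(1-p)}{n}}
\;\Longrightarrow\;
n = \frac{(z^\star)^2\, p(1-p)}{ME^2}
= \frac{1.96^2 \times 0.48 \times 0.52}{0.02^2}
\approx 2397.2
\;\Rightarrow\; n = 2398
$$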

6.28

pCA <- .08
nCA <- 11545

pOR <- .088
nOR <- 4691

ci <- .95
pDelta <- pOR - pCA   # difference in proportions: Oregon minus California

# standard error of the difference (unpooled)
se <- (((pCA * (1 - pCA)) / nCA) + ((pOR * (1 - pOR)) / nOR)) %>%
  sqrt
z <- qnorm(ci + (1 - ci) / 2)   # critical value, about 1.96

me <- z * se

# confidence interval
conf_inv <- tibble(lower = pDelta - me, upper = pDelta + me)
kable(conf_inv, "html") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
lower upper
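-0.001498 0.017498

We are 95% confident that the proportion of sleep-deprived residents in Oregon is between 0.15 percentage points lower and 1.75 percentage points higher than in California. Because the interval contains 0, the data do not provide convincing evidence of a difference between the two states.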
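The same calculation can be wrapped in a small reusable helper. This is a sketch: the function name `two_prop_ci` is hypothetical (not from any package), and it uses the same unpooled standard error and normal critical value as the code above.

```r
# Hypothetical helper: normal-approximation CI for p1 - p2 (unpooled SE)
two_prop_ci <- function(p1, n1, p2, n2, conf = 0.95) {
  se <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
  z <- qnorm(1 - (1 - conf) / 2)   # two-sided critical value
  diff <- p1 - p2
  c(lower = diff - z * se, upper = diff + z * se)
}

# Oregon minus California, as in 6.28
ci_or_ca <- two_prop_ci(p1 = 0.088, n1 = 4691, p2 = 0.08, n2 = 11545)
print(round(ci_or_ca, 6))

# An interval containing 0 gives no convincing evidence of a difference
contains_zero <- ci_or_ca[["lower"]] < 0 && ci_or_ca[["upper"]] > 0
print(contains_zero)
```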

6.44

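  • a.) H0: Barking deer have no habitat preference; their forage sites are distributed across habitats in the proportions listed below (the share of land each habitat covers). HA: The distribution of forage sites differs from these proportions, i.e. the deer prefer some habitats over others.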
props <- tibble(woods = .048, grass = .147, forest = .396, other = 1 - .048 - .147 - .396)
  • b.) A chi-square goodness-of-fit test, since we are comparing the observed counts across several (more than two) habitat categories with a hypothesized distribution.
  • c.) Independence: we have to assume the sites are independent, because the description gives no information about how they were sampled. Sample size: every expected count is at least 5 (the smallest is 426 × 0.048 ≈ 20.4), so the sample size condition is satisfied.
  • d.)
n <- 426
habs <- c(4, 16, 61, 345)   # woods, grass, forest, other; the counts sum to n = 426
props <- c(.048, .147, .396, 1 - .048 - .147 - .396)

expected <- n * props

# chi-square statistic
chi <- ((habs - expected) ^ 2 / expected) %>%
  sum
round(chi, 2)
## [1] 284.06
# p value: upper tail of the chi-square distribution with k - 1 = 3 df
p_chi <- (1 - pchisq(chi, df = length(habs) - 1)) %>%
  print
## [1] 0

The p-value is effectively 0, far below both .001 and .05, so we reject H0. The data provide convincing evidence that barking deer prefer to forage in certain habitats over others.
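R's built-in `chisq.test()` runs the same goodness-of-fit test when given the observed counts `x` and the hypothesized proportions `p`, which makes a useful cross-check on the manual calculation. This sketch takes the deciduous-forest count as 61, the value that makes the four counts sum to n = 426.

```r
# Observed forage-site counts by habitat
habs <- c(woods = 4, grass = 16, forest = 61, other = 345)
# Hypothesized proportions: each habitat's share of the land
props <- c(.048, .147, .396, 1 - .048 - .147 - .396)

res <- chisq.test(x = habs, p = props)
print(res$statistic)   # chi-square statistic
print(res$parameter)   # degrees of freedom: 4 categories - 1
print(res$p.value)     # upper-tail p-value
print(res$expected)    # expected counts, all well above 5
```

Unlike `1 - pchisq(...)`, `chisq.test()` computes the upper tail directly, so it reports a tiny positive p-value instead of rounding to exactly 0.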

6.48

  • a.) A chi-square test for two-way tables is appropriate for evaluating whether there is an association between coffee intake and depression.
  • b.) H0: There is no association between caffeinated coffee consumption and depression. HA: There is an association between caffeinated coffee consumption and depression.
  • c.)
# overall counts and proportions of women with and without depression
dep_table <- tibble(depression = c('yes', 'no', 'total'), persons = c(2607, 48132, 2607 + 48132)) %>%
  mutate(proportion = persons / sum(c(2607, 48132)))

kable(dep_table, "html") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
depression persons proportion
yes 2607 0.0513806
no 48132 0.9486194
total 50739 1.0000000
  • d.)
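two_cup_total <- 6617   # column total for the highlighted cell's column
# expected count = (row total * column total) / grand total
ek <- (dep_table$persons[dep_table$depression == 'yes'] * two_cup_total / dep_table$persons[dep_table$depression == 'total'])

# contribution of the highlighted cell (observed = 373) to the chi-square statistic
contrib <- (((373 - ek) ^ 2) / ek) %>%
  print
## [1] 3.205914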
  • e.)
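n_cols <- 5   # levels of coffee consumption
n_rows <- 2   # depression: yes / no

df <- (n_rows - 1) * (n_cols - 1)
chi <- 20.93  # chi-square test statistic

p <- (1 - pchisq(chi, df)) %>%
  print
## [1] 0.0003269507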
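  • f.) Since the p-value (about 0.0003) is below .05, we reject the null hypothesis. The data provide convincing evidence of an association between caffeinated coffee consumption and depression.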
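  • g.) I agree with this statement, but not because the test found no relationship: it did find a statistically significant association. A chi-square test only detects association; it cannot show that drinking coffee changes a person's risk of depression, and other factors could explain the association. One significant result is not enough to justify recommending that women drink more coffee.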
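Parts (d) and (e) follow a general recipe for two-way tables: the expected count of a cell is row total × column total / grand total, and the p-value is the upper tail of a chi-square distribution with (rows − 1)(columns − 1) degrees of freedom. A minimal sketch, with `cell_contribution` as a hypothetical helper name:

```r
# Hypothetical helper: expected count and chi-square contribution of one cell
cell_contribution <- function(observed, row_total, col_total, grand_total) {
  expected <- row_total * col_total / grand_total
  c(expected = expected, contribution = (observed - expected)^2 / expected)
}

# Highlighted cell from part (d): depression = yes, column total 6617
cell <- cell_contribution(observed = 373, row_total = 2607,
                          col_total = 6617, grand_total = 50739)
print(cell)

# Upper-tail p-value for the full statistic, df = (2 - 1) * (5 - 1) = 4
p_value <- pchisq(20.93, df = (2 - 1) * (5 - 1), lower.tail = FALSE)
print(p_value)
```

Using `lower.tail = FALSE` avoids the precision loss of `1 - pchisq(...)` when the p-value is very small.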