Chapter 6 - Inference for Categorical Data
a.F: The 46% approval rate is the sample statistic, which is known exactly. The 95% CI applies to the entire population: we are 95% confident that the US population approval rate is between 43% and 49%.
b.T: The sample is less than 10% of the population, so the observations can be treated as independent, which allows us to make an inference about the population.
c.F: A CI is about the population proportion, not about a sample statistic.
d.F: The margin of error at a 90% confidence level would be smaller than the 3% obtained at 95%, since we are lowering our confidence level.
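The effect of the confidence level on the margin of error can be checked numerically. The sketch below assumes the 3% margin of error implied by the 43% to 49% interval around 46%, and rescales it by the ratio of critical values; the variable names are illustrative only.

```r
# Margin of error implied by the 95% interval (43% to 49% around 46%)
me.95 <- 0.03
z.95 <- qnorm(0.975)  # critical value for 95% confidence, about 1.96
z.90 <- qnorm(0.95)   # critical value for 90% confidence, about 1.645
# The standard error does not change, so the margin of error scales with z
me.90 <- me.95 * z.90 / z.95
me.90
```

The 90% margin of error comes out at roughly 2.5%, below the 3% at 95%.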
n <- 1259
p <- 0.48
z <- 1.96 # 95% CI, alpha of 0.05, on z table, z = 1.96
me <- z * sqrt(p*(1-p)/n)
ci.lower <- (p - me) * 100
ci.upper <- (p + me) * 100
The 95% confidence interval for the proportion of US residents who think marijuana should be made legal is from 45.24% to 50.76%.
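As a cross-check, the same interval can be computed with the critical value taken from `qnorm()` instead of a z table, after verifying the success-failure condition. This is a minimal sketch; the name `ci` is an illustrative addition.

```r
n <- 1259
p <- 0.48
# Success-failure condition: both expected counts should be at least 10
stopifnot(n * p >= 10, n * (1 - p) >= 10)
z <- qnorm(0.975)                 # about 1.96 for a 95% CI
me <- z * sqrt(p * (1 - p) / n)   # margin of error
ci <- c(lower = p - me, upper = p + me) * 100
round(ci, 2)
```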
# margin of error = 2%
# me = z * se
# 95% CI , z = 1.96
p <- 0.48
me <- 0.02
z <- 1.96
se <- me / z
# standard error = sqrt(p * (1-p) / n), so n = p * (1-p) / se^2
n <- (p * (1 - p) / (se^2))
n
## [1] 2397.158
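Because a sample size must be a whole number, the result is rounded up to keep the margin of error at or below 2%. The sketch below repeats the calculation in one step and checks the margin of error actually achieved; `me.target`, `n.raw`, `n.min` and `me.achieved` are illustrative names.

```r
p <- 0.48
me.target <- 0.02
z <- 1.96
n.raw <- p * (1 - p) * (z / me.target)^2  # about 2397.16
n.min <- ceiling(n.raw)                   # round up to a whole respondent
# Margin of error actually achieved with n.min respondents
me.achieved <- z * sqrt(p * (1 - p) / n.min)
c(n.min = n.min, me.achieved = me.achieved)
```

At least 2398 respondents are needed.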
n.ca <- 11545
p.ca <- 0.08
n.or <- 4691
p.or <- 0.088
z <- 1.96 # 95% CI
se.ca <- sqrt((p.ca)*(1-p.ca)/n.ca)
me.ca <- z * se.ca # margin of error CA at 95% CI
se.or <- sqrt((p.or)*(1-p.or)/n.or)
me.or <- z * se.or # margin of error OR at 95% CI
ca.lower <- p.ca - me.ca
ca.upper <- p.ca + me.ca
or.lower <- p.or - me.or
or.upper <- p.or + me.or
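H0: prop of CA residents with insufficient sleep = prop of OR residents with insufficient sleep
Ha: prop of CA residents with insufficient sleep != prop of OR residents with insufficient sleep
The success-failure condition holds: np and n(1-p) are greater than 10 for both samples.
CA 95% CI: 0.0750512 to 0.0849488
OR 95% CI: 0.079893 to 0.096107
Since the CA and OR CIs overlap, we can’t reject the H0 on this basis. Overlap is only an informal check; the CI for the difference in proportions, computed next, is the more direct test.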
se <- sqrt((p.ca)*(1-p.ca)/n.ca + (p.or)*(1-p.or)/n.or) # Calculating a new SE for the differences
me <- z * se
# on 95%
diff <- p.or - p.ca
diff.lower <- diff - me
diff.upper <- diff + me
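The code above does not print the interval for the difference. A self-contained sketch that reports it and checks whether it contains 0 follows; the variable names mirror those above, while `diff.ci` and `contains.zero` are illustrative additions.

```r
n.ca <- 11545; p.ca <- 0.08
n.or <- 4691;  p.or <- 0.088
z <- 1.96  # 95% CI
# Standard error of the difference between two independent proportions
se <- sqrt(p.ca * (1 - p.ca) / n.ca + p.or * (1 - p.or) / n.or)
diff <- p.or - p.ca
diff.ci <- c(lower = diff - z * se, upper = diff + z * se)
round(diff.ci, 4)
# If the interval contains 0, the data do not contradict H0 at the 5% level
contains.zero <- diff.ci[["lower"]] < 0 && diff.ci[["upper"]] > 0
contains.zero
```

The interval runs from roughly -0.0015 to 0.0175 and includes 0, so H0 is not rejected.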
chisq.test(c(4,16,67,345), p = c(.048,.147,.396,.409))
##
## Chi-squared test for given probabilities
##
## data: c(4, 16, 67, 345)
## X-squared = 272.69, df = 3, p-value < 2.2e-16
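To see where the statistic comes from, the expected counts and the chi-squared sum can be reproduced by hand. This is a sketch using the same counts and null proportions as the call above; `observed`, `p.null`, `expected` and `x2` are illustrative names.

```r
observed <- c(4, 16, 67, 345)
p.null <- c(0.048, 0.147, 0.396, 0.409)
expected <- sum(observed) * p.null   # expected counts under H0
stopifnot(all(expected >= 5))        # usual minimum expected-count condition
# Chi-squared statistic: sum of (observed - expected)^2 / expected
x2 <- sum((observed - expected)^2 / expected)
df <- length(observed) - 1
p.value <- pchisq(x2, df = df, lower.tail = FALSE)
c(x2 = x2, df = df, p.value = p.value)
```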
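dep <- 2607 # column total: depressed
not.dep <- 48132 # column total: not depressed
total <- dep + not.dep
to.six.cups <- 6617 # row total for this coffee group (373 + 6244)
Ek <- (dep * to.six.cups) / total # expected count = row total * column total / table total
chipart <- ((373 - Ek)^2) / Ek # this cell's contribution to the chi-squared statistic (observed = 373)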
found <- data.frame(Yes = c(670, 373, 905, 564, 95),
                    No = c(11545, 6244, 16329, 11726, 2288))
chisq.test(found)
##
## Pearson's Chi-squared test
##
## data: found
## X-squared = 20.932, df = 4, p-value = 0.0003267
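The single-cell computation above (`Ek` and `chipart`) can be compared with what `chisq.test()` stores in its result. The sketch below relies on the `expected` and `residuals` components of the returned object; `res`, `ek` and `cell.contrib` are illustrative names.

```r
found <- data.frame(Yes = c(670, 373, 905, 564, 95),
                    No  = c(11545, 6244, 16329, 11726, 2288))
res <- chisq.test(found)
# Expected count for row 2, column 1 (the "Yes" column)
ek <- res$expected[2, 1]
# Squared Pearson residual = that cell's contribution to the statistic
cell.contrib <- res$residuals[2, 1]^2
# Degrees of freedom: (rows - 1) * (columns - 1)
df <- (nrow(found) - 1) * (ncol(found) - 1)
c(ek = ek, cell.contrib = cell.contrib, df = df)
```

The expected count is about 340 and the cell contributes about 3.2 to the total statistic of 20.932, on 4 degrees of freedom.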
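We reject the H0 that coffee consumption and depression are independent. With a p-value of 0.0003267, the data provide strong evidence that coffee consumption is associated with depression.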
Agree. Despite the statistically significant result, this is an observational study, not an experiment. Correlation does not necessarily mean there’s causation. It is too early to recommend coffee consumption as a way to reduce depression.