False; we know from the point estimate that exactly 46% of Americans in the sample support the decision.
True; the 95% CI is \(0.46 \pm 0.03 =\) (43%, 49%). This assumes that the conditions for inference are met, including that this is a random sample.
False; about 95% of the sample proportions will be within 2 standard errors of the true population proportion. Alternatively, about 95% of the 95% confidence intervals constructed from many random samples will include the true population proportion.
False; decreasing the confidence level decreases the critical Z-value, which decreases the margin of error.
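The last point can be illustrated with a short sketch showing how the critical value and the margin of error shrink as the confidence level drops. The standard error here is backed out from the reported 3% margin of error at 95% confidence, which is an assumption made purely for illustration; the variable names are hypothetical.

```r
# Assumed for illustration: SE implied by a 3% margin of error at 95% confidence
se <- 0.03 / qnorm(0.975)

conf <- c(0.99, 0.95, 0.90)
# Two-sided critical value for each confidence level
z_star <- qnorm(1 - (1 - conf) / 2)
# Margin of error at each confidence level
me <- z_star * se

data.frame(conf, z_star = round(z_star, 3), me = round(me, 4))
```

Reading the table from top to bottom, both `z_star` and `me` decrease as the confidence level decreases.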
48% is a sample statistic; it represents the proportion of the 1,259 respondents in the survey who have that view.
\(n = 1259\)
\(p = 0.48\)
\(SE = \sqrt{p (1-p) / n} = 0.0141\)
\(ME = 1.96 SE = 0.0276\) at 95% confidence level
So the 95% CI is 48% \(\pm\) 2.76% = (45.2%, 50.8%). We are 95% confident that between 45% and 51% of US residents think that the use of marijuana should be made legal.
n <- 1259
p <- 0.48
(SE <- sqrt(p * (1-p) / n))
## [1] 0.01408022
(ME <- 1.96 * SE)
## [1] 0.02759723
(p + c(-ME, ME))
## [1] 0.4524028 0.5075972
No, it is not justified: the 95% CI is (45.2%, 50.8%), which lies mostly below 50% and includes values on both sides of it. The true population proportion may be above 50%, but at a 95% confidence level it may be anywhere in the interval.
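As a rough sketch of this reasoning, the interval can be wrapped in a small helper and the "majority" claim checked against its lower bound. The function name `prop_ci` is hypothetical, and it uses the exact normal quantile from `qnorm` rather than the rounded value 1.96, so the endpoints may differ slightly in the later decimals.

```r
# Wald confidence interval for a single proportion (hypothetical helper)
prop_ci <- function(p_hat, n, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)        # two-sided critical value
  se <- sqrt(p_hat * (1 - p_hat) / n)   # standard error of the sample proportion
  p_hat + c(-1, 1) * z * se
}

ci <- prop_ci(0.48, 1259)
ci
# A "majority" claim would need the whole interval to sit above 0.5
ci[1] > 0.5
```

The final comparison is `FALSE` for these data, in line with the answer above.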
At a 95% confidence level, the margin of error is:
\[ ME_{\hat{p}} = 1.96 SE_{\hat{p}} = 1.96 \sqrt{\frac{p (1-p)}{n}} \le 0.02\]
Since we don’t know the true population proportion \(p\), let’s assume the worst case of \(p=0.5\), which maximizes \(p (1-p)\). Then substituting and solving for \(n\):
\[ n \ge \left(\frac{1.96}{0.02}\right)^2 p (1-p) = \left(\frac{1.96}{0.02}\right)^2 0.5 (1-0.5) = 2401\]
(1.96 * 0.5 / 0.02)^2
## [1] 2401
So we would need to survey at least 2,401 Americans in order to limit the margin of error to 2%, at a 95% confidence level.
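The same calculation can be sketched as a reusable function. The name `n_for_me` and its arguments are hypothetical, and the sketch uses the exact quantile from `qnorm` instead of 1.96, rounding up because a sample size must be a whole number.

```r
# Smallest n that keeps the margin of error at or below `me` (hypothetical helper)
n_for_me <- function(me, conf = 0.95, p = 0.5) {
  z <- qnorm(1 - (1 - conf) / 2)       # two-sided critical value
  ceiling((z / me)^2 * p * (1 - p))    # round up to a whole respondent
}

n_for_me(0.02)            # worst case p = 0.5
n_for_me(0.02, p = 0.48)  # using the earlier point estimate instead
```

Using \(p = 0.5\) is the conservative choice: any other value of \(p\) gives a smaller required sample size.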
Interpretation: At a 95% confidence level, the proportion of CA residents who report insufficient sleep is between 1.7 percentage points lower and 0.1 percentage points higher than the corresponding proportion of OR residents (an inference from the sample statistics to the population parameters).
nc <- 11545
no <- 4691
pc <- 0.08
po <- 0.088
(pc - po)
## [1] -0.008
(se <- sqrt( pc * (1-pc) / nc + po * (1-po) / no ))
## [1] 0.004845984
(me <- 1.96 * se)
## [1] 0.009498128
(pc - po + c(-me, me))
## [1] -0.017498128 0.001498128
\(H_0\): Barking deer have no preference for foraging in certain habitats over others.
\(H_A\): Barking deer have a preference for foraging in certain habitats over others.
We can use the chi-squared goodness of fit test to check whether the observed foraging sites are consistent with the deer having no preference among habitats.
\(k = 4\), with the categories Woods, Cultivated grassplot, Deciduous forests, and Other
\(n = 426\) observations
Expected values if \(H_0\) is true:
\(E_w = 0.048 * 426 = 20.4\)
\(E_c = 0.147 * 426 = 62.6\)
\(E_d = 0.396 * 426 = 168.7\)
\(E_o = 0.409 * 426 = 174.2\)
\(df = k-1 = 3\)
\(Z_w = \frac{4 - 20.4}{\sqrt{20.4}} = -3.6\)
\(Z_c = \frac{16 - 62.6}{\sqrt{62.6}} = -5.9\)
\(Z_d = \frac{61 - 168.7}{\sqrt{168.7}} = -8.3\)
\(Z_o = \frac{345 - 174.2}{\sqrt{174.2}} = 12.9\)
(Note there’s a typo in the book - in the table, Deciduous forests should be 61, not 67.)
Now the chi-squared statistic is:
\[\chi^2 = Z_w^2 + Z_c^2 + Z_d^2 + Z_o^2 = 284.1\]
This is an exceptionally large \(\chi^2\) value, far in the upper tail of its distribution under \(H_0\), so it is strong evidence in favor of the alternative hypothesis \(H_A\) (the actual p-value is \(\ll\) 0.1%). We conclude that the deer exhibit a preference for foraging in certain habitats over others.
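As a cross-check on the hand calculation, the goodness of fit test can also be run with R's built-in `chisq.test`. This is a minimal sketch: the vector names are hypothetical, and the observed counts use the corrected value of 61 for deciduous forests.

```r
# Observed foraging sites (deciduous forests = 61, per the correction noted above)
observed <- c(woods = 4, grassplot = 16, deciduous = 61, other = 345)
# Habitat proportions expected under H_0
expected_p <- c(0.048, 0.147, 0.396, 0.409)

# Goodness of fit test against the habitat proportions
fit <- chisq.test(observed, p = expected_p)
fit$statistic   # chi-squared statistic: sum of (O - E)^2 / E
fit$parameter   # degrees of freedom, k - 1
fit$p.value     # upper-tail p-value
```

The statistic returned here should agree with the sum of squared Z-scores computed from the same observed and expected counts.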
k <- 4
n <- 426
p_w <- 0.048
p_c <- 0.147
p_d <- 0.396
(p_o <- 1 - p_w - p_c - p_d)
## [1] 0.409
(p_w + p_c + p_d + p_o)
## [1] 1
(E_w <- p_w * n)
## [1] 20.448
(E_c <- p_c * n)
## [1] 62.622
(E_d <- p_d * n)
## [1] 168.696
(E_o <- p_o * n)
## [1] 174.234
(E_w + E_c + E_d + E_o)
## [1] 426
O_w <- 4
O_c <- 16
O_d <- 61
O_o <- 345
(O_w + O_c + O_d + O_o)
## [1] 426
(Z_w <- (O_w - E_w) / sqrt(E_w))
## [1] -3.637372
(Z_c <- (O_c - E_c) / sqrt(E_c))
## [1] -5.891521
(Z_d <- (O_d - E_d) / sqrt(E_d))
## [1] -8.291769
(Z_o <- (O_o - E_o) / sqrt(E_o))
## [1] 12.93704
(chisq <- Z_w^2 + Z_c^2 + Z_d^2 + Z_o^2)
## [1] 284.0609
Chi-squared test for independence in a two-way table.
\(H_0\): There is no association between coffee intake and depression.
\(H_A\): There is an association between coffee intake and depression.
Overall proportion of women who do or do not suffer from depression:
Yes: \(2607 / 50739 = 0.0514\)
No: \(48132 / 50739 = 0.9486\)
(pd <- 2607 / 50739)
## [1] 0.05138059
(1-pd)
## [1] 0.9486194
Expected count for (Yes = clinical depression) and (2-6 cups/week = coffee consumption):
\(6617 \cdot 2607 / 50739 = 340\)
Contribution of this cell to the \(\chi^2\) test statistic:
\((Observed - Expected)^2 / Expected = (373 - 340)^2 / 340 = 3.21\)
(e <- pd * 6617)
## [1] 339.9854
o <- 373
(o - e)^2 / e
## [1] 3.205914
The degrees of freedom are \(df = (5-1) \cdot (2-1) = 4\). From the \(\chi^2\) distribution table in the back of the book, the p-value corresponding to \(\chi^2 = 20.93\) and \(df = 4\) is <0.1%.
Since the p-value of \(\lt 0.001\) is less than our significance level of \(\alpha = 0.05\), we reject \(H_0\) in favor of \(H_A\) and conclude that there is an association between coffee intake and depression.
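Instead of reading the table, the p-value can be computed directly with `pchisq`. This is a small sketch that takes the test statistic of 20.93 as given above; the variable names are hypothetical.

```r
chisq <- 20.93             # test statistic quoted above
df <- (5 - 1) * (2 - 1)    # (rows - 1) * (columns - 1)

# Upper-tail probability of the chi-squared distribution
(p_value <- pchisq(chisq, df = df, lower.tail = FALSE))
p_value < 0.05             # compare against the significance level
```

The upper tail is used because only large values of \(\chi^2\) count as evidence against \(H_0\).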
Yes, it would be premature to recommend higher coffee consumption on the basis of this study alone. There are other factors to consider, for instance side effects of higher caffeine intake and other dimensions of psychological health that this study does not consider. In addition, the results should be independently replicated by other researchers.