Problem 6.6

  1. FALSE. We are 100% confident, since we know that the sample proportion is 46%, which lies between 43% and 49%.

  2. TRUE. The boundaries of the confidence interval are the sample proportion plus or minus the margin of error.

  3. FALSE. The confidence interval is for the population proportion, not for sample proportions.

  4. FALSE. As the confidence level decreases, the critical value \(z^{*}\) gets smaller, so the margin of error decreases.
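
As a rough check on the last statement above, the sketch below backs out the standard error implied by a 3% margin of error at 95% confidence and rescales it to 90% confidence. The 3% figure is read off the 43% to 49% interval around 46%, and the variable names are illustrative only.

```r
# Margin of error implied by the 43%-49% interval around 46% (95% confidence)
me_95 <- 0.03

# Critical values for 95% and 90% confidence
z_95 <- qnorm(0.975)
z_90 <- qnorm(0.95)

# Standard error implied by the 95% margin of error
se <- me_95 / z_95

# Margin of error at 90% confidence, using the same standard error
me_90 <- z_90 * se
cat("ME at 90% confidence:", round(me_90, 4), "\n")
```

The 90% margin of error comes out below 3% because the critical value shrinks from about 1.96 to about 1.645.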

Problem 6.12

  1. This is a measure for the sample. Therefore, it is a sample statistic.

  2. The margin of error can be shown by the following equation:

\(ME = z^{*}SE_{\hat{p}}\)

where \(SE_{\hat{p}}\) is:

\(SE_{\hat{p}} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\)

To get the confidence interval, we add the margin of error to, and subtract it from, the sample proportion. Using the above equations and the following R code, we can get our answer:

p <- 0.48
c <- 0.95
alpha <- 1-c
n <- 1259
SE <- sqrt((p*(1-p))/n)
crit_z <- qnorm(alpha/2,lower.tail=FALSE)
ME <- crit_z*SE
upper <- p + ME
lower <- p - ME
cat("The 95% confidence interval is: (",lower,",",upper,")")
The 95% confidence interval is: ( 0.4524033 , 0.5075967 )

Therefore, we are 95% confident that the population proportion of US residents who think marijuana should be legalized is between 45.24% and 50.76%.

  3. The sampling distribution of \(\hat{p}\) is approximately normal if the observations are independent and the numbers of successes and failures are both at least 10:

\(\text{Success}: n\hat{p} = 1259\times0.48 = 604.32\)

\(\text{Failure}: n(1-\hat{p}) = 1259\times(1-0.48) = 654.68\)

Both counts are well over 10, so the normal model is a reasonable approximation.

  4. The confidence interval extends only slightly above 50%, and most of it lies below 50%. Since values below 50% are plausible for the population proportion, a claim of majority support is not justified.
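
The interval, the success-failure check, and the comparison against 50% can be gathered into one small function. This is only a sketch: `prop_ci` is a hypothetical helper name, not a base R function, and it uses the same normal approximation as the calculation above.

```r
# Hypothetical helper: normal-approximation CI for a single proportion
prop_ci <- function(p_hat, n, conf_level = 0.95) {
  # Success-failure condition: at least 10 expected successes and failures
  condition_met <- (n * p_hat >= 10) && (n * (1 - p_hat) >= 10)
  z_star <- qnorm(1 - (1 - conf_level) / 2)
  se <- sqrt(p_hat * (1 - p_hat) / n)
  me <- z_star * se
  list(lower = p_hat - me, upper = p_hat + me, condition_met = condition_met)
}

ci <- prop_ci(p_hat = 0.48, n = 1259)
ci$condition_met   # is the normal approximation reasonable?
ci$lower > 0.5     # does the whole interval lie above 50%?
```

If `ci$lower > 0.5` is `FALSE`, the interval includes values at or below 50%, so the data do not support a majority claim.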

Problem 6.20

We need to rearrange the margin of error equation and solve for \(n\):

\(ME = z^{*}\sqrt{\frac{p(1-p)}{n}} \rightarrow \frac{ME}{z^{*}} = \sqrt{\frac{p(1-p)}{n}} \rightarrow \left(\frac{ME}{z^{*}}\right)^{2} = \frac{p(1-p)}{n} \rightarrow n = \frac{p(1-p)}{\left(\frac{ME}{z^{*}}\right)^{2}}\)

We can plug in the values from the previous problem and obtain our result:

p <- 0.48                # sample proportion from the previous problem
c <- 0.95                # confidence level
alpha <- 1-c
crit_z <- qnorm(alpha/2,lower.tail=FALSE)
ME <- 0.02               # target margin of error
n <- (p*(1-p))/((ME/crit_z)^2)
n
[1] 2397.07

We would then need to round up to the nearest whole number, making our final answer \(\boxed{2398}\).
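
The rearranged formula and the rounding step can be wrapped in a short function. This is a sketch under the same normal approximation; `required_n` is a hypothetical helper name, and the second call, which uses \(p = 0.5\), is an added illustration rather than part of the problem.

```r
# Hypothetical helper: smallest n giving a margin of error of at most `me`
required_n <- function(p, me, conf_level = 0.95) {
  z_star <- qnorm(1 - (1 - conf_level) / 2)
  # n = p(1 - p) / (me / z*)^2, rounded up to a whole number
  ceiling(p * (1 - p) * (z_star / me)^2)
}

required_n(p = 0.48, me = 0.02)  # estimate from the previous problem
required_n(p = 0.50, me = 0.02)  # conservative choice when p is unknown
```

Using \(p = 0.5\) maximizes \(p(1-p)\), so it gives the most cautious sample size when no prior estimate is available.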

Problem 6.28

The givens for this problem are:

\(\text{California (Subscript as "1")}: \hat{p}_{1} = 0.080, n_{1} = 11545\)

\(\text{Oregon (Subscript as "2")}: \hat{p}_{2} = 0.088, n_{2} = 4691\)

As in the previous problems, the margin of error is added to and subtracted from the point estimate, which here is the difference of the sample proportions. The margin of error is calculated as:

\(ME = z^{*}\sqrt{\frac{\hat{p}_{1}(1-\hat{p}_{1})}{n_{1}}+\frac{\hat{p}_{2}(1-\hat{p}_{2})}{n_{2}}}\)

The confidence interval will be constructed by:

\(\hat{p}_{1}-\hat{p}_{2} \pm ME\)

The results can be found using the following R code:

p1 <- 0.08
n1 <- 11545
p2 <- 0.088
n2 <- 4691
c <- 0.95
alpha <- 1-c
crit_z <- qnorm(alpha/2,lower.tail=FALSE)
# Variance of each sample proportion (squared standard error)
var1 <- (p1*(1-p1))/n1
var2 <- (p2*(1-p2))/n2
# Standard error of the difference p1 - p2
SE_diff <- sqrt(var1 + var2)
ME <- crit_z*SE_diff
diff_p <- p1 - p2
upper <- diff_p + ME
lower <- diff_p - ME
cat("The 95% confidence interval is: (",lower,",",upper,")")
The 95% confidence interval is: ( -0.01749795 , 0.001497954 )

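We are 95% confident that the proportion of California residents who are sleep deprived is between 1.75 percentage points lower and 0.15 percentage points higher than the proportion of Oregon residents. Because this interval contains 0, the data do not provide convincing evidence of a difference between the two states.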
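
The same calculation can be written as a reusable function that also checks whether the interval contains 0. This is a sketch; `diff_prop_ci` is a hypothetical helper name, and it assumes the normal approximation used above.

```r
# Hypothetical helper: normal-approximation CI for p1 - p2
diff_prop_ci <- function(p1, n1, p2, n2, conf_level = 0.95) {
  z_star <- qnorm(1 - (1 - conf_level) / 2)
  se_diff <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
  (p1 - p2) + c(lower = -1, upper = 1) * z_star * se_diff
}

ci <- diff_prop_ci(p1 = 0.08, n1 = 11545, p2 = 0.088, n2 = 4691)
ci

# TRUE when 0 is a plausible value for the difference
contains_zero <- ci[["lower"]] < 0 && ci[["upper"]] > 0
contains_zero
```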

Problem 6.44

  1. The null and alternative hypotheses are shown below:

\(H_{0}: \text{Barking deer do not prefer one habitat over the others for foraging (each $p_{i}$ equals that habitat's share of the land)}\)

\(H_{a}: \text{Barking deer do prefer some habitats over others for foraging (at least one $p_{i}$ differs from its habitat's share of the land)}\)

  2. Chi-square goodness of fit test.

  3. We assume that the observed foraging sites are independent of one another, and the sample size is large enough because every expected count is greater than 5. The expected counts are computed below:

n <- c(4,16,67,345,426)
p <- c(0.048,0.147,0.396,1-0.048-0.147-0.396,1)
EV <- p*426

  4. The equation to calculate chi-squared, \(\chi^{2}\), is shown below:

\(\chi^{2} = \sum\frac{\left(O-E\right)^{2}}{E}\)

We can calculate the chi-squared statistic and resulting \(p\)-value with R:

chi2 <- 0
for (i in 1:4) {
  chi2 <- chi2 + ((n[i]-EV[i])^2)/EV[i]
}
num_categories <- 4
df <- num_categories - 1
pchisq(chi2,df,lower.tail=FALSE)
[1] 1.144396e-59

The \(p\)-value is approximately 0, so we reject the null hypothesis in favor of the alternative. The data provide strong evidence that barking deer prefer to forage in certain habitats over others.
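
As a cross-check on the manual loop, base R's `chisq.test()` runs the same goodness of fit test when given the observed counts and the null proportions. The names `observed`, `null_props`, and `fit` are illustrative, and the values are the first four entries of `n` and `p` above.

```r
# Observed counts per habitat (first four entries of n above)
observed <- c(4, 16, 67, 345)

# Null proportions (first four entries of p above)
null_props <- c(0.048, 0.147, 0.396, 1 - 0.048 - 0.147 - 0.396)

# Built-in chi-square goodness of fit test
fit <- chisq.test(x = observed, p = null_props)
fit$expected   # expected counts under the null
fit$statistic  # chi-squared statistic
fit$parameter  # degrees of freedom
fit$p.value    # p-value
```

The `fit$expected` component also makes it easy to confirm that every expected count is above 5.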

Problem 6.48

  1. Chi-Square test for independence.

  2. The null and alternative hypotheses are:

\(H_{0}: \text{The risk of depression does not depend on the amount of coffee consumed}\)

\(H_{a}: \text{The risk of depression varies with the amount of coffee consumed}\)

  3. The overall proportions of women who do and do not suffer from depression can be calculated with R:

yes <- 2607
no <- 48132
total <- 50739
yes_prop <- yes/total
no_prop <- no/total
yes_prop
[1] 0.05138059
no_prop
[1] 0.9486194

  4. The equation for the expected count is:

\(E_{\text{row }i\text{, col }j} = \frac{(\text{row }i\text{ total})\times(\text{column }j\text{ total})}{\text{table total}}\)

This can be calculated in R:

row_tot <- 2607
col_tot <- 6617
total <- 50739
EV <- (row_tot*col_tot)/total
EV
[1] 339.9854

The expected count for the highlighted cell is approximately 339.99.

  5. The degrees of freedom for a two-way table are:

\(df = (\text{number of rows}-1)\times(\text{number of columns}-1)\)

The \(p\)-value can then be calculated using R:

nrow <- 2
ncol <- 5
df <- (nrow-1)*(ncol-1)
chi2 <- 20.93
pchisq(chi2,df,lower.tail=FALSE)
[1] 0.0003269507

The \(p\)-value is approximately 0.0003.

  6. At a significance level of 0.05 (or even 0.01), the \(p\)-value is smaller than the significance level. Therefore, we reject the null hypothesis in favor of the alternative and conclude that the risk of depression for women varies with the amount of coffee consumed.

  7. Yes, because the association found in this study could be due to a different (confounding) variable rather than to coffee consumption itself.
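
The expected-count formula and the \(p\)-value step from this problem can be sketched together. The snippet uses only the totals given above; `row_totals`, `col_total`, `table_total`, and `deg_free` are illustrative names, and the second expected count (for the "no" row of the same column) is an added illustration.

```r
# Margins used in this problem
row_totals <- c(yes = 2607, no = 48132)
col_total <- 6617      # column total for the highlighted cell
table_total <- 50739

# Expected counts for both cells in that column
expected <- row_totals * col_total / table_total
expected

# p-value for the reported statistic, df = (rows - 1) * (columns - 1)
deg_free <- (2 - 1) * (5 - 1)
p_value <- pchisq(20.93, deg_free, lower.tail = FALSE)
p_value < 0.01
```

The two expected counts add up to the column total, which is a quick sanity check on the formula.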