Graded: 6.6, 6.12, 6.20, 6.28, 6.44, 6.48

Exercise 6.6 2010 Healthcare Law

On June 28, 2012 the U.S. Supreme Court upheld the much debated 2010 healthcare law, declaring it constitutional. A Gallup poll released the day after this decision indicates that 46% of 1,012 Americans agree with this decision. At a 95% confidence level, this sample has a 3% margin of error. Based on this information, determine if the following statements are true or false, and explain your reasoning.

(a) We are 95% confident that between 43% and 49% of Americans in this sample support the decision of the U.S. Supreme Court on the 2010 healthcare law.

False. We are 100% sure that between 43% and 49% of Americans in this sample support the decision of the U.S. Supreme Court on the 2010 healthcare law because we know our sample proportion with certainty.

(b) We are 95% confident that between 43% and 49% of Americans support the decision of the U.S. Supreme Court on the 2010 healthcare law.

True. With a 3% margin of error at a 95% confidence level, the confidence interval is the sample proportion (46%) \(\pm\) 3%: \(46 \pm 3 = 43 \text{ to } 49\). Assuming the sample was randomly selected, we can infer at a 95% confidence level that the population parameter (the proportion of all Americans who support the decision) is between 43% and 49%.
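As a quick check, the stated margin of error can be approximately reproduced from the poll figures. This is a minimal sketch that assumes the usual normal approximation for a sample proportion; the variable names are illustrative and not part of the exercise.

```r
# Reproduce the approximate margin of error from the poll figures
p_hat <- 0.46                               # sample proportion agreeing with the decision
n     <- 1012                               # sample size
se    <- sqrt(p_hat * (1 - p_hat) / n)      # standard error of the sample proportion
me    <- qnorm(0.975) * se                  # margin of error at 95% confidence
ci    <- p_hat + c(-1, 1) * me              # lower and upper bounds
round(100 * c(margin = me, lower = ci[1], upper = ci[2]), 1)
```

The computed margin is close to the 3 percentage points reported in the exercise, which rounds to whole percents.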

(c) If we considered many random samples of 1,012 Americans, and we calculated the sample proportions of those who support the decision of the U.S. Supreme Court, 95% of those sample proportions will be between 43% and 49%.

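False. A confidence interval is a statement about the population parameter, not about other sample proportions. The correct interpretation is that if we calculated a 95% confidence interval for each of many random samples of equal size, we would expect about 95% of those intervals to contain the true population proportion. Sample proportions from repeated samples would be centered on the true proportion, which is not necessarily 46%, so we cannot say that 95% of them would fall between 43% and 49%.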
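The long-run meaning of "95% confident" can be illustrated with a small simulation. The sketch below assumes a hypothetical true proportion of 0.46 (an assumption for illustration, not a known population value) and counts how often the interval built from each simulated sample covers it.

```r
# Simulate many polls and check how often each 95% interval covers the true value
set.seed(1)                                  # arbitrary seed for reproducibility
true_p <- 0.46                               # hypothetical population proportion (assumption)
n      <- 1012                               # sample size of each simulated poll
reps   <- 10000                              # number of simulated polls
p_hat  <- rbinom(reps, n, true_p) / n        # one sample proportion per poll
me     <- qnorm(0.975) * sqrt(p_hat * (1 - p_hat) / n)
covered  <- (p_hat - me <= true_p) & (true_p <= p_hat + me)
coverage <- mean(covered)                    # share of intervals containing true_p
coverage
```

The share of intervals that contain the assumed true proportion should come out close to 0.95.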

(d) The margin of error at a 90% confidence level would be higher than 3%.

False. Since our z-score for a 90% confidence interval is smaller than for a 95% confidence interval and the margin of error is equal to the z-score times the standard error, the margin of error would also be smaller. This also makes intuitive sense since if we make our range of values within which we hope to capture our population parameter smaller, we would naturally be less confident that it would still contain our population parameter.
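The comparison of the two critical values can be sketched as follows, again assuming the normal approximation and reusing the poll figures; the variable names are illustrative.

```r
# Compare critical z-values and margins of error at 90% and 95% confidence
p_hat <- 0.46
n     <- 1012
se    <- sqrt(p_hat * (1 - p_hat) / n)   # standard error is the same at both levels
z90   <- qnorm(0.95)                     # critical value for 90% confidence
z95   <- qnorm(0.975)                    # critical value for 95% confidence
me90  <- z90 * se
me95  <- z95 * se
round(c(z90 = z90, z95 = z95, me90 = me90, me95 = me95), 4)
```

Because the standard error does not change, the smaller critical value at 90% confidence gives the smaller margin of error.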

Exercise 6.12 Legalization of marijuana, Part I

The 2010 General Social Survey asked 1,259 US residents: “Do you think the use of marijuana should be made legal, or not?” 48% of the respondents said it should be made legal.

(a) Is 48% a sample statistic or a population parameter? Explain.

48% is a sample statistic since it is calculated based on a sample of 1,259 out of the total population of US residents.

(c) A critic points out that this 95% confidence interval is only accurate if the statistic follows a normal distribution, or if the normal model is a good approximation. Is this true for these data? Explain.

Sample proportions are well characterized by a nearly normal distribution when two conditions are satisfied: 1. the sample observations must be independent, and 2. we must have at least 10 successes and 10 failures in our sample. Assuming the sample was randomly selected, the observations can be considered independent because 1,259 is less than 10% of the population of US residents. Since 48% of the 1,259 respondents said marijuana should be made legal, we have \(.48 \times 1259\) or about 604 successes and \(.52 \times 1259\) or about 655 failures, both of which are much greater than the minimum of 10. The normal model is therefore a good approximation for these data.

(d) A news piece on this survey’s findings states, “Majority of Americans think marijuana should be legalized.” Based on your confidence interval, is this news piece’s statement justified?

No, this news piece’s statement is not justified. Our 95% confidence interval runs from about 45.2% to 50.8%, so it includes values both below and above 50%. A majority requires the true proportion to exceed 50%, and most of the interval lies below that mark, so the data do not support the claim that more than half of Americans think marijuana should be legalized.
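The confidence interval referred to here can be reproduced from the survey figures. This is a minimal sketch assuming the normal approximation checked in part (c); the variable names are illustrative.

```r
# 95% confidence interval for the proportion favoring legalization
p_hat <- 0.48                              # sample proportion
n     <- 1259                              # sample size
se    <- sqrt(p_hat * (1 - p_hat) / n)     # standard error
ci    <- p_hat + c(-1, 1) * qnorm(0.975) * se
ci_pct <- round(100 * ci, 1)               # bounds in percent
ci_pct
```

The upper bound is only slightly above 50%, and the interval includes values below 50%.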

Exercise 6.20 Legalize Marijuana, Part II

As discussed in Exercise 6.12, the 2010 General Social Survey reported a sample where about 48% of US residents thought marijuana should be made legal. If we wanted to limit the margin of error of a 95% confidence interval to 2%, about how many Americans would we need to survey?

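p <- .48  # sample proportion from Exercise 6.12
n <- ceiling(qnorm(.975)^2 * (p*(1-p))/.02^2)  # critical z squared times p(1-p) over ME squared, rounded up
We would need to survey at least 2,398 Americans to limit the margin of error of a 95% confidence interval to 2%.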
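
The sample size follows from setting the margin of error no larger than 2% and solving for \(n\). The derivation below is a sketch that uses \(z^\star = 1.96\) as the rounded 95% critical value and \(\hat{p} = 0.48\) from Exercise 6.12:

\[
z^\star \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \leq 0.02
\quad\Longrightarrow\quad
n \geq \frac{(z^\star)^2\, \hat{p}(1-\hat{p})}{0.02^2}
= \frac{1.96^2 \times 0.48 \times 0.52}{0.0004}
\approx 2397.2
\]

Since \(n\) must be a whole number of respondents, we round up to 2,398.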

Exercise 6.28 Sleep deprivation, CA vs. OR, Part I

According to a report on sleep deprivation by the Centers for Disease Control and Prevention, the proportion of California residents who reported insufficient rest or sleep during each of the preceding 30 days is 8.0%, while this proportion is 8.8% for Oregon residents. These data are based on simple random samples of 11,545 California and 4,691 Oregon residents. Calculate a 95% confidence interval for the difference between the proportions of Californians and Oregonians who are sleep deprived and interpret it in context of the data.

Check conditions

Since both samples are simple random samples and each consists of less than 10% of its state’s population, independence within each sample is reasonable. 8% (and so also 92%) of 11,545 cases is more than 10, and 8.8% (and 91.2%) of 4,691 cases is also more than 10, so the success-failure condition is met. We can use the normal model for each proportion separately.

The two samples are independent of each other since they are drawn from residents of two different states, so both conditions are met and we can use the normal model for \(\hat{p}_1 - \hat{p}_2\).
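
The success-failure check can be sketched in a few lines. The vector names below are illustrative and are not used elsewhere in this solution.

```r
# Success-failure condition for each state's sample
sizes <- c(CA = 11545, OR = 4691)          # sample sizes
props <- c(CA = 0.080, OR = 0.088)         # reported proportions
successes <- sizes * props                 # expected number reporting insufficient sleep
failures  <- sizes * (1 - props)           # expected number not reporting it
rbind(successes, failures)
condition_met <- all(successes >= 10, failures >= 10)
condition_met
```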

Calculate Standard Error and Confidence Interval
p1 <- .08
n1 <- 11545
p2 <- .088
n2 <- 4691
SE <- sqrt((p1*(1-p1))/n1 + (p2*(1-p2))/n2)
SL95 <- abs(qnorm(.025)) # critical z-value for a 95% confidence interval
Diff <- p2 - p1
ME <- SL95 * SE
lb <- round((Diff - ME)*100, 1)
ub <- round((Diff + ME)*100, 1)

95% Confidence Interval

Lower Bound: -0.1%, Upper Bound: 1.7% (Oregon minus California, in percentage points)

Interpretation

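We are 95% confident that the proportion of Oregon residents who report insufficient rest or sleep is between 0.1 percentage points lower and 1.7 percentage points higher than the proportion of California residents. Since this interval includes zero, there is not enough statistical evidence to conclude that there is a difference in sleep deprivation between California and Oregon residents.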

Exercise 6.44 Barking deer

Microhabitat factors associated with forage and bed sites of barking deer in Hainan Island, China were examined from 2001 to 2002. In this region woods make up 4.8% of the land, cultivated grass plot makes up 14.7% and deciduous forests makes up 39.6%. Of the 426 sites where the deer forage, 4 were categorized as woods, 16 as cultivated grassplot, and 61 as deciduous forests. The table below summarizes these data.

Woods | Cultivated grassplot | Deciduous forests | Other | Total
4 | 16 | 61 | 345 | 426

(a) Write the hypotheses for testing if barking deer prefer to forage in certain habitats over others.

\(H_0\): Barking deer do not prefer one habitat over another for foraging. The observed counts reflect the same proportions as the island’s habitat proportions.

\(H_A\): Barking deer do prefer one or more habitats over other habitats for foraging. The observed counts do not follow the island’s habitat proportions.

(b) What type of test can we use to answer this research question?

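A chi-square goodness of fit test can be used to answer this research question, since we are comparing the observed counts in one categorical variable to a set of expected proportions.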

(c) Check if the assumptions and conditions required for this test are satisfied.
# Check expected outcomes

# woods 4.8% 
w <- round(426 * .048)
w
## [1] 20
# cultivated grass plots 14.7% 
cg <- round(426 * .147)
cg
## [1] 63
# deciduous forests 39.6%
df <- round(426 * .396)
df
## [1] 169
# other
o <- round(426 * (1-(.048 + .147 + .396)))
o
## [1] 174

The conditions are satisfied, since the expected counts are all more than 5 and each case is assumed to be independent.

(d) Do these data provide convincing evidence that barking deer prefer to forage in certain habitats over others? Conduct an appropriate hypothesis test to answer this research question.
chi <- (4-w)^2/w + (16-cg)^2/cg + (61-df)^2/df + (345-o)^2/o
chi
## [1] 284.933
options(scipen = 999)
pchisq(chi, 3, lower.tail = FALSE)
## [1] 0.0000000000000000000000000000000000000000000000000000000000001813092

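Our test statistic of 284.93 (with 3 degrees of freedom) has a p-value that is essentially zero: if barking deer foraged in each habitat in proportion to its share of the land, counts this far from the expected values would almost never occur. We reject the null hypothesis in favor of the alternative hypothesis. These data provide convincing evidence that barking deer prefer to forage in certain habitats over others.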
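
As a cross-check, the same goodness of fit test can be run with R's built-in `chisq.test()`. This is a sketch, not the method used above: `chisq.test()` works with unrounded expected counts, so its statistic differs slightly from the hand calculation that rounded them. The object names are illustrative.

```r
# Goodness of fit test using unrounded expected counts
observed <- c(woods = 4, grassplot = 16, deciduous = 61, other = 345)
habitat_share <- c(0.048, 0.147, 0.396, 1 - (0.048 + 0.147 + 0.396))  # land proportions
gof <- chisq.test(x = observed, p = habitat_share)
gof$expected      # expected counts under the null hypothesis
gof$statistic     # chi-square statistic
gof$parameter     # degrees of freedom
gof$p.value
```

The statistic is close to the value computed by hand and leads to the same conclusion.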

Exercise 6.48 Coffee and Depression

Researchers conducted a study investigating the relationship between caffeinated coffee consumption and risk of depression in women. They collected data on 50,739 women free of depression symptoms at the start of the study in the year 1996, and these women were followed through 2006. The researchers used questionnaires to collect data on caffeinated coffee consumption, asked each individual about physician-diagnosed depression, and also asked about the use of antidepressants. The table below shows the distribution of incidences of depression by amount of caffeinated coffee consumption.

Caffeinated coffee consumption

Clinical depression | \(\leq\) 1 cup/week | 2-6 cups/week | 1 cup/day | 2-3 cups/day | \(\geq\) 4 cups/day | Total
Yes | 670 | 373 | 905 | 564 | 95 | 2,607
No | 11,545 | 6,244 | 16,329 | 11,726 | 2,288 | 48,132
Total | 12,215 | 6,617 | 17,234 | 12,290 | 2,383 | 50,739

(a) What type of test is appropriate for evaluating if there is an association between coffee intake and depression?

We can use a chi-square test for two-way tables to evaluate if there is an association between coffee intake and depression.

(b) Write the hypotheses for the test you identified in part (a).

\(H_0\): - There is no association between coffee intake and depression.

\(H_A\): - There is an association between coffee intake and depression.

(c) Calculate the overall proportion of women who do and do not suffer from depression.
# Women who suffer from depression
do <- 2607/50739
do
## [1] 0.05138059
# Women who do not suffer from depression
not <- 48132/50739
not
## [1] 0.9486194
(d) Identify the expected count for the highlighted cell (Yes, 2-6 cups/week), and calculate the contribution of this cell to the test statistic, i.e. \((\text{Observed} - \text{Expected})^2 / \text{Expected}\).
expected <- round(6617 * do)
expected
## [1] 340
(373 - expected)^2/expected
## [1] 3.202941
(e) The test statistic is \(\chi^2 = 20.93\). What is the p-value?
df <- (2-1) * (5-1)
pval <- pchisq(20.93, df, lower.tail = FALSE)
pval
## [1] 0.0003269507
(f) What is the conclusion of the hypothesis test?

Our p-value 0.00033 is so small that we would reject the null hypothesis in favor of the alternative hypothesis. There is a statistical association between coffee consumption and depression.
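
The full test can also be reproduced from the table with `chisq.test()`. This is a sketch for checking the numbers above; the matrix and its labels are illustrative names, with the counts taken from the table.

```r
# Chi-square test of independence on the depression-by-coffee table
coffee <- matrix(
  c(  670,  373,   905,   564,   95,
    11545, 6244, 16329, 11726, 2288),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    depression = c("Yes", "No"),
    intake = c("<=1 cup/week", "2-6 cups/week", "1 cup/day",
               "2-3 cups/day", ">=4 cups/day")
  )
)
assoc <- chisq.test(coffee)
assoc$statistic                            # chi-square statistic
assoc$parameter                            # degrees of freedom, (2-1) * (5-1)
assoc$p.value
assoc$expected["Yes", "2-6 cups/week"]     # expected count for the cell in part (d)
```

The statistic and degrees of freedom agree with the values given in part (e), and the expected count for the part (d) cell is about 340.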

(g) One of the authors of this study was quoted in the NYTimes as saying it was “too early to recommend that women load up on extra coffee” based on just this study. Do you agree with this statement? Explain your reasoning.

Yes, I would agree that it is “too early to recommend that women load up on extra coffee” based on just this study. An association alone does not establish a causal relationship, and we cannot determine one from this study since it is observational rather than experimental. The direction of the effect is also unclear: maybe women want to drink more coffee when they are happier, rather than drinking more coffee causing them to be happier.