Data606

Problem Set 1 : 2010 Healthcare Law :

(a) We are 95% confident that between 43% and 49% of Americans in this sample support the decision of the U.S. Supreme Court on the 2010 healthcare law.

Ans) False, the confidence interval tells you that the proportion of the population is within that range based on the spot estimate and margin of error of the sample.

(b) We are 95% confident that between 43% and 49% of Americans support the decision of the U.S. Supreme Court on the 2010 healthcare law.

Ans) True, as above the confidence interval tells you about the population not just the sample.

(c) If we considered many random samples of 1,012 Americans, and we calculated the sample proportions of those who support the decision of the U.S. Supreme Court, 95% of those sample proportions will be between 43% and 49%.

Ans) False, 95% of the sample’s confidence intervals would capture the true proportion of the population which the first sample claims is between 43% and 49% .

(d) The margin of error at a 90% confidence level would be higher than 3%.

Ans) False. Decreasing the confidence interval will narrow the width of the interval and hence decrease the margin of error.

Problem set 2: Legalization of marijuana, Part I:

(a) Is 48% a sample statistic or a population parameter? Explain.

Ans) It is a sample statistic because 48% of the 1,259 US residents and not the total population of US.

(b) Construct a 95% confidence interval for the proportion of US residents who think marijuana should be made legal, and interpret it in the context of the data.

Ans)

n <- 1259
p <- .48
z <- 1.96
SE <- sqrt((p*(1-p))/n)
#lower limit
(lower_limit <- p - (z * SE))

## [1] 0.4524028

# upper limit
(upper_limit <- p + (z * SE))

## [1] 0.5075972

Interval is 45.2% to 50.7%.

(c) A critic points out that this 95% confidence interval is only accurate if the statistic follows a normal distribution, or if the normal model is a good approximation. Is this true for these data? Explain.

Ans) Yes, it appears so. There are two conditions for confirming that a sampling distribution of a population proportion estimate is nearly normal:

1. The sampling observations are independent - it seems so.

2. Passes the success-failure conditions, meaning there are at least 10 successes and 10 failures. As our sample of > 1200 is nearly evenly split, the sample fits this criteria.

(d) A news piece on this survey’s findings states, “Majority of Americans think marijuana should be legalized.” Based on your confidence interval, is this news piece’s statement justified?

Ans) It wouldn’t be fair to say a majority support legalization unless the entire confidence interval were about 50%. We could say a majority might support legalizaiton.

Problem set 3:

Legalize Marijuana, Part II: As discussed in Exercise above, the 2010 General Social Survey reported a sample where about 48% of US residents thought marijuana should be made legal. If we wanted to limit the margin of error of a 95% confidence interval to 2%, about how many Americans would we need to survey

Ans)

p<- 0.48
mu <- 0.02
round(n <- 1.96^2 * p*(1-p)/mu^2)

## [1] 2397

We need to survey 2,398 Americans.

Problem set 4:

Sleep deprivation, CA vs. OR, Part I: According to a report on sleep deprivation by the Centers for Disease Control and Prevention, the proportion of California residents who reported insuffient rest or sleep during each of the preceding 30 days is 8.0%, while this proportion is 8.8% for Oregon residents.These data are based on simple random samples of 11,545 California and 4,691 Oregon residents. Calculate a 95% confidence interval for the difference between the proportions of Californians and Oregonians who are sleep deprived and interpret it in context of the data.

Ans)

pooled <- (.08*11545 + .088*4691)/(11545 + 4691)
SE <- sqrt(pooled/11545 + pooled/4691)
margin <- 1.96 * SE
lower <- .008 - margin
upper <- .008 + margin
(interval<-c(lower,upper))

## [1] -0.001736344  0.017736344

Ci is -.0015 to .0175. We can say that with a 95% confidence level that the proportions are not statistically different between California and Oregon.

Problem set 5: Barking deer:

(a) Write the hypotheses for testing if barking deer prefer to forage in certain habitats over others.

Ans) HO:The proportion of forage sites mirrors the distribution of the different habitats

HA: The proportion of forage sites does not mirror the distribution of the different habitats.

(b) What type of test can we use to answer this research question?

Ans) We can use a chi-square test.

(c) Check if the assumptions and conditions required for this test are satisfied.

Ans) 1) independence and

2) at least 5 expected cases for each expected case. While we only observed 4 woods forage sites, the expected rate would be higher, so it works.

(d) Do these data provide convincing evidence that barking deer pre- fer to forage in certain habitats over others? Conduct an appro- priate hypothesis test to answer this research question.

Ans)

obs <- c(4,16,67,345)
ev <- round(426*c(.048,.147,.396,.409),2)
chi <- sum((obs-ev)^2/ev)
(p<- 1 - pchisq(chi, 3))

## [1] 0

Since the p-value is near-zero (and below the significance level), we can reject the null hypothesis.Hence the alternative is true.

Problem set 6 :Coffee and Depression :

(a) What type of test is appropriate for evaluating if there is an association between coffee intake and depression?

Ans) we can use a chi-squared test

(b) Write the hypotheses for the test you identified in part (a).

Ans) H0 : There isn’t any association between coffee and depression.

HA : There is an association between coffee and depression.

(c) Calculate the overall proportion of women who do and do not suffer from depression.

Ans)

(WOMEN_dep <- 2607/50739)

## [1] 0.05138059

(WOMEN_not_dep <-  48132/50739)

## [1] 0.9486194

The propotion of women suffer from depression is 0.05 and the proportion of women do not suffer from depression is 0.94.

(e) The test statistic is χ2 = 20.93. What is the p-value?

Ans)

chisq <- 20.93
df <-  (5-1)*(2-1)
(p <- 1-pchisq(chisq, df))

## [1] 0.0003269507

Data606_hw6

Vijaya Cherukuri

10/16/2019

Problem Set 1 : 2010 Healthcare Law :

(a) We are 95% confident that between 43% and 49% of Americans in this sample support the decision of the U.S. Supreme Court on the 2010 healthcare law.

Ans) False, the confidence interval tells you that the proportion of the population is within that range based on the spot estimate and margin of error of the sample.

(b) We are 95% confident that between 43% and 49% of Americans support the decision of the U.S. Supreme Court on the 2010 healthcare law.

Ans) True, as above the confidence interval tells you about the population not just the sample.

(c) If we considered many random samples of 1,012 Americans, and we calculated the sample proportions of those who support the decision of the U.S. Supreme Court, 95% of those sample proportions will be between 43% and 49%.

Ans) False, 95% of the sample’s confidence intervals would capture the true proportion of the population which the first sample claims is between 43% and 49% .

(d) The margin of error at a 90% confidence level would be higher than 3%.

Ans) False. Decreasing the confidence interval will narrow the width of the interval and hence decrease the margin of error.

Problem set 2: Legalization of marijuana, Part I:

(a) Is 48% a sample statistic or a population parameter? Explain.

Ans) It is a sample statistic because 48% of the 1,259 US residents and not the total population of US.

(b) Construct a 95% confidence interval for the proportion of US residents who think marijuana should be made legal, and interpret it in the context of the data.

Ans)

Interval is 45.2% to 50.7%.

(c) A critic points out that this 95% confidence interval is only accurate if the statistic follows a normal distribution, or if the normal model is a good approximation. Is this true for these data? Explain.

Ans) Yes, it appears so. There are two conditions for confirming that a sampling distribution of a population proportion estimate is nearly normal:

1. The sampling observations are independent - it seems so.

2. Passes the success-failure conditions, meaning there are at least 10 successes and 10 failures. As our sample of > 1200 is nearly evenly split, the sample fits this criteria.

(d) A news piece on this survey’s findings states, “Majority of Americans think marijuana should be legalized.” Based on your confidence interval, is this news piece’s statement justified?

Ans) It wouldn’t be fair to say a majority support legalization unless the entire confidence interval were about 50%. We could say a majority might support legalizaiton.

Problem set 3:

Ans)

We need to survey 2,398 Americans.

Problem set 4:

Ans)

Ci is -.0015 to .0175. We can say that with a 95% confidence level that the proportions are not statistically different between California and Oregon.

Problem set 5: Barking deer:

(a) Write the hypotheses for testing if barking deer prefer to forage in certain habitats over others.

Ans) HO:The proportion of forage sites mirrors the distribution of the different habitats

HA: The proportion of forage sites does not mirror the distribution of the different habitats.

(b) What type of test can we use to answer this research question?

Ans) We can use a chi-square test.

(c) Check if the assumptions and conditions required for this test are satisfied.

Ans) 1) independence and

2) at least 5 expected cases for each expected case. While we only observed 4 woods forage sites, the expected rate would be higher, so it works.

(d) Do these data provide convincing evidence that barking deer pre- fer to forage in certain habitats over others? Conduct an appro- priate hypothesis test to answer this research question.

Ans)

Since the p-value is near-zero (and below the significance level), we can reject the null hypothesis.Hence the alternative is true.

Problem set 6 :Coffee and Depression :

(a) What type of test is appropriate for evaluating if there is an association between coffee intake and depression?

Ans) we can use a chi-squared test

(b) Write the hypotheses for the test you identified in part (a).

Ans) H0 : There isn’t any association between coffee and depression.

HA : There is an association between coffee and depression.

(c) Calculate the overall proportion of women who do and do not suffer from depression.

Ans)

The propotion of women suffer from depression is 0.05 and the proportion of women do not suffer from depression is 0.94.

(e) The test statistic is χ2 = 20.93. What is the p-value?

Ans)

The p value is .00033.

(f) What is the conclusion of the hypothesis test?

Ans) The p value is less than .05 so we can reject the null hypothesis that there is no association between coffee and depression.

(g) One of the authors of this study was quoted on the NYTimes as saying it was “too early to recommend that women load up on extra coffee” based on just this study.64 Do you agree with this statement? Explain your reasoning.

Ans) Yes I agree, the study only establishes statistical significance and not practical significance. I would say its a weak relationship between coffee consumption and depression among women.