- In a 2010 Survey USA poll, 70% of the 119 respondents between the ages of 18 and 34 said they would vote in the 2010 general election for Prop 19, which would change California law to legalize marijuana and allow it to be regulated and taxed. At a 95% confidence level, this sample has an 8% margin of error. Based on this information, determine if the following statements are true or false, and explain your reasoning.
- We are 95% confident that between 62% and 78% of the California voters in this sample support Prop 19.
- We are 95% confident that between 62% and 78% of all California voters between the ages of 18 and 34 support Prop 19.
- If we considered many random samples of 119 California voters between the ages of 18 and 34, and we calculated 95% confidence intervals for each, 95% of them will include the true population proportion of 18-34 year old Californians who support Prop 19.
- In order to decrease the margin of error to 4%, we would need to quadruple (multiply by 4) the sample size.
- Based on this confidence interval, there is sufficient evidence to conclude that a majority of California voters between the ages of 18 and 34 support Prop 19.
#a False. A confidence interval estimates the population proportion, not the sample proportion. We already know with certainty that 70% of the respondents in this sample support Prop 19.
#b True. A confidence interval estimates the population proportion, and 70% plus or minus the 8% margin of error gives 62% to 78%.
#c True. This is the definition of the confidence level: if the survey were repeated many times and a 95% confidence interval were built from each sample, about 95% of those intervals would contain the true population proportion.
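#d True. The margin of error is proportional to 1/sqrt(n), so quadrupling the sample size halves it. By calculation:
ME_old <- 8 # current margin of error, in percentage points
scale <- 1 / sqrt(4) # effect of multiplying the sample size by 4
ME_old * scale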
## [1] 4
#e True. The sample meets the required conditions for inference, i.e. independent observations and a large enough sample size. Also, the entire 95% confidence interval (62% to 78%) lies above 50%.
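The stated 8% margin of error and the claim in part (d) can be checked directly. The following is a minimal sketch that assumes the usual normal-approximation formula for a single proportion; the helper name `me` is introduced here for illustration only.

```r
# Values taken from the exercise: 119 respondents, 70% support, 95% confidence
n <- 119
p_hat <- 0.70
z <- qnorm(0.975) # two-sided 95% critical value, about 1.96

# Margin of error for a single proportion (hypothetical helper)
me <- function(p, n, z) z * sqrt(p * (1 - p) / n)

me_n <- me(p_hat, n, z) # original sample size
me_4n <- me(p_hat, 4 * n, z) # quadrupled sample size

# Margins of error in percentage points: roughly 8 and roughly 4
round(c(original = me_n, quadrupled = me_4n) * 100, 2)

# Success-failure condition: both expected counts should be at least 10
c(successes = n * p_hat, failures = n * (1 - p_hat))
```

Quadrupling the sample size halves the margin of error exactly, because n enters the formula only through 1/sqrt(n).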
- A survey on 1,509 high school seniors who took the SAT and who completed an optional web survey between April 25 and April 30, 2007 shows that 55% of high school seniors are fairly certain that they will participate in a study abroad program in college.
- Is this sample a representative sample from the population of all high school seniors in the US? Explain your reasoning.
- Let’s suppose the conditions for inference are met. Even if your answer to part (a) indicated that this approach would not be reliable, this analysis may still be interesting to carry out (though not report). Construct a 90% confidence interval for the proportion of high school seniors (of those who took the SAT) who are fairly certain they will participate in a study abroad program in college, and interpret this interval in context.
- What does “90% confidence” mean?
- Based on this interval, would it be appropriate to claim that the majority of high school seniors are fairly certain that they will participate in a study abroad program in college?
#a No. The web survey was optional, so the respondents are self-selected and many students may have opted out. Also, the survey only covers students who took the SAT and responded during a single week, and not all high school seniors take the SAT.
#b We are 90% confident that between 52.89% and 57.11% of high school seniors who took the SAT are fairly certain they will participate in a study abroad program in college.
n <- 1509
p <- 0.55
SE <- sqrt(p * (1 - p) / n)
z <- qnorm(0.95) # a two-sided 90% interval leaves 5% in each tail
c(round((p - z * SE) * 100, 2), round((p + z * SE) * 100, 2)) # lower and upper bounds
## [1] 52.89 57.11
#c A confidence interval is a plausible range of values for the population parameter. "90% confidence" means that if we took many random samples of this size and built a 90% confidence interval from each, about 90% of those intervals would capture the true population proportion.
#d Since the entire 90% confidence interval lies above 50%, it would be appropriate to claim that the majority of high school seniors (of those who took the SAT) are fairly certain that they will participate in a study abroad program in college.
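The critical value is the step that is easiest to get wrong in part (b): a two-sided interval at confidence level `conf` uses `qnorm(1 - (1 - conf) / 2)`, not `qnorm(conf)`. A minimal sketch follows, with `wald_ci` as a hypothetical helper name that is not part of the exercise.

```r
# Normal-approximation (Wald) confidence interval for one proportion.
# wald_ci is an illustrative helper, not a base R function.
wald_ci <- function(p_hat, n, conf) {
  z <- qnorm(1 - (1 - conf) / 2) # split the leftover probability across both tails
  se <- sqrt(p_hat * (1 - p_hat) / n)
  c(lower = p_hat - z * se, upper = p_hat + z * se)
}

# Values from the exercise: 55% of 1,509 respondents, 90% confidence
ci_90 <- wald_ci(0.55, 1509, 0.90)
round(ci_90 * 100, 2)

# Part (d): does the whole interval sit above 50%?
ci_90[["lower"]] > 0.5
```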
- Exercise 6.13 presents the results of a poll evaluating support for the health care public option plan in 2009. 70% of 819 Democrats and 42% of 783 Independents support the public option.
- Calculate a 95% confidence interval for the difference between (pD - pI) and interpret it in this context. We have already checked conditions for you.
- True or false: If we had picked a random Democrat and a random Independent at the time of this poll, it is more likely that the Democrat would support the public option than the Independent.
#a We are 95% confident that the proportion of Democrats who support the plan is 23.33 to 32.67 percentage points higher than the proportion of Independents who do.
pD <- 0.7
nD <- 819
pI <- 0.42
nI <- 783
SE <- sqrt((pD * (1 - pD) / nD) + (pI * (1 - pI) / nI))
z <- qnorm(0.975) # a two-sided 95% interval leaves 2.5% in each tail
ME <- SE * z
point_estimate <- pD - pI
c(round((point_estimate - ME) * 100, 2), round((point_estimate + ME) * 100, 2))
## [1] 23.33 32.67
#b True. The confidence interval for pD - pI lies entirely above 0, so a larger proportion of Democrats than Independents supported the public option. A randomly chosen Democrat was therefore more likely to support it than a randomly chosen Independent.
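As a cross-check on part (a), base R's `prop.test` can compute the interval for a difference of two proportions from counts. The counts 573 and 329 below are assumptions, reconstructed by rounding 70% of 819 and 42% of 783, so the result may differ slightly from a calculation that uses the rounded percentages.

```r
# Assumed counts, reconstructed from the reported percentages
x <- c(Democrats = 573, Independents = 329) # supporters in each group
n <- c(Democrats = 819, Independents = 783) # group sizes from the exercise

# correct = FALSE turns off the continuity correction so the interval
# matches the hand calculation based on the normal approximation
res <- prop.test(x, n, conf.level = 0.95, correct = FALSE)
ci <- res$conf.int

round(ci * 100, 2) # bounds for pD - pI, in percentage points

# Part (b): the interval excludes 0 when its lower bound is positive
ci[1] > 0
```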
- Rock-paper-scissors is a hand game played by two or more people where players choose to sign either rock, paper, or scissors with their hands. For your AP Statistics class project, you want to evaluate whether players choose between these three options randomly, or if certain options are favored above others. You ask two friends to play rock-paper-scissors and count the times each option is played. The following table summarizes the data: Rock - 43, Paper - 21, Scissors - 35.
Use these data to evaluate whether players choose between these three options randomly, or if certain options are favored above others. Make sure to clearly outline each step of your analysis, and interpret your results in context of the data and the research question.
# Null Hypothesis: players choose rock, paper, and scissors with equal probability (1/3 each)
# Alternative Hypothesis: at least one option is chosen with a probability different from 1/3
rock_o <- 43
paper_o <- 21
scissors_o <- 35
game <- c(rock_o, paper_o, scissors_o)
chisq.test(game)
##
## Chi-squared test for given probabilities
##
## data: game
## X-squared = 7.5152, df = 2, p-value = 0.02334
# Since the p-value (0.02334) is less than 0.05, we reject the null hypothesis, i.e. the data provide evidence that players do not choose the three options randomly and that certain options are favored above others.
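The goodness-of-fit statistic reported by `chisq.test` can be reproduced by hand, which also makes the expected-count condition explicit. This is a minimal sketch assuming equal probabilities under the null hypothesis.

```r
# Observed counts from the exercise
observed <- c(Rock = 43, Paper = 21, Scissors = 35)

# Under the null hypothesis each option is expected 1/3 of the time
expected <- rep(sum(observed) / length(observed), length(observed))

# Condition check: every expected count should be at least 5
all(expected >= 5)

# Chi-square statistic: sum of (observed - expected)^2 / expected
chi_sq <- sum((observed - expected)^2 / expected)
df <- length(observed) - 1
p_value <- pchisq(chi_sq, df, lower.tail = FALSE)

round(c(statistic = chi_sq, df = df, p_value = p_value), 4)
```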
- The table below summarizes a data set we first encountered in Exercise 6.29 that examines the responses of a random sample of college graduates and non-graduates on the topic of oil drilling. Complete a chi-square test for these data to check whether there is a statistically significant difference in responses from college graduates and non-graduates.
# Null Hypothesis: graduating college and opinion on oil drilling are independent
# Alternative Hypothesis: graduating college and opinion on oil drilling are dependent
values <- c(154, 132, 180, 126, 104, 131)
responses <- c("Support", "Oppose", "Do not know")
groups <- c("Grad", "nGrad")
groups_responses <- list(responses, groups)
offshore_drilling <- matrix(values, 3, 2, byrow = TRUE, dimnames = groups_responses)
chisq.test(offshore_drilling)
##
## Pearson's Chi-squared test
##
## data: offshore_drilling
## X-squared = 11.461, df = 2, p-value = 0.003246
# Since the p-value (0.003246) is less than 0.05, we reject the null hypothesis, i.e. the data provide evidence that graduating college and opinion on oil drilling are dependent.
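For a two-way table, the expected counts under independence are (row total × column total) / grand total. The following minimal sketch rebuilds the table from the values above and recomputes the test statistic by hand.

```r
# Same table as above: rows are responses, columns are groups
offshore_drilling <- matrix(
  c(154, 132, 180, 126, 104, 131),
  nrow = 3, ncol = 2, byrow = TRUE,
  dimnames = list(c("Support", "Oppose", "Do not know"), c("Grad", "nGrad"))
)

# Expected counts under independence
expected <- outer(rowSums(offshore_drilling), colSums(offshore_drilling)) /
  sum(offshore_drilling)

# Condition check: every expected count should be at least 5
all(expected >= 5)

# Chi-square statistic with (rows - 1) * (columns - 1) degrees of freedom
chi_sq <- sum((offshore_drilling - expected)^2 / expected)
df <- (nrow(offshore_drilling) - 1) * (ncol(offshore_drilling) - 1)
p_value <- pchisq(chi_sq, df, lower.tail = FALSE)

round(c(statistic = chi_sq, df = df, p_value = p_value), 4)
```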