Lab 3

The full assignment for this lab can be found here

In this lab, you will assume that \(\pi=62\%\) is the true population proportion. In reality, we cannot observe this value, but for the purpose of this lab we will create this hypothetical population. We will then sample our data from the hypothetical population and explore how samples vary from one to another.

To keep our computation simple, we will assume a total population size of 100,000 (even though that’s smaller than the population size of all US adults).
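As a minimal sketch of this setup, the hypothetical population can be built and checked in base R. The names `us_adults` and `climate_change_affects` are assumptions made for illustration; the lab's own code may use different names or packages.

```r
# Hypothetical population of 100,000 adults with a true proportion of 62%.
# The names `us_adults` and `climate_change_affects` are assumptions.
pop_size <- 100000
n_yes <- round(0.62 * pop_size)  # 62,000 "Yes" responses
us_adults <- data.frame(
  climate_change_affects = c(rep("Yes", n_yes), rep("No", pop_size - n_yes))
)

# Counts and proportions of each response in the population
counts <- table(us_adults$climate_change_affects)
props <- prop.table(counts)
print(counts)
print(props)
```

Because the population is constructed directly from the counts, the "Yes" proportion is 62% by design.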

Exercise 1

Question: We can visualize the hypothetical distribution of the responses in the population using a bar plot. Recreate the plot below using the ggplot(), geom_bar() and labs() layers. To flip the x and y coordinates, add the coord_flip() layer.

Exercise 2

Question: Print the summary statistics to confirm we constructed the data frame correctly. Use the count function to show the numeric quantities and use mutate(p = n /sum(n)) to calculate the proportions in the population. What is the proportion of climate-believers in our hypothetical population?

Answer: The proportion of climate-change believers in our hypothetical population is 62%.

Exercise 3

Question: Calculate the proportions like we did in the previous question and answer the following: (1) What percent of your sample are climate-believers? (2) How does this compare to the proportion of climate-believers in the population? Hint: Just like we did with the population, we can calculate the proportion of those in this sample who think climate change affects their local community.

Answer: This sample estimates the proportion of climate-change believers at 61.6%, which is very close to the true population proportion of 62%.

Exercise 4

Question: Create code to generate a second sample (call it samp_2). Answer the same questions as before, but this time with respect to samp_2. How do the two samples compare? Explain any difference you found between the two samples.

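Answer: My second sample gives a proportion of 56.6%, which is lower than the 61.6% from my first sample and underestimates the true population proportion of 62%. The difference comes from random sampling variation: each sample draws a different set of individuals from the population, so the sample proportion changes from one sample to the next.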
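The sample-to-sample variation described here can be sketched in base R. The sample size of 60, the seed, and the name `population` are assumptions for illustration, not values taken from the lab's code.

```r
set.seed(35797)  # arbitrary seed so the draws are reproducible

# Hypothetical population (an assumption): 62,000 "Yes" and 38,000 "No"
population <- c(rep("Yes", 62000), rep("No", 38000))

n <- 60  # assumed sample size
samp_1 <- sample(population, size = n)  # simple random sample, no replacement
samp_2 <- sample(population, size = n)  # a second, independent draw

# Sample proportions of "Yes" responses
p_hat_1 <- mean(samp_1 == "Yes")
p_hat_2 <- mean(samp_2 == "Yes")
cat("samp_1:", p_hat_1, " samp_2:", p_hat_2, "\n")
```

Rerunning with a different seed gives different proportions, which is the variation the two samples illustrate.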

Exercise 5

Question: Run the proportion test (see code below) on the first sample samp_1, to estimate the proportion of climate-believers in the population. Now answer the following questions: (1) How does the estimation compare to the real proportion of climate-believers in the population? (2) What is the confidence interval associated with your estimation? (3) Is the proportion of climate-believers in the population contained within your confidence interval?

## No `p` argument was hypothesized, so the test will assume a null hypothesis `p
## = .5`.

Answer: The estimate from sample 1 and the true population proportion are almost identical. The confidence interval from my first sample runs from 48.1% to 73.6%, and the true population proportion (62%) lies within it.
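A rough way to reproduce this kind of check is base R's `prop.test()`. The counts below (37 "Yes" responses out of 60) are an assumption, chosen because they are consistent with a sample proportion of about 61.6%; the lab's own test function may differ in its defaults.

```r
# Assumed counts: 37 "Yes" responses out of n = 60 (about 61.7%)
x <- 37
n <- 60

# One-sample proportion test; with no `p` given, the null value is 0.5
result <- prop.test(x, n, conf.level = 0.95)
ci <- result$conf.int

cat("estimate:", unname(result$estimate), "\n")
cat("95% CI:", ci[1], "to", ci[2], "\n")

# Does the interval contain the true population proportion?
cat("contains 0.62:", ci[1] <= 0.62 && 0.62 <= ci[2], "\n")
```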

Exercise 6

Question: This code will create 1000 bootstrapping samples from samp_1, and use those samples to find the 95 percent confidence interval for proportion of climate-believers. Run the code and compare your results with the proportion test we’ve run in the previous question.

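Answer: The bootstrap confidence interval is narrower than the one from the proportion test. Both intervals are built from the same sample, so the difference reflects the method used to construct the interval (percentiles of resampled proportions versus a formula-based interval), not the fact that the sample was resampled many times.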
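A percentile bootstrap interval can be sketched in base R as follows. The sample contents (37 "Yes", 23 "No") and the seed are assumptions for illustration, and the lab's code may compute its interval with a different package.

```r
set.seed(2024)  # arbitrary seed for reproducibility

# Assumed sample: 37 "Yes" and 23 "No" responses (n = 60)
samp_1 <- c(rep("Yes", 37), rep("No", 23))

reps <- 1000  # number of bootstrap samples

# Resample the data with replacement and record the "Yes" proportion each time
boot_props <- replicate(reps, mean(sample(samp_1, replace = TRUE) == "Yes"))

# Percentile method: the middle 95% of the bootstrap proportions
boot_ci <- quantile(boot_props, probs = c(0.025, 0.975))
print(boot_ci)
```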

Exercise 7

Question: Does your confidence interval capture the true population proportion of US adults who think climate change affects their local community? Now run the bootstrapping method on samp_2. How do your results compare?

Each time you run a sample, you would get different intervals. What proportion of those intervals would you expect to contain the true population mean?

Answer: Yes, my confidence interval captures the true population proportion of 62%. I would expect about 95% of those intervals to include the true population proportion.

Exercise 8

Question: Given a sample size of 60, 1000 bootstrap samples for each interval, and 50 confidence intervals constructed (the default values for the above app), what proportion of your confidence intervals include the true population proportion? Is this proportion exactly equal to the confidence level? If not, explain why. Include an image of your plot with your answer (to learn how to include an image in your RMarkdown, see this).

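Answer: Of the 50 confidence intervals, only 2 did not contain the true population proportion, so 48 out of 50 (96%) captured it. This is not exactly equal to the 95% confidence level, and it cannot be: with 50 intervals the observed proportion moves in steps of 2%, so 95% is not attainable. The confidence level describes the long-run capture rate over many repetitions, so any single set of 50 intervals will land near 95% rather than exactly on it.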
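The experiment behind this answer can be approximated with a short base R simulation. This is a sketch under stated assumptions: samples are drawn with `rbinom()` (an effectively infinite population with proportion 0.62) rather than from the finite population of 100,000, and the seed is arbitrary, so the count of intervals that miss will not match the app's plot.

```r
set.seed(1)  # arbitrary seed for reproducibility

true_p <- 0.62       # true population proportion
n <- 60              # sample size
reps <- 1000         # bootstrap samples per interval
n_intervals <- 50    # number of confidence intervals

covers <- replicate(n_intervals, {
  samp <- rbinom(n, size = 1, prob = true_p)             # 1 = "Yes", 0 = "No"
  boot <- replicate(reps, mean(sample(samp, replace = TRUE)))
  ci <- quantile(boot, probs = c(0.025, 0.975))          # 95% percentile interval
  ci[[1]] <= true_p && true_p <= ci[[2]]                 # TRUE if interval captures true_p
})

coverage <- mean(covers)  # proportion of the 50 intervals that capture true_p
cat("coverage:", coverage, "\n")
```

Changing the seed changes the observed coverage, which is why a single run rarely equals the confidence level exactly.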

Exercise 9

Question: Choose a different confidence level than 95%. Would you expect a confidence interval at this level to be wider or narrower than the confidence interval you calculated at the 95% confidence level? Explain your reasoning and confirm your answer using the app. What is the proportion of intervals that include the true population proportion? How does this percentage compare to the confidence level selected for the intervals? Include an image of your plot with your answer.

Image: 50 confidence intervals, sample size = 60

Answer: When the confidence level is lowered to 90%, the intervals become narrower, because an interval that only needs to capture the true proportion 90% of the time can cover a smaller range than one that must capture it 95% of the time. In my simulation, 11 of the 50 confidence intervals failed to capture the true population proportion, so 39 of 50 (78%) captured it, which is well below the chosen 90% confidence level.
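The relationship between confidence level and interval width can be checked directly in base R by computing intervals at two levels from the same data. The counts (37 of 60) and the helper name `ci_width` are assumptions for illustration.

```r
# Assumed counts: 37 "Yes" responses out of n = 60
x <- 37
n <- 60

# Hypothetical helper: width of the prop.test() interval at a given level
ci_width <- function(level) {
  as.numeric(diff(prop.test(x, n, conf.level = level)$conf.int))
}

width_90 <- ci_width(0.90)
width_95 <- ci_width(0.95)
cat("90% width:", width_90, " 95% width:", width_95, "\n")
```

For the same sample, the 90% interval is narrower than the 95% interval.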

Exercise 10

Question: Using the app, experiment with different sample sizes and comment on how the widths of intervals change as sample size changes (increases and decreases). Include an image of your plot with your answer.

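Image: 50 confidence intervals, sample size = 60

Answer: When I increased the sample size to 1000 at a 95% confidence level, the intervals became narrower, and only one of them failed to include the true population proportion. Larger samples estimate the proportion more precisely, so the intervals shrink as the sample size increases and widen as it decreases.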
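The effect of sample size can be made concrete with the usual standard error of a sample proportion. This is a normal-approximation sketch rather than the bootstrap calculation the app performs, and it plugs in the lab's assumed \(\pi = 0.62\):

\[
\mathrm{SE}(\hat{p}) = \sqrt{\frac{\pi(1-\pi)}{n}}, \qquad
n = 60:\ \sqrt{\frac{0.62 \times 0.38}{60}} \approx 0.063, \qquad
n = 1000:\ \sqrt{\frac{0.62 \times 0.38}{1000}} \approx 0.015
\]

A 95% interval extends roughly \(1.96 \times \mathrm{SE}\) on each side of the estimate, so the margin falls from about 12 percentage points at \(n = 60\) to about 3 percentage points at \(n = 1000\).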

Exercise 11

Question: Finally, given a sample size (say, 60), how does the width of the interval change as you increase the number of bootstrap samples? Include an image of your plot with your answer.

Image: 50 confidence intervals, sample size = 60

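Answer: The above image shows a sample size of 60 with 10,000 bootstrap samples. Increasing the number of bootstrap samples does not systematically change the width of the confidence interval. More bootstrap samples make the interval endpoints more stable from run to run, but the width is governed mainly by the sample size, which stays at 60.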
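One way to examine this question is to compute percentile bootstrap intervals from the same sample with different numbers of bootstrap samples. The sample (37 "Yes" out of 60, coded as 1 and 0), the seed, and the helper name `boot_width` are assumptions for illustration.

```r
set.seed(7)  # arbitrary seed for reproducibility

# Assumed sample of n = 60: 1 = "Yes" (37 responses), 0 = "No" (23 responses)
samp <- c(rep(1, 37), rep(0, 23))

# Hypothetical helper: width of the 95% percentile bootstrap interval
boot_width <- function(reps) {
  boot <- replicate(reps, mean(sample(samp, replace = TRUE)))
  unname(diff(quantile(boot, probs = c(0.025, 0.975))))
}

# Same sample, increasing numbers of bootstrap samples
widths <- sapply(c(100, 1000, 10000), boot_width)
names(widths) <- c("100", "1000", "10000")
print(widths)
```

The widths stay in the same range as the number of bootstrap samples grows; what changes is how much they fluctuate between runs.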