library(tidyverse)
library(openintro)
library(infer)
library(ggplot2)

Exercise 1

What percent of the adults in your sample think climate change affects their local community? Hint: Just like we did with the population, we can calculate the proportion of those in this sample who think climate change affects their local community.

In my sample, 70% of the adults (42 of the 60 sampled) think climate change affects their local community.

set.seed(999)
us_adults <- tibble(
  climate_change_affects = c(rep("Yes", 62000), rep("No", 38000))
)
ggplot(us_adults, aes(x = climate_change_affects)) +
  geom_bar() +
  labs(
    x = "", y = "",
    title = "Do you think climate change is affecting your local community?"
  ) +
  coord_flip() 

us_adults %>%
  dplyr::count(climate_change_affects) %>%
  dplyr::mutate(p = n / sum(n))
## # A tibble: 2 × 3
##   climate_change_affects     n     p
##   <chr>                  <int> <dbl>
## 1 No                     38000  0.38
## 2 Yes                    62000  0.62
n <- 60
samp <- us_adults %>%
  sample_n(size = n)

samp %>%
  dplyr::count(climate_change_affects) %>%
  dplyr::mutate(p = n / sum(n))
## # A tibble: 2 × 3
##   climate_change_affects     n     p
##   <chr>                  <int> <dbl>
## 1 No                        18   0.3
## 2 Yes                       42   0.7

Exercise 2

Would you expect another student’s sample proportion to be identical to yours? Would you expect it to be similar? Why or why not?

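No, I would not expect another student’s sample proportion to be identical to mine, because each student draws a different random sample from a large population. I would expect it to be similar, though, since every sample comes from the same population and the sample proportions should fall close to the population proportion of 0.62.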
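To illustrate this, here is a minimal sketch that repeats the sampling step 20 times and compares the resulting sample proportions. The seed, the number of repetitions, and the names `population` and `p_hats` are my own choices for illustration; only the 62,000/38,000 split and the sample size of 60 come from the lab.

```r
# Illustrative sketch: repeat the sampling step and compare the p-hat values.
set.seed(1)  # arbitrary seed, used only so the sketch is reproducible

# Same make-up as us_adults: 62,000 "Yes" and 38,000 "No".
population <- c(rep("Yes", 62000), rep("No", 38000))
n <- 60

# Draw 20 independent samples of size 60 and record each sample proportion.
p_hats <- replicate(20, mean(sample(population, size = n) == "Yes"))

round(p_hats, 2)  # the proportions differ from sample to sample
range(p_hats)     # smallest and largest sample proportion observed
```

Each value plays the role of one student's sample proportion: they are rarely identical, but all of them come from the same population.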

Exercise 3

In the interpretation above, we used the phrase “95% confident”. What does “95% confidence” mean?

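The phrase “95% confident” describes the method, not a single interval: if we took many random samples and built a confidence interval from each one in the same way, about 95% of those intervals would contain the true population parameter (here, the population proportion).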

Exercise 4

Does your confidence interval capture the true population proportion of US adults who think climate change affects their local community? If you are working on this lab in a classroom, does your neighbor’s interval capture this value?

Yes, my confidence interval captures the true population proportion of US adults who think climate change affects their local community, which is 0.62 for this population. This is not guaranteed, however. An interval captures the true value only if 0.62 lies between its lower and upper bounds, and a neighbor’s interval, which is built from a different random sample, may or may not contain it.

Exercise 5

Each student should have gotten a slightly different confidence interval. What proportion of those intervals would you expect to capture the true population mean? Why?

I would expect about 95% of the intervals to contain the true population proportion, because each one was built at the 95% confidence level. The remaining 5% or so are expected to miss the true value because of the inherent variability and randomness of the sampling process.

Exercise 6

Given a sample size of 60, 1000 bootstrap samples for each interval, and 50 confidence intervals constructed (the default values for the above app), what proportion of your confidence intervals include the true population proportion? Is this proportion exactly equal to the confidence level? If not, explain why. Make sure to include your plot in your answer.

set.seed(9999)
# Draw 60 entries from samp without replacement. Because samp has only 60 rows,
# this reorders the original sample rather than drawing a new one.
sampled_entries <- sample_n(samp, size = 60)

# Compute p-hat: count the number that are "Yes," then divide by the sample size.
p_hat <- sum(sampled_entries$climate_change_affects == "Yes") / nrow(sampled_entries)
p_hat
## [1] 0.7
ggplot(sampled_entries, aes(x = climate_change_affects)) +
  geom_bar() +
  labs(
    x = "", y = "",
    title = "Observation over sample proportion"
  ) +
  coord_flip() 
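The app's experiment can also be approximated in code. The sketch below is not the app's implementation. It assumes the population is large enough that each sample's count of "Yes" answers can be drawn with `rbinom()`, and it uses the fact that resampling a yes/no sample with replacement is equivalent to a binomial draw with probability equal to the sample proportion. The seed and the names `one_interval`, `intervals`, and `captured` are hypothetical.

```r
# Illustrative sketch: 50 bootstrap percentile intervals and their coverage.
set.seed(2)         # arbitrary seed for reproducibility
p_true <- 0.62      # known population proportion
n <- 60             # sample size
n_boot <- 1000      # bootstrap samples per interval
n_intervals <- 50   # number of confidence intervals

one_interval <- function() {
  # Proportion of "Yes" in one random sample of size n.
  p_hat <- rbinom(1, size = n, prob = p_true) / n
  # Bootstrap proportions: resampling n yes/no values with replacement
  # is a binomial draw with success probability p_hat.
  boot_props <- rbinom(n_boot, size = n, prob = p_hat) / n
  # 95% percentile interval.
  quantile(boot_props, c(0.025, 0.975), names = FALSE)
}

# One row per interval: column 1 is the lower bound, column 2 the upper bound.
intervals <- t(replicate(n_intervals, one_interval()))

# Share of intervals that contain the true proportion.
captured <- intervals[, 1] <= p_true & p_true <= intervals[, 2]
mean(captured)
```

With only 50 intervals, the captured share can only take values in steps of 0.02, so it will usually be close to 0.95 without matching it exactly.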

Exercise 7

Choose a different confidence level than 95%. Would you expect a confidence interval at this level to be wider or narrower than the confidence interval you calculated at the 95% confidence level? Explain your reasoning.

A confidence level higher than 95% gives a wider interval, because a wider range is needed to be more confident that it contains the true population parameter. A confidence level lower than 95% gives a narrower interval: the margin of error shrinks, but we are less certain that the interval captures the true value.
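A quick way to see this is to compare interval widths at several confidence levels. This sketch uses the normal approximation (width = 2 × z × SE), which is an assumption on my part and not the bootstrap method used by the app. The sample proportion of 0.7 and n = 60 come from my sample; the name `conf_levels` is hypothetical.

```r
# Illustrative sketch: interval width as a function of the confidence level.
p_hat <- 0.7   # sample proportion from samp
n <- 60        # sample size
se <- sqrt(p_hat * (1 - p_hat) / n)

conf_levels <- c(0.85, 0.90, 0.95, 0.99)

# Critical value that leaves (1 - level) / 2 in each tail.
z <- qnorm((1 + conf_levels) / 2)

# Full width of each interval: twice the margin of error.
widths <- 2 * z * se

data.frame(level = conf_levels, z = round(z, 3), width = round(widths, 3))
```

The critical value grows with the confidence level while the standard error stays fixed, so the width grows with it.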

Exercise 8

Using code from the infer package and data from the one sample you have (samp), find a confidence interval for the proportion of US Adults who think climate change is affecting their local community with a confidence level of your choosing (other than 95%) and interpret it.

# Calculate an 85% confidence interval using the normal approximation
prop <- mean(samp$climate_change_affects == "Yes")  
se <- sqrt(prop * (1 - prop) / nrow(samp))
z_score <- qnorm((1 + 0.85) / 2)
margin_error <- z_score * se

lower <- prop - margin_error
upper <- prop + margin_error
# Print the confidence interval
cat("Confidence Interval: (", lower, ", ", upper, ")\n")
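## Confidence Interval: ( 0.6148362 ,  0.7851638 )

We are 85% confident that the proportion of US adults who think climate change is affecting their local community is between about 61.5% and 78.5%.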
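Since the exercise asks for the infer package, here is a sketch of the equivalent bootstrap interval. It rebuilds a sample with the same counts as mine (42 "Yes", 18 "No") so that the block runs on its own. The seed and the name `ci_85` are my own, and the bootstrap bounds will differ slightly from the normal-approximation bounds and from run to run with a different seed.

```r
library(dplyr)
library(infer)

set.seed(3)  # arbitrary seed for reproducibility

# Stand-in for samp with the same counts as the sample in this report.
samp <- tibble(
  climate_change_affects = c(rep("Yes", 42), rep("No", 18))
)

# Bootstrap the sample proportion of "Yes" and take an 85% percentile interval.
ci_85 <- samp %>%
  specify(response = climate_change_affects, success = "Yes") %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "prop") %>%
  get_ci(level = 0.85)

ci_85
```

The result is a one-row tibble with the columns `lower_ci` and `upper_ci`.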

Exercise 9

Using the app, calculate 50 confidence intervals at the confidence level you chose in the previous question, and plot all intervals on one plot, and calculate the proportion of intervals that include the true population proportion. How does this percentage compare to the confidence level selected for the intervals?

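At the 85% confidence level, the percentage of intervals that include the true population proportion is lower than at the 95% level, and it should be close to 85%. As I stated before, the intervals become narrower as the confidence level decreases, so more of them miss the true value.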

Exercise 10

Lastly, try one more (different) confidence level. First, state how you expect the width of this interval to compare to previous ones you calculated. Then, calculate the bounds of the interval using the infer package and data from samp and interpret it. Finally, use the app to generate many intervals and calculate the proportion of intervals that capture the true population proportion.

# Calculate a 99% confidence interval using the normal approximation
prop <- mean(samp$climate_change_affects == "Yes")  
se <- sqrt(prop * (1 - prop) / nrow(samp))
z_score <- qnorm((1 + 0.99) / 2)
margin_of_error <- z_score * se

lower <- prop - margin_of_error
upper <- prop + margin_of_error
# Print the confidence interval
cat("Confidence Interval: (", lower, ", ", upper, ")\n")
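## Confidence Interval: ( 0.5476119 ,  0.8523881 )

Because 99% is a higher confidence level, I expected this interval to be wider than the previous ones, and it is: its width is about 0.30. We are 99% confident that the proportion of US adults who think climate change is affecting their local community is between about 54.8% and 85.2%.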

Exercise 11

Using the app, experiment with different sample sizes and comment on how the widths of intervals change as sample size changes (increases and decreases).

With larger sample sizes, confidence intervals get narrower, which reduces uncertainty and increases precision. With smaller sample sizes, the intervals are wider, reflecting greater uncertainty and lower precision. Sample size therefore has a significant impact on the width, and hence the precision, of a confidence interval.
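The relationship can be made concrete with the normal-approximation width, 2 × z × sqrt(p(1 − p) / n). This is a sketch under that approximation, not the app's bootstrap procedure. The sample sizes below are hypothetical values chosen so that each one is four times the previous one, and the sample proportion is held at 0.7.

```r
# Illustrative sketch: 95% interval width as a function of sample size.
p_hat <- 0.7            # sample proportion, held fixed for comparison
z_95 <- qnorm(0.975)    # critical value for a 95% interval

sample_sizes <- c(15, 60, 240, 960)  # each size is 4 times the previous one

# Full width of the interval at each sample size.
widths <- 2 * z_95 * sqrt(p_hat * (1 - p_hat) / sample_sizes)

data.frame(n = sample_sizes, width = round(widths, 3))
```

Because the width is proportional to 1 / sqrt(n), quadrupling the sample size halves the width.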

Exercise 12

Finally, given a sample size (say, 60), how does the width of the interval change as you increase the number of bootstrap samples? Hint: Does changing the number of bootstrap samples affect the standard error?

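The width of the interval stays roughly the same as the number of bootstrap samples changes, because the standard error depends on the sample size, not on the number of bootstrap samples. More bootstrap samples only reduce the simulation variability, which gives a more stable estimate of the standard error and of the interval bounds.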
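A small simulation supports this. The sketch below estimates the bootstrap standard error for several numbers of bootstrap samples and compares each estimate with the formula value sqrt(p(1 − p) / n). As before, bootstrap resampling of a yes/no sample is simulated with `rbinom()`. The seed and the names `boot_se` and `reps_grid` are hypothetical.

```r
# Illustrative sketch: bootstrap standard error for different numbers of reps.
set.seed(4)    # arbitrary seed for reproducibility
n <- 60        # sample size
p_hat <- 0.7   # sample proportion from samp

# Standard deviation of `reps` bootstrap proportions.
boot_se <- function(reps) {
  sd(rbinom(reps, size = n, prob = p_hat) / n)
}

reps_grid <- c(100, 1000, 10000)
se_estimates <- sapply(reps_grid, boot_se)

# Standard error from the formula, which depends only on p_hat and n.
theoretical_se <- sqrt(p_hat * (1 - p_hat) / n)

data.frame(
  reps = reps_grid,
  boot_se = round(se_estimates, 4),
  theoretical_se = round(theoretical_se, 4)
)
```

All three estimates sit near the same value. Increasing the number of bootstrap samples makes the estimate less noisy, but it does not make the standard error, or the interval, systematically smaller.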