Confidence intervals

library(tidyverse)
library(openintro)
library(infer)

us_adults <- tibble(
  climate_change_affects = c(rep("Yes", 62000), rep("No", 38000))
)

ggplot(us_adults, aes(x = climate_change_affects)) +
  geom_bar() +
  labs(
    x = "", y = "",
    title = "Do you think climate change is affecting your local community?"
  ) +
  coord_flip()

us_adults %>%
  count(climate_change_affects) %>%
  mutate(p = n / sum(n))

## # A tibble: 2 × 3
##   climate_change_affects     n     p
##   <chr>                  <int> <dbl>
## 1 No                     38000  0.38
## 2 Yes                    62000  0.62

n <- 60
samp <- us_adults %>%
  sample_n(size = n)

Exercise 1. What percent of the adults in your sample think climate change affects their local community? Hint: Just like we did with the population, we can calculate the proportion of those in this sample who think climate change affects their local community.

samp %>%
  count(climate_change_affects) %>%
  mutate(p = n / sum(n))

## # A tibble: 2 × 3
##   climate_change_affects     n     p
##   <chr>                  <int> <dbl>
## 1 No                        20 0.333
## 2 Yes                       40 0.667

=> About 66.7% of the adults in my sample (40 of 60) think climate change affects their local community.

Exercise 2. Would you expect another student’s sample proportion to be identical to yours? Would you expect it to be similar? Why or why not?

samp %>%
  specify(response = climate_change_affects, success = "Yes") %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "prop") %>%
  get_ci(level = 0.95)

## # A tibble: 1 × 2
##   lower_ci upper_ci
##      <dbl>    <dbl>
## 1    0.550    0.767

Exercise 3. In the interpretation above, we used the phrase “95% confident”. What does “95% confidence” mean?

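=> 95% confidence describes the method, not a single interval: if we took many random samples of the same size and built an interval from each one in the same way, about 95% of those intervals would contain the true population proportion.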

Exercise 4. Does your confidence interval capture the true population proportion of US adults who think climate change affects their local community? If you are working on this lab in a classroom, does your neighbor’s interval capture this value?

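=> Yes. My 95% interval (0.550, 0.767) contains the true population proportion of 0.62. My neighbor's interval comes from a different sample, so it would be similar to mine but not identical. It will most likely capture 0.62 as well, although about 5% of such intervals miss the true value.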

Exercise 5. Each student should have gotten a slightly different confidence interval. What proportion of those intervals would you expect to capture the true population mean? Why?

=> Each student draws a different random sample of US adults, which is why each student gets a slightly different confidence interval. Because the intervals are built at the 95% confidence level, I would expect about 95% of them to capture the true population value (here a proportion, 0.62, rather than a mean).

Exercise 6. Given a sample size of 60, 1000 bootstrap samples for each interval, and 50 confidence intervals constructed (the default values for the above app), what proportion of your confidence intervals include the true population proportion? Is this proportion exactly equal to the confidence level? If not, explain why. Make sure to include your plot in your answer.

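=> 57 of the 60 intervals I generated include the true population proportion. That is 57/60 = 95%, which happens to equal the confidence level. An exact match is not guaranteed: the confidence level describes the long-run share of intervals that capture the true value, so with a limited number of intervals the observed share varies from run to run.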
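
The app's experiment can also be sketched in code. This is only a minimal illustration, not the app's actual implementation. It assumes the population proportion of 0.62 from the table above, uses base R only, and relies on the fact that resampling a yes/no sample with replacement gives a binomial count of "Yes" answers. The names one_interval, p_true and n_intervals are made up for this sketch.

set.seed(815)  # arbitrary seed, only so the sketch is reproducible

p_true <- 0.62       # population proportion from the table above
n <- 60              # sample size
n_intervals <- 50    # number of confidence intervals to build

# Draw one sample of size n, then build a percentile bootstrap interval from it
one_interval <- function(level = 0.95, reps = 1000) {
  p_hat <- rbinom(1, n, p_true) / n          # proportion of "Yes" in one sample
  boot_props <- rbinom(reps, n, p_hat) / n   # bootstrap sample proportions
  alpha <- 1 - level
  quantile(boot_props, c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

# One row per interval: column 1 is the lower bound, column 2 the upper bound
intervals <- t(replicate(n_intervals, one_interval()))
captured <- intervals[, 1] <= p_true & p_true <= intervals[, 2]
mean(captured)  # share of the 50 intervals that contain 0.62

With a different seed the share changes from run to run. It is usually close to 0.95 but rarely exactly equal to it, since only 50 intervals are built.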

Exercise 7. Choose a different confidence level than 95%. Would you expect a confidence interval at this level to be wider or narrower than the confidence interval you calculated at the 95% confidence level? Explain your reasoning.

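=> With a confidence level greater than 95%, I would expect the interval to be wider: to be more confident of capturing the true population proportion, the interval has to cover a larger range of values. With a confidence level below 95%, I would expect the interval to be narrower.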
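
How the width depends on the confidence level can be checked with the normal approximation, in which the interval is p_hat plus or minus z_star times the standard error. This is a rough sketch under that approximation, not the bootstrap method used in this lab, so the numbers will differ somewhat from bootstrap intervals. ci_width is a hypothetical helper written for this example.

p_hat <- 40 / 60   # sample proportion of "Yes" from Exercise 1
n <- 60            # sample size

# Width of a normal-approximation confidence interval for a proportion
ci_width <- function(level, p_hat, n) {
  z_star <- qnorm(1 - (1 - level) / 2)     # critical value for this level
  se <- sqrt(p_hat * (1 - p_hat) / n)      # standard error of p_hat
  2 * z_star * se
}

conf_levels <- c(0.85, 0.95, 0.98)
widths <- sapply(conf_levels, ci_width, p_hat = p_hat, n = n)
data.frame(level = conf_levels, width = round(widths, 3))

The standard error is the same in every row, so the width changes only through the critical value, which grows as the confidence level increases.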

Exercise 8. Using code from the infer package and data from the one sample you have (samp), find a confidence interval for the proportion of US Adults who think climate change is affecting their local community with a confidence level of your choosing (other than 95%) and interpret it.

samp %>%
  specify(response = climate_change_affects, success = "Yes") %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "prop") %>%
  get_ci(level = 0.85)

## # A tibble: 1 × 2
##   lower_ci upper_ci
##      <dbl>    <dbl>
## 1    0.567     0.75

=> We are 85% confident that the proportion of US adults who think climate change is affecting their local community is between 0.567 and 0.75.

Exercise 9. Using the app, calculate 50 confidence intervals at the confidence level you chose in the previous question, and plot all intervals on one plot, and calculate the proportion of intervals that include the true population proportion. How does this percentage compare to the confidence level selected for the intervals?

=> At the 85% confidence level, the percentage of intervals that include the true population proportion is lower than at the 95% level, which is what a lower confidence level should give. The intervals are also narrower, because the interval width shrinks as the confidence level decreases.

Exercise 10. Lastly, try one more (different) confidence level. First, state how you expect the width of this interval to compare to previous ones you calculated. Then, calculate the bounds of the interval using the infer package and data from samp and interpret it. Finally, use the app to generate many intervals and calculate the proportion of intervals that capture the true population proportion.

=> I ran this with a 98% confidence level. I expect this interval to be wider than the ones at the 85% and 95% levels. In the app, the percentage of intervals that include the true population proportion is greater than it was at the 85% level.

samp %>%
  specify(response = climate_change_affects, success = "Yes") %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "prop") %>%
  get_ci(level = 0.98)

## # A tibble: 1 × 2
##   lower_ci upper_ci
##      <dbl>    <dbl>
## 1    0.533      0.8

=> We are 98% confident that the proportion of US adults who think climate change is affecting their local community is between 0.533 and 0.8. This interval is wider than the 85% interval (0.567, 0.75) and the 95% interval (0.550, 0.767).

Exercise 11. Using the app, experiment with different sample sizes and comment on how the widths of intervals change as sample size changes (increases and decreases).

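=> As the sample size increases, the intervals become narrower, because the standard error of the sample proportion gets smaller. As the sample size decreases, the intervals become wider. Separately, for a fixed sample size, a higher confidence level makes the intervals wider.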
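
The effect of sample size can be sketched with the normal approximation, where the standard error is sqrt(p_hat * (1 - p_hat) / n). The sketch assumes the sample proportion stays at 40/60 for every sample size, which is a simplification, and the sample sizes are hypothetical values chosen for illustration. It is not the bootstrap method the app uses.

p_hat <- 40 / 60                  # assumed to be the same for every sample size
sample_sizes <- c(15, 60, 240)    # hypothetical sizes, each 4 times the previous

se <- sqrt(p_hat * (1 - p_hat) / sample_sizes)   # standard error of p_hat
width_95 <- 2 * qnorm(0.975) * se                # approximate 95% interval width
data.frame(n = sample_sizes, se = round(se, 3), width_95 = round(width_95, 3))

Under this approximation the width is proportional to 1 / sqrt(n), so making the sample four times larger cuts the width in half.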

Exercise 12. Finally, given a sample size (say, 60), how does the width of the interval change as you increase the number of bootstrap samples? Hint: Does changing the number of bootstrap samples affect the standard error?

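=> As an example, I increased the number of bootstrap samples to 2000. The width of the interval stays about the same, because the number of bootstrap samples does not affect the standard error, which depends on the sample size. A larger number of bootstrap samples only makes the estimated interval bounds more stable from run to run.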
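
A quick base R check of this idea, under the assumption that bootstrap resampling a yes/no sample of size 60 with 40 "Yes" answers is equivalent to drawing the "Yes" count from a binomial distribution. boot_summary is a hypothetical helper, and this is a sketch rather than the infer code used above.

set.seed(815)  # arbitrary seed, only so the sketch is reproducible

n <- 60
p_hat <- 40 / 60   # sample proportion from Exercise 1

# Standard error and 95% percentile interval width for a given number of reps
boot_summary <- function(reps) {
  boot_props <- rbinom(reps, n, p_hat) / n
  ci <- quantile(boot_props, c(0.025, 0.975), names = FALSE)
  c(reps = reps, se = sd(boot_props), width = ci[2] - ci[1])
}

# One row per number of bootstrap samples
res <- t(sapply(c(1000, 2000, 10000), boot_summary))
res

The se column should stay near sqrt(p_hat * (1 - p_hat) / n), about 0.061, in every row, whatever the number of bootstrap samples.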

R-pubs => https://rpubs.com/gunduzhazal/815752

Hazal Gunduz