set.seed(500)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6     ✔ purrr   0.3.4
## ✔ tibble  3.1.8     ✔ dplyr   1.0.9
## ✔ tidyr   1.2.0     ✔ stringr 1.4.1
## ✔ readr   2.1.2     ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(openintro)
## Loading required package: airports
## Loading required package: cherryblossom
## Loading required package: usdata
library(infer)
us_adults <- tibble(
  climate_change_affects = c(rep("Yes", 62000), rep("No", 38000))
)
ggplot(us_adults, aes(x = climate_change_affects)) +
  geom_bar() +
  labs(
    x = "", y = "",
    title = "Do you think climate change is affecting your local community?"
  ) +
  coord_flip() 

us_adults %>%
  count(climate_change_affects) %>%
  mutate(p = n / sum(n))
## # A tibble: 2 × 3
##   climate_change_affects     n     p
##   <chr>                  <int> <dbl>
## 1 No                     38000  0.38
## 2 Yes                    62000  0.62
n <- 60
samp <- us_adults %>%
  sample_n(size = n)

Exercise 1

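62% of adults think climate change affects their local community. The table below is computed from us_adults, so this figure is the population proportion; the proportion in my sample comes from applying the same tabulation to samp.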

us_adults %>%
  count(climate_change_affects) %>%
  mutate(p = n / sum(n))
## # A tibble: 2 × 3
##   climate_change_affects     n     p
##   <chr>                  <int> <dbl>
## 1 No                     38000  0.38
## 2 Yes                    62000  0.62
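
A minimal sketch of this tabulation applied to the sample rather than the population. It rebuilds `us_adults` and redraws `samp` so that it runs on its own, and it assumes the same seed and sample size as the lab. The resulting proportion depends on the random draw, so no output is shown.

```r
library(dplyr)

# Rebuild the population from the lab so this block is self-contained.
us_adults <- tibble(
  climate_change_affects = c(rep("Yes", 62000), rep("No", 38000))
)

# Draw a sample of 60 adults, as in the lab.
set.seed(500)
samp <- us_adults %>%
  sample_n(size = 60)

# Same count/mutate pipeline, applied to the sample instead of the population.
samp_props <- samp %>%
  count(climate_change_affects) %>%
  mutate(p = n / sum(n))

samp_props
```

The `p` value in the "Yes" row is the sample proportion that Exercise 1 asks about.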

Exercise 2

I would expect another student’s sample proportion to be similar but not identical to mine. Each sample is a random draw of 60 adults from the population, so the sample proportion varies from sample to sample, and the result comes out a little different every time the sampling code is run again.
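
One way to see this variability is to draw several samples and compare their proportions. The sketch below uses base R only; the seed and the choice of five samples are arbitrary assumptions for illustration.

```r
set.seed(1)  # arbitrary seed, chosen only so the sketch is reproducible

# Population as a plain character vector (62% "Yes", 38% "No").
population <- c(rep("Yes", 62000), rep("No", 38000))

# Draw 5 independent samples of size 60 and record each sample proportion.
p_hats <- replicate(5, mean(sample(population, size = 60) == "Yes"))

p_hats         # one proportion per simulated student
range(p_hats)  # smallest and largest of the five
```

The proportions should cluster around 0.62 without all being equal, which is the sample-to-sample variation described above.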

samp %>%
  specify(response = climate_change_affects, success = "Yes") %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "prop") %>%
  get_ci(level = 0.95)
## # A tibble: 1 × 2
##   lower_ci upper_ci
##      <dbl>    <dbl>
## 1    0.517    0.767
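
The pipeline above can be read as a percentile bootstrap: resample the sample with replacement, compute the proportion each time, and take the middle 95% of those proportions. A rough base-R sketch of that idea follows. It assumes the percentile method and uses a hypothetical sample of 38 "Yes" out of 60, not the lab's `samp`, so its interval will not match the output above.

```r
set.seed(2)  # arbitrary seed for reproducibility

# Hypothetical sample of 60 responses (38 "Yes"), for illustration only.
resp <- c(rep("Yes", 38), rep("No", 22))

# Resample with replacement 1000 times; store the proportion of "Yes" each time.
boot_props <- replicate(1000, mean(sample(resp, replace = TRUE) == "Yes"))

# Percentile interval: the middle 95% of the bootstrap proportions.
ci_95 <- quantile(boot_props, probs = c(0.025, 0.975))
ci_95
```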

Exercise 3

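95% confidence means that if we took many samples and built an interval from each one in the same way, about 95% of those intervals would contain the true population proportion. For a single sample, we say we are 95% confident that the population proportion lies between the lower bound and the upper bound. A confidence interval only provides a plausible range of values.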

Exercise 4

Yes, my confidence interval (0.517 to 0.767) captures the true population proportion of US adults who think climate change affects their local community, which is 0.62. If I were working on this lab in a classroom, my neighbor would likely have gotten a slightly different interval, because it would be built from a different random sample. The confidence level is thus a statement about the estimation procedure and not about the specific interval generated from one sample.

Exercise 5

About 95% of the intervals would be expected to capture the true population proportion, because 95% is the confidence level of the procedure used to build them. Any single interval, such as mine from 0.517 to 0.767, only provides a plausible range of values. While we might say values outside it are implausible based on the data, this does not mean they are impossible.
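
The coverage idea can be checked with a small simulation. The sketch below uses base R and a hand-written percentile bootstrap in place of `infer`. The helper `covers_truth()`, the 200 simulated students, and the 500 bootstrap resamples are all assumptions made to keep it short. The result should land near 0.95 but will not hit it exactly with a simulation this small.

```r
set.seed(3)  # arbitrary seed for reproducibility

population <- c(rep("Yes", 62000), rep("No", 38000))
true_p <- mean(population == "Yes")  # true population proportion

# Hypothetical helper: draw a fresh sample of size n, build one 95% percentile
# bootstrap interval from it, and report whether it contains the true proportion.
covers_truth <- function(n = 60, reps = 500) {
  samp <- sample(population, size = n)
  boot_props <- replicate(reps, mean(sample(samp, replace = TRUE) == "Yes"))
  ci <- quantile(boot_props, probs = c(0.025, 0.975))
  ci[[1]] <= true_p && true_p <= ci[[2]]
}

# Repeat for 200 simulated students; the mean is the share of intervals
# that capture the true proportion.
coverage <- mean(replicate(200, covers_truth()))
coverage
```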

Exercise 8

Using the code from the infer package and the one sample I have (samp), I computed a 99% confidence interval for the proportion of US adults who think climate change is affecting their local community. We are 99% confident that the proportion of US adults who think climate change affects their local community is between 0.483 and 0.817.

samp %>%
  specify(response = climate_change_affects, success = "Yes") %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "prop") %>%
  get_ci(level = 0.99)
## # A tibble: 1 × 2
##   lower_ci upper_ci
##      <dbl>    <dbl>
## 1    0.483    0.817
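
The 99% interval is wider than the 95% interval from the same sample, since a higher confidence level keeps more of the bootstrap distribution. A small base-R sketch of that relationship, again using a hypothetical sample of 38 "Yes" out of 60 and a hand-written percentile bootstrap, so the widths are illustrative only:

```r
set.seed(4)  # arbitrary seed for reproducibility

# Hypothetical sample of 60 responses (38 "Yes"), for illustration only.
resp <- c(rep("Yes", 38), rep("No", 22))
boot_props <- replicate(1000, mean(sample(resp, replace = TRUE) == "Yes"))

# Width of the percentile interval at each confidence level.
conf_levels <- c(0.90, 0.95, 0.99)
widths <- sapply(conf_levels, function(level) {
  alpha <- 1 - level
  ci <- quantile(boot_props, probs = c(alpha / 2, 1 - alpha / 2))
  unname(ci[2] - ci[1])
})

data.frame(level = conf_levels, width = widths)
```

Because all three intervals are cut from the same bootstrap distribution, the widths cannot shrink as the level increases.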