Exercise 1

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6      ✔ purrr   0.3.4 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.0      ✔ stringr 1.4.1 
## ✔ readr   2.1.2      ✔ forcats 0.5.2 
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(openintro)
## Loading required package: airports
## Loading required package: cherryblossom
## Loading required package: usdata
library(infer)

us_adults <- tibble(
  climate_change_affects = c(rep("Yes", 62000), rep("No", 38000))
)


n <- 60
samp <- us_adults %>%
  sample_n(size = n)

samp %>%
  count(climate_change_affects) %>%
  mutate(p = n / sum(n))
## # A tibble: 2 × 3
##   climate_change_affects     n     p
##   <chr>                  <int> <dbl>
## 1 No                        29 0.483
## 2 Yes                       31 0.517

Because the sample was drawn at random from the population, I expected the sample proportion to be close to the population proportion. In this sample, 51.7% answered Yes, compared with 62% in the population, so the sample has somewhat more No’s than expected. A gap of this size is plausible from random sampling with only 60 observations.
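
To put the gap between the sample and the population on a scale, here is a minimal sketch in base R that compares the two proportions in units of the standard error. It assumes simple random sampling and ignores the finite-population correction, which is negligible for 60 draws out of 100,000. The names `p_pop`, `p_hat`, `se`, and `z` are illustrative and not part of the lab.

```r
# Population proportion implied by how us_adults was built (62,000 of 100,000).
p_pop <- 62000 / 100000
n <- 60

# Standard error of a sample proportion under simple random sampling (assumed).
se <- sqrt(p_pop * (1 - p_pop) / n)

# Observed sample proportion from the table above (31 "Yes" out of 60).
p_hat <- 31 / 60

# Distance between the sample and population proportions, in standard errors.
z <- (p_hat - p_pop) / se

round(c(se = se, p_hat = p_hat, z = z), 3)
```

The sample proportion sits a little more than one and a half standard errors below 0.62, which is not unusual for a sample of this size.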

Exercise 2

I would expect another student’s sample proportion to be similar to mine but not identical. Because each sample is drawn at random from the population, there is bound to be some variation from sample to sample.

Exercise 3

samp %>%
  specify(response = climate_change_affects, success = "Yes") %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "prop") %>%
  get_ci(level = 0.95)
## # A tibble: 1 × 2
##   lower_ci upper_ci
##      <dbl>    <dbl>
## 1    0.400    0.633

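A 95% confidence level means that the method used to build the interval captures the true population proportion in about 95% of repeated samples. For this sample, we are 95% confident that the true population proportion lies between 0.400 and 0.633.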

Exercise 4

My confidence interval (0.400 to 0.633) captures the true population proportion of 0.62. My neighbor’s interval would not necessarily capture it; about 95% of intervals built this way are expected to.

Exercise 5

I would expect about 95% of those intervals to capture the true population proportion. Because each sample is drawn at random, some samples will be unusual enough that their intervals miss it.
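
The coverage idea can be checked with a small simulation. The sketch below uses base R rather than the `infer` pipeline, and it rests on several assumptions: each sample is drawn with `rbinom()` as an approximation to sampling 60 people from the large population, the interval is a percentile bootstrap interval, and the seed and the number of repeated samples (`n_samp`) are arbitrary illustrative choices.

```r
set.seed(1)  # arbitrary seed, used only to make this sketch reproducible

p_pop  <- 0.62   # population proportion of "Yes" in us_adults
n      <- 60     # sample size used in the lab
n_samp <- 200    # number of repeated samples (illustrative choice)
reps   <- 1000   # bootstrap replicates per sample

captures <- replicate(n_samp, {
  # Draw one sample; TRUE stands for "Yes".
  one_samp <- rbinom(n, size = 1, prob = p_pop) == 1
  # Bootstrap distribution of the sample proportion.
  boot <- replicate(reps, mean(sample(one_samp, replace = TRUE)))
  # 95% percentile interval from the bootstrap distribution.
  ci <- quantile(boot, c(0.025, 0.975))
  # Does this interval contain the true proportion?
  ci[1] <= p_pop && p_pop <= ci[2]
})

# Share of the simulated intervals that contain 0.62.
mean(captures)
```

The share should land near 0.95 but will not equal it exactly, both because only 200 samples are simulated and because bootstrap percentile intervals are approximate at this sample size.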

Exercise 6

n <- 60
samp <- us_adults %>%
  sample_n(size = n)

Exercise 7

If I chose a confidence level higher than 95%, I would expect a wider confidence interval than the one achieved at 95%, and a narrower one at a lower level. My reasoning is that a confidence level of 100%, for example, would require the interval to contain every possible value of the population parameter.

Exercise 8

samp %>%
  specify(response = climate_change_affects, success = "Yes") %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "prop") %>%
  get_ci(level = .1)
## # A tibble: 1 × 2
##   lower_ci upper_ci
##      <dbl>    <dbl>
## 1     0.55    0.567

Based on this calculation, I am 10% confident that the actual population proportion is within the interval from 0.55 to 0.567.

Exercise 9

Out of 50 confidence intervals collected, only 4 included the true population proportion. With a confidence level of 10%, I was expecting around 5 of the intervals (10% of 50) to contain the actual population proportion, so that result makes sense.
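
A rough way to judge whether 4 of 50 is in line with expectations is to treat the number of capturing intervals as a binomial count. This is a sketch under the assumption that the 50 intervals are independent and that each one captures the true proportion with probability equal to the nominal confidence level; the variable names are illustrative.

```r
n_intervals <- 50     # intervals collected in the app
conf_level  <- 0.10   # nominal confidence level

# Expected number of capturing intervals and its standard deviation,
# assuming independent intervals with capture probability conf_level.
expected <- n_intervals * conf_level
sd_count <- sqrt(n_intervals * conf_level * (1 - conf_level))

# Probability of seeing exactly 4, and of seeing 4 or fewer, capturing intervals.
p_exactly_4 <- dbinom(4, size = n_intervals, prob = conf_level)
p_at_most_4 <- pbinom(4, size = n_intervals, prob = conf_level)

round(c(expected = expected, sd = sd_count,
        p_exactly_4 = p_exactly_4, p_at_most_4 = p_at_most_4), 3)
```

With an expected count of 5 and a standard deviation of roughly 2, observing 4 is well within ordinary variation.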

Exercise 10

I will set the confidence level to 50%. I’d expect the intervals to be wider than at 10% but narrower than at 95%. The app shows 21 intervals that capture the true population proportion.

samp %>%
  specify(response = climate_change_affects, success = "Yes") %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "prop") %>%
  get_ci(level = .5)
## # A tibble: 1 × 2
##   lower_ci upper_ci
##      <dbl>    <dbl>
## 1    0.517    0.617

Exercise 11

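The widths of the intervals increase as the confidence level increases and decrease as the confidence level decreases. In the outputs above, the interval widths are about 0.017 at 10%, 0.100 at 50%, and 0.233 at 95%.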
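
The relationship between confidence level and width can also be seen by taking several percentile intervals from a single bootstrap distribution. The following is a base R sketch rather than the `infer` pipeline used in the exercises. It assumes a sample with 31 Yes and 29 No responses (the counts from Exercise 1), and the seed and the name `conf_levels` are illustrative choices.

```r
set.seed(1)  # arbitrary seed, used only to make this sketch reproducible

# A sample with 31 "Yes" (TRUE) and 29 "No" (FALSE), as in Exercise 1.
one_samp <- c(rep(TRUE, 31), rep(FALSE, 29))

# One bootstrap distribution of the sample proportion, reused for every level.
boot <- replicate(1000, mean(sample(one_samp, replace = TRUE)))

conf_levels <- c(0.10, 0.50, 0.95)

# Width of the percentile interval at each confidence level.
widths <- sapply(conf_levels, function(lv) {
  alpha <- 1 - lv
  ci <- quantile(boot, c(alpha / 2, 1 - alpha / 2))
  unname(ci[2] - ci[1])
})

data.frame(level = conf_levels, width = widths)
```

Because the intervals are cut from the same bootstrap distribution, each higher-level interval contains the lower-level ones, so the widths cannot shrink as the level goes up.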

Exercise 12

Experimenting with the number of bootstrap samples over multiple trials, I noticed larger deviations from the expected number of intervals containing the true proportion. For example, using only 50 bootstrap samples at a 50% confidence level, one trial had 33 of 50 intervals containing the proportion and another had 14 of 50.