# bar plot of response distribution
ggplot(us_adults, aes(x = climate_change_affects)) +
  geom_bar() +
  labs(
    x = "", y = "",
    title = "Do you think climate change is affecting your local community?"
  ) +
  coord_flip()

# summary statistics of data frame
us_adults %>%
  count(climate_change_affects) %>%
  mutate(p = n / sum(n))
# A tibble: 2 × 3
  climate_change_affects     n     p
  <chr>                  <int> <dbl>
1 No                     38000  0.38
2 Yes                    62000  0.62
n <- 60
samp <- us_adults %>%
  sample_n(size = n)
Data
samp %>%
  count(climate_change_affects) %>%
  mutate(p = n / sum(n))
# A tibble: 2 × 3
  climate_change_affects     n     p
  <chr>                  <int> <dbl>
1 No                        24   0.4
2 Yes                       36   0.6
#Answer 1: 40% of the adults in my sample don't think that climate change affects their local community, while 60% of the adults in my sample do think that it does.
#Answer 2: No, I would not expect another student's sample proportion to be identical to mine. I would expect it to be somewhat similar, since a sample size of 60 isn't horribly small, but it isn't especially large either, so the proportion will vary from one random sample to the next. It is unlikely that two independent random samples of 60 from this population would give exactly the same proportion.
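To illustrate how much the sample proportion can move from one sample to the next, here is a minimal base R sketch. It assumes the population can be rebuilt from the counts shown earlier (62,000 "Yes" and 38,000 "No"); the `population` vector, the seed, and the choice of 5 samples are illustrative assumptions, not part of the lab code.

```r
# Assumption: rebuild the population from the counts shown earlier
# (62,000 "Yes" and 38,000 "No"); the seed is arbitrary.
set.seed(1)
population <- rep(c("Yes", "No"), times = c(62000, 38000))

# Draw 5 independent samples of size 60 (without replacement)
# and record the proportion of "Yes" in each one.
p_hats <- replicate(5, mean(sample(population, size = 60) == "Yes"))
print(round(p_hats, 3))
```

Each run with a different seed gives a different set of proportions, which is the sample-to-sample variability the answer describes.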
# calculating 95% confidence interval for proportion of US adults who think
# climate change affects their local community
samp %>%
  specify(response = climate_change_affects, success = "Yes") %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "prop") %>%
  get_ci(level = 0.95)
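The bootstrap pipeline can also be sketched by hand in base R, which may make it clearer what the interval is. This sketch assumes a percentile interval (to my understanding, the default for `get_ci()` on bootstrap output) and uses a stand-in vector `samp_vec` built from the sample counts shown above (36 "Yes", 24 "No") rather than the actual `samp` data frame; the seed is arbitrary.

```r
set.seed(2)
# Assumption: stand-in sample matching the counts shown above (36 "Yes", 24 "No").
samp_vec <- rep(c("Yes", "No"), times = c(36, 24))

# Resample the 60 responses with replacement 1000 times,
# recording the proportion of "Yes" in each resample.
boot_props <- replicate(1000, mean(sample(samp_vec, replace = TRUE) == "Yes"))

# Percentile interval: the middle 95% of the bootstrap proportions.
ci_95 <- quantile(boot_props, probs = c(0.025, 0.975))
print(ci_95)
```

The endpoints will differ slightly from the infer output because the resamples are random, but they should land in the same neighborhood.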
#Question 1 (3): 95% confidence means that if we took many random samples and built a confidence interval from each one in this way, about 95% of those intervals would contain the true population proportion.
#Question 1 (4): Yes, my confidence interval does capture the true population proportion of 0.62.
#Question 2 (5): I would expect about 95% of them to capture the true population proportion, since that is what a 95% confidence level means. Taking 1000 repetitions in our bootstrapping makes each interval's endpoints more stable, but it is the confidence level, not the number of repetitions, that determines how often the intervals capture the true value.
#Question 1 (6): 43/50, or 86%, of the confidence intervals included the true population proportion. This is lower than the confidence level of 95%. This is most likely due to random variability, both in the samples drawn from the population and in the sampling with replacement used to make our bootstrap samples.
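The 50-interval experiment can be approximated in base R as a rough check on how much the observed coverage bounces around. This is a sketch under stated assumptions, not a reproduction of the app: the population is rebuilt from the counts shown earlier, `boot_ci()` is a hypothetical helper that computes a percentile bootstrap interval, and the seed is arbitrary.

```r
set.seed(3)
# Assumption: population rebuilt from the counts shown earlier (true p = 0.62).
population <- rep(c("Yes", "No"), times = c(62000, 38000))
true_p <- mean(population == "Yes")

# Hypothetical helper: percentile bootstrap CI for the proportion of "Yes".
boot_ci <- function(x, level = 0.95, reps = 1000) {
  props <- replicate(reps, mean(sample(x, replace = TRUE) == "Yes"))
  alpha <- 1 - level
  quantile(props, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

# Build 50 intervals from 50 independent samples of size 60
# and check whether each one contains the true proportion.
covered <- replicate(50, {
  ci <- boot_ci(sample(population, size = 60))
  ci[1] <= true_p && true_p <= ci[2]
})
print(sum(covered))  # number of intervals (out of 50) that capture true_p
```

Rerunning with different seeds shows that the count out of 50 changes from run to run, so a single batch of 50 intervals will not hit the nominal level exactly.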
More Practice
#Question 1 (7): Choosing a confidence level of 85%, I would expect the confidence interval to be narrower than the 95% interval, since giving up some confidence lets us state a more specific range.
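The relationship between confidence level and interval width can be checked directly by taking percentile intervals at several levels from one bootstrap distribution. The sketch below assumes a stand-in sample built from the counts shown earlier (36 "Yes", 24 "No"); `samp_vec`, `conf_levels`, and the seed are illustrative names and values, not part of the lab code.

```r
set.seed(4)
# Assumption: stand-in sample matching the counts shown earlier (36 "Yes", 24 "No").
samp_vec <- rep(c("Yes", "No"), times = c(36, 24))
boot_props <- replicate(1000, mean(sample(samp_vec, replace = TRUE) == "Yes"))

# Width of the percentile interval at each confidence level,
# all computed from the same bootstrap distribution.
conf_levels <- c(0.95, 0.85, 0.50)
widths <- sapply(conf_levels, function(level) {
  alpha <- 1 - level
  ci <- quantile(boot_props, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
  ci[2] - ci[1]
})
print(data.frame(level = conf_levels, width = round(widths, 3)))
```

Because each lower-level interval is cut from the middle of the same distribution, its width can never exceed that of a higher-level interval.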
#Question 2 (8): Using a confidence level of 85%, I am 85% confident that the proportion of US adults who think climate change is affecting their local community is between 48.3% and 66.8%.
samp %>%
  specify(response = climate_change_affects, success = "Yes") %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "prop") %>%
  get_ci(level = 0.85)
#Question 3 (9): 43/50, or 86%, of the intervals included the true population proportion. This is extremely close to the selected confidence level of 85%.
#Question 4 (10): I will be using a 50% confidence level, and I expect this interval to be much narrower. Using the infer package, the interval is 55.0% to 61.7%. This means I am 50% confident that the proportion of US adults who think climate change is affecting their local community is between 55.0% and 61.7%. Using the app to generate 50 confidence intervals, 24/50, or 48%, contained the true population proportion, which is extremely close to the chosen confidence level of 50%.
samp %>%
  specify(response = climate_change_affects, success = "Yes") %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "prop") %>%
  get_ci(level = 0.50)