library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5 v purrr 0.3.4
## v tibble 3.1.6 v dplyr 1.0.7
## v tidyr 1.2.0 v stringr 1.4.0
## v readr 2.1.1 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(openintro)
## Loading required package: airports
## Loading required package: cherryblossom
## Loading required package: usdata
library(infer)
us_adults <- tibble(
  climate_change_affects = c(rep("Yes", 62000), rep("No", 38000))
)
us_adults %>%
  count(climate_change_affects) %>%
  mutate(p = n / sum(n))
## # A tibble: 2 x 3
## climate_change_affects n p
## <chr> <int> <dbl>
## 1 No 38000 0.38
## 2 Yes 62000 0.62
n <- 60
samp <- us_adults %>%
  sample_n(size = n)
samp %>%
  specify(response = climate_change_affects, success = "Yes") %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "prop") %>%
  get_ci(level = 0.95)
## # A tibble: 1 x 2
## lower_ci upper_ci
## <dbl> <dbl>
## 1 0.383 0.617
In my sample, 65% of adults think climate change affects their local community, while 35% do not.
## Exercise 1:
set.seed(9)
samp %>%
  count(climate_change_affects) %>%
  mutate(p = n / sum(n))
## # A tibble: 2 x 3
## climate_change_affects n p
## <chr> <int> <dbl>
## 1 No 30 0.5
## 2 Yes 30 0.5
I expect my classmates' sample proportions to be similar to mine but not identical, since each random sample is drawn from a population in which 62% agree.
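Exercise 3: When we say we are 95% confident, we mean that we are 95% confident that the true population proportion lies between the lower and upper bounds of the interval. Put another way, about 95% of intervals constructed this way from repeated samples would capture the true proportion.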
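The repeated-sampling reading of "95% confident" can be checked with a small simulation. The sketch below is only an illustration: it uses base R and a normal-approximation interval in place of the bootstrap interval from infer, and the seed and the number of simulated samples (`n_sims`) are arbitrary choices, not values from the lab.

```r
# Sketch: how often does a 95% interval capture the true proportion?
# Assumptions: normal-approximation interval (not the infer bootstrap),
# arbitrary seed, arbitrary number of simulated samples.
set.seed(123)
p_true <- 0.62    # true population proportion from the lab
n      <- 60      # sample size used in the lab
n_sims <- 2000    # hypothetical number of repeated samples

# One sample proportion per simulated sample of size n.
p_hat <- rbinom(n_sims, size = n, prob = p_true) / n

# 95% normal-approximation interval around each sample proportion.
z     <- qnorm(0.975)
se    <- sqrt(p_hat * (1 - p_hat) / n)
lower <- p_hat - z * se
upper <- p_hat + z * se

# Share of intervals that contain the true proportion.
coverage <- mean(lower <= p_true & p_true <= upper)
coverage
```

The value of `coverage` should land near the confidence level, though the approximation and the finite number of simulations keep it from matching exactly.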
My confidence interval runs from 0.533 to 0.767. It does capture the true population proportion: we know the true value is 0.62, and 0.62 lies between the two bounds.
set.seed(9)
samp %>%
  count(climate_change_affects) %>%
  mutate(p = n / sum(n))
## # A tibble: 2 x 3
## climate_change_affects n p
## <chr> <int> <dbl>
## 1 No 30 0.5
## 2 Yes 30 0.5
samp %>%
  specify(response = climate_change_affects, success = "Yes") %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "prop") %>%
  get_ci(level = 0.95)
## # A tibble: 1 x 2
## lower_ci upper_ci
## <dbl> <dbl>
## 1 0.383 0.617
Each student gets a slightly different confidence interval because each draws a different random sample. We expect most of these intervals, about 95% of them at the 95% confidence level, to extend on either side of 62% and so capture it.
According to my plot, 95% of the confidence intervals include the true population proportion. Yes, this matches the confidence level: in the plot, most of the intervals calculated cover the true population proportion.
I believe an interval at a lower confidence level would be narrower. With a lower confidence level we accept less certainty that the interval contains the true population proportion, so the bounds move closer together and we have a higher chance of not capturing the value within them.
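The link between confidence level and width can be made concrete with a short calculation. This is a rough sketch using the normal approximation rather than the bootstrap; the sample proportion of 0.5 is taken from the printed sample counts (30 Yes, 30 No), and the three confidence levels are the ones used in this lab.

```r
# Sketch: interval width at several confidence levels.
# Assumption: normal-approximation width 2 * z * SE, not the bootstrap.
p_hat <- 0.5     # sample proportion from the printed counts (30/60)
n     <- 60      # sample size used in the lab
se    <- sqrt(p_hat * (1 - p_hat) / n)

levels <- c(0.85, 0.90, 0.95)
z      <- qnorm(1 - (1 - levels) / 2)   # critical value for each level
width  <- 2 * z * se

data.frame(level = levels, width = round(width, 3))
```

The widths increase with the confidence level, which is the trade-off described above: a narrower interval comes at the cost of capturing the true value less often.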
Using code from the infer package and data from the one sample you have (samp), find a confidence interval for the proportion of US Adults who think climate change is affecting their local community with a confidence level of your choosing (other than 95%) and interpret it.
We can be 90% confident that the true population proportion is between 0.56 and 0.76. This indicates that a majority of people believe that climate change is affecting their local community.
set.seed(2)
samp %>%
  count(climate_change_affects) %>%
  mutate(p = n / sum(n))
## # A tibble: 2 x 3
## climate_change_affects n p
## <chr> <int> <dbl>
## 1 No 30 0.5
## 2 Yes 30 0.5
samp %>%
  specify(response = climate_change_affects, success = "Yes") %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "prop") %>%
  get_ci(level = 0.90)
## # A tibble: 1 x 2
## lower_ci upper_ci
## <dbl> <dbl>
## 1 0.4 0.6
I kept everything else the same: the sample size is 60 and the number of resamples is 1000, and I set the confidence level to 90%. Of the 50 intervals constructed, only 3 missed the true population proportion, so the proportion of intervals that include it is 47/50, or 94%. This proportion is pretty close to the 90% confidence level we chose.
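How close is "pretty close"? One rough way to judge is to treat the number of capturing intervals as a binomial count. The sketch below assumes each of the 50 intervals captures the true proportion independently with probability equal to the confidence level, which is an idealization of what the app does.

```r
# Sketch: is 47 out of 50 in line with a 90% confidence level?
# Assumption: captures are independent with probability equal to the level.
n_intervals <- 50
level       <- 0.90
observed    <- 47    # intervals that captured the true proportion

expected <- n_intervals * level                        # expected captures
sd_count <- sqrt(n_intervals * level * (1 - level))    # binomial std. dev.
z_score  <- (observed - expected) / sd_count

c(expected = expected, sd = sd_count, z = z_score)
```

A `z_score` within about 2 in absolute value suggests the observed count is ordinary run-to-run variation around the expected count.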
samp and interpret it. Finally, use the app to generate many intervals and calculate the proportion of intervals that capture the true population proportion.
I'm going to compute an 85% confidence interval. I believe this interval will be narrower than the previous confidence intervals. We can say we are 85% confident that the true population proportion of people who believe that climate change is affecting their community is between 0.58 and 0.75. Using the app, I changed only the confidence level to 85%; 38 of the 50 intervals captured the true population proportion, which is 76%.
set.seed(1)
samp %>%
  count(climate_change_affects) %>%
  mutate(p = n / sum(n))
## # A tibble: 2 x 3
## climate_change_affects n p
## <chr> <int> <dbl>
## 1 No 30 0.5
## 2 Yes 30 0.5
samp %>%
  specify(response = climate_change_affects, success = "Yes") %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "prop") %>%
  get_ci(level = 0.85)
## # A tibble: 1 x 2
## lower_ci upper_ci
## <dbl> <dbl>
## 1 0.417 0.6
I experimented with sample sizes of 70, 80, and 100. Across the different sets of intervals, the proportion that captured the true value was more or less consistent. The larger the sample size, the narrower the confidence intervals; the smaller the sample size, the wider the intervals.
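The effect of sample size on width follows from the standard error of a proportion, which shrinks like one over the square root of n. The sketch below is a rough illustration with the normal approximation at the 95% level; it uses the true proportion 0.62 in the standard error, whereas a real interval would use each sample's own proportion.

```r
# Sketch: approximate 95% interval width for the sample sizes tried.
# Assumption: normal approximation with the true proportion in the SE.
p <- 0.62                      # true population proportion from the lab
n <- c(60, 70, 80, 100)        # sample sizes from the experiment above

se    <- sqrt(p * (1 - p) / n)
width <- 2 * qnorm(0.975) * se

data.frame(n = n, se = round(se, 4), width = round(width, 3))
```

Each increase in n lowers the standard error and therefore the width, while the confidence level, and so the expected capture rate, is unchanged.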
The width of the interval stays roughly the same as we increase the number of bootstrap samples. The standard error depends on the sample size, not on the number of bootstrap samples; more bootstrap samples only make the estimated bounds more stable.
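One way to check how the interval width responds to the number of bootstrap samples is to rerun the bootstrap with different values of reps on the same sample. The sketch below is a base R stand-in for the infer pipeline: `boot_width` is a hypothetical helper, the interval is a percentile-style interval, the sample is rebuilt from the printed counts (30 Yes, 30 No), and the seed and the grid of reps are arbitrary.

```r
# Sketch: bootstrap interval width for different numbers of resamples.
# Assumptions: base R percentile-style interval, arbitrary seed and reps.
set.seed(42)
samp_yes <- c(rep(1, 30), rep(0, 30))   # 1 = "Yes", 0 = "No"

# Hypothetical helper: width of a percentile bootstrap interval.
boot_width <- function(x, reps, level = 0.95) {
  props <- replicate(reps, mean(sample(x, size = length(x), replace = TRUE)))
  alpha <- 1 - level
  ci <- quantile(props, probs = c(alpha / 2, 1 - alpha / 2))
  unname(ci[2] - ci[1])
}

reps_grid <- c(100, 1000, 5000)
widths <- sapply(reps_grid, function(r) boot_width(samp_yes, r))

data.frame(reps = reps_grid, width = round(widths, 3))
```

Comparing the `width` column across rows shows how much, or how little, the number of resamples matters when the sample size is held at 60.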