library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.4
## ✔ tibble 3.1.8 ✔ dplyr 1.0.9
## ✔ tidyr 1.2.0 ✔ stringr 1.4.1
## ✔ readr 2.1.2 ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(openintro)
## Loading required package: airports
## Loading required package: cherryblossom
## Loading required package: usdata
library(shiny)
library(infer)
##
## Attaching package: 'infer'
##
## The following object is masked from 'package:shiny':
##
## observe
set.seed(3000) # Fix the random seed so that code involving random values produces the same results each time it is run.
us_adults <- tibble(
climate_change_affects = c(rep("Yes", 62000), rep("No", 38000))
)
ggplot(us_adults, aes(x = climate_change_affects)) +
geom_bar() +
labs(
x = "", y = "",
title = "Do you think climate change is affecting your local community?"
) +
coord_flip()
us_adults %>%
count(climate_change_affects) %>%
mutate(p = n / sum(n))
## # A tibble: 2 × 3
## climate_change_affects n p
## <chr> <int> <dbl>
## 1 No 38000 0.38
## 2 Yes 62000 0.62
# In the table above, n is the number of adults giving each response and p is the proportion of the population giving that response.
n <- 60
samp <- us_adults %>%
sample_n(size = n)
According to the tibble above, 62% of the population believes that climate change is affecting their local community.
I would expect another student's sample proportion to be similar but not the same. set.seed() makes code that draws random values reproducible: the same seed produces the same random values each time the code is run. A student who draws a sample with a different seed, or with no seed at all, gets a different random sample and therefore a slightly different proportion.
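To make the point about seeds concrete, here is a minimal base-R sketch. It is not the lab's tidyverse code: `population`, `sample_prop()` and the seed 2024 are hypothetical names and values introduced only for illustration, and the sample it draws is not guaranteed to match the lab's `samp`.

```r
# Same population as in the lab, stored as a plain character vector (assumption:
# a vector stands in for the us_adults tibble).
population <- c(rep("Yes", 62000), rep("No", 38000))

# Hypothetical helper: draw n responses under a given seed and return the
# proportion of "Yes" answers in that sample.
sample_prop <- function(seed, n = 60) {
  set.seed(seed)
  samp <- sample(population, size = n)
  mean(samp == "Yes")
}

p_mine  <- sample_prop(3000)  # the seed used in this lab
p_rerun <- sample_prop(3000)  # same seed again: identical draw
p_other <- sample_prop(2024)  # hypothetical classmate's seed

print(c(mine = p_mine, rerun = p_rerun, classmate = p_other))
```

The first two values always agree because the seed is the same. The third is based on a different random sample, so it will usually differ a little while staying in the neighborhood of 0.62.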
# This code finds the 95% confidence interval for the proportion of US adults who think climate change affects their local community.
samp %>%
specify(response = climate_change_affects, success = "Yes") %>%
generate(reps = 1000, type = "bootstrap") %>%
calculate(stat = "prop") %>%
get_ci(level = 0.95)
## # A tibble: 1 × 2
## lower_ci upper_ci
## <dbl> <dbl>
## 1 0.500 0.75
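As a rough picture of what the pipeline above computes, the following base-R sketch builds a percentile bootstrap interval by hand. It assumes the default percentile method of `get_ci()`; `population`, `boot_props` and `ci` are illustrative names, and the sample drawn here is a stand-in for `samp`, so the bounds need not equal the ones printed above.

```r
set.seed(3000)
population <- c(rep("Yes", 62000), rep("No", 38000))
samp <- sample(population, size = 60)  # stand-in for the lab's `samp`

# Resample the 60 observations with replacement 1000 times and record the
# proportion of "Yes" in each bootstrap resample.
boot_props <- replicate(1000, mean(sample(samp, replace = TRUE) == "Yes"))

# Percentile interval: the middle 95% of the bootstrap proportions.
ci <- quantile(boot_props, probs = c(0.025, 0.975), names = FALSE)
print(ci)
```

Each bootstrap resample has the same size as the original sample, which is why the spread of `boot_props` reflects the uncertainty of a sample of 60.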
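A 95 percent confidence level means that about 5 percent of intervals built this way will miss the true population proportion, so there is a 5 percent chance that the method produces an interval that is wrong.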
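Yes, it does: the interval from 0.500 to 0.75 contains the true population proportion of 0.62. A neighbor's or classmate's interval would be built from a different random sample, so its bounds would differ slightly, but it would most likely capture 0.62 as well.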
95% confidence means we are 95% confident that the population proportion lies within the interval between the lower bound and the upper bound. A confidence interval only provides a plausible range of values.
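One way to see what "95% confident" refers to is to repeat the whole procedure many times and count how often the interval contains the true proportion. The sketch below does this in base R with 50 intervals; `one_interval()` is a hypothetical helper, and the percentile bootstrap is assumed as the interval method.

```r
set.seed(3000)
population <- c(rep("Yes", 62000), rep("No", 38000))
true_p <- mean(population == "Yes")  # 0.62

# Hypothetical helper: draw one sample of size n and return one percentile
# bootstrap interval as c(lower, upper).
one_interval <- function(n = 60, reps = 1000, level = 0.95) {
  samp <- sample(population, size = n)
  boot_props <- replicate(reps, mean(sample(samp, replace = TRUE) == "Yes"))
  alpha <- 1 - level
  quantile(boot_props, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

# Build 50 intervals (one row each) and check which ones contain true_p.
intervals <- t(replicate(50, one_interval()))
captured <- intervals[, 1] <= true_p & true_p <= intervals[, 2]
coverage <- mean(captured)
print(coverage)
```

With only 50 intervals the observed coverage moves in steps of 0.02, so it will rarely equal 0.95 exactly even when the method is working as intended.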
The proportion of intervals that captured the true population proportion is close to 0.9. That is fairly close to the 95% confidence level but not equal to it, because the simulation is still too small to show the true long-run behavior.
Exercise 6
Note: This Shiny App was taken from the template on the 606 class page.
I think the intervals in the graph will be narrower, because as the confidence level decreases, the width of the interval decreases. I used 0.80 as the confidence level.
Exercise 7
samp %>%
specify(response = climate_change_affects, success = "Yes") %>%
generate(reps = 1000, type = "bootstrap") %>%
calculate(stat = "prop") %>%
get_ci(level = 0.5)
## # A tibble: 1 × 2
## lower_ci upper_ci
## <dbl> <dbl>
## 1 0.583 0.667
The lower bound of the confidence interval is 0.583 and the upper bound is 0.667.
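The effect of the confidence level on width can be checked directly by cutting several intervals out of one bootstrap distribution. This is a base-R sketch under the same assumptions as before (percentile method, a stand-in sample rather than the lab's `samp`); `conf_levels` and `widths` are illustrative names.

```r
set.seed(3000)
population <- c(rep("Yes", 62000), rep("No", 38000))
samp <- sample(population, size = 60)
boot_props <- replicate(1000, mean(sample(samp, replace = TRUE) == "Yes"))

# Width of the percentile interval at each confidence level.
conf_levels <- c(0.95, 0.80, 0.50, 0.10)
widths <- sapply(conf_levels, function(level) {
  alpha <- 1 - level
  bounds <- quantile(boot_props, probs = c(alpha / 2, 1 - alpha / 2),
                     names = FALSE)
  bounds[2] - bounds[1]
})

print(data.frame(level = conf_levels, width = widths))
```

Because every interval is taken from the same set of bootstrap proportions, a lower level always keeps a smaller middle slice of that distribution, so the width can only stay the same or shrink.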
The intervals in the graph are very narrow because the confidence level is only 0.5. The number of confidence intervals constructed is 50.
Exercise 9
I predict the intervals in the graph will be very narrow because I am using a confidence level of 0.1.
Exercise 10
In my runs, using different sample sizes (large and small) did not seem to change the width of the intervals in the graphs.
Exercise 11: Pic 1, Exercise 11: Pic 2, Exercise 11: Pic 2
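Since the plots can be hard to compare by eye, a small simulation can measure the widths directly. Standard theory says the width of an interval for a proportion should shrink roughly in proportion to 1/sqrt(n), so quadrupling the sample size should roughly halve the width. The sketch below is base R; `mean_width()` and the sample sizes are hypothetical choices, not settings taken from the app.

```r
set.seed(3000)
population <- c(rep("Yes", 62000), rep("No", 38000))

# Hypothetical helper: average width of 95% percentile bootstrap intervals
# over several independent samples of size n.
mean_width <- function(n, n_intervals = 20, reps = 500) {
  widths <- replicate(n_intervals, {
    samp <- sample(population, size = n)
    boot_props <- replicate(reps, mean(sample(samp, replace = TRUE) == "Yes"))
    diff(quantile(boot_props, probs = c(0.025, 0.975), names = FALSE))
  })
  mean(widths)
}

sizes <- c(30, 60, 240, 960)
avg_widths <- sapply(sizes, mean_width)
print(data.frame(n = sizes, avg_width = avg_widths))
```

If the app's sample sizes were close together, the change in width could be too small to notice in the plots, which may explain the impression that nothing changed.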
I am not sure about the answer to this; the widths look similar to me in all three cases. I don't fully understand it.
Exercise 12: Pic 1, Exercise 12: Pic 2, Exercise 12: Pic 3
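The observation that the widths look similar is what one would expect when only the number of bootstrap resamples changes: more resamples make the estimate of the interval's endpoints more stable, but they add no new information about the population, so the width does not shrink systematically. A base-R sketch, with `ci_width()` and `reps_grid` as hypothetical names and values:

```r
set.seed(3000)
population <- c(rep("Yes", 62000), rep("No", 38000))
samp <- sample(population, size = 60)  # one fixed sample of size 60

# Hypothetical helper: width of a 95% percentile interval built from
# `reps` bootstrap resamples of the same fixed sample.
ci_width <- function(reps) {
  boot_props <- replicate(reps, mean(sample(samp, replace = TRUE) == "Yes"))
  diff(quantile(boot_props, probs = c(0.025, 0.975), names = FALSE))
}

reps_grid <- c(100, 1000, 5000)
widths <- sapply(reps_grid, ci_width)
print(data.frame(reps = reps_grid, width = widths))
```

The widths should be of similar size across the three rows, with the small `reps` value being the noisiest. Sample size, not the number of resamples, is what drives the width.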