`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Exercise 7
null_dist %>% get_p_value(obs_stat = obs_diff, direction = "two_sided")
Warning: Please be cautious in reporting a p-value of 0. This result is an approximation
based on the number of `reps` chosen in the `generate()` step.
ℹ See `get_p_value()` (`?infer::get_p_value()`) for more information.
# A tibble: 1 × 1
p_value
<dbl>
1 0
Exercise 8
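The warning says that a p-value of 0 is only an approximation that depends on the number of `reps` chosen in the `generate()` step. None of the simulated statistics was as extreme as the observed one, so the true p-value is very small but not exactly zero.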
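To see why a simulated p-value can come out as exactly 0, here is a minimal base-R sketch of a two-sided simulation p-value. It assumes the null distribution is just a numeric vector of simulated statistics. The helper `sim_p_value()` is hypothetical (not part of infer), and the "double the smaller tail" rule is my reading of how `get_p_value()` handles two-sided tests, so check `?infer::get_p_value` before relying on it.

```r
# Hypothetical helper (not part of infer): two-sided p-value from a vector
# of simulated null statistics, using the "double the smaller tail" rule.
sim_p_value <- function(null_stats, obs_stat) {
  left  <- mean(null_stats <= obs_stat)  # share of simulated stats at or below the observed one
  right <- mean(null_stats >= obs_stat)  # share at or above it
  min(2 * min(left, right), 1)           # double the smaller tail, cap at 1
}

# Toy null distribution: 1000 draws centred on 0 (illustrative only)
set.seed(1)
null_stats <- rnorm(1000, mean = 0, sd = 0.01)

# An observed statistic far outside every simulated value gives exactly 0
p_far <- sim_p_value(null_stats, obs_stat = 1)
p_far
```

Here 1000 plays the role of `reps`: with 1000 simulations the smallest nonzero value this two-sided rule can return is 2/1000, so an output of 0 says the p-value is small relative to the simulation size, not that it is literally zero.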
obs_height_dist <- yrbss1 |>
  specify(height ~ physical_3plus) |>
  hypothesise(null = "independence") |>
  calculate(stat = "diff in means", order = c("yes", "no"))
Message: The independence null hypothesis does not inform calculation of the
observed statistic (a difference in means) and will be ignored.
obs_height_dist
Response: height (numeric)
Explanatory: physical_3plus (factor)
Null Hypothesis: independence
# A tibble: 1 × 1
stat
<dbl>
1 0.0376
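The histogram and p-value below use `null_height_dist`, which is not created in this section. With infer it would typically come from adding `generate(reps = ..., type = "permute")` to the pipeline and calculating the same difference in means for each permuted data set. As a hedged sketch of what that permutation step does, here is a base-R version on made-up toy data; the helper `perm_diff_means()` and the toy heights are hypothetical, not the `yrbss1` data.

```r
# Hypothetical helper: permutation null distribution for a difference in means.
# Shuffling the group labels breaks any link between group and response,
# which is what the "independence" null hypothesis assumes.
perm_diff_means <- function(y, group, reps = 1000, order = c("yes", "no")) {
  stats <- numeric(reps)
  for (i in seq_len(reps)) {
    shuffled <- sample(group)  # permute labels, keep responses fixed
    stats[i] <- mean(y[shuffled == order[1]]) - mean(y[shuffled == order[2]])
  }
  stats
}

# Toy data (made up for illustration, not yrbss1): heights in metres
set.seed(42)
group  <- rep(c("yes", "no"), each = 50)
height <- c(rnorm(50, mean = 1.71, sd = 0.1), rnorm(50, mean = 1.67, sd = 0.1))

obs_diff  <- mean(height[group == "yes"]) - mean(height[group == "no"])
null_diff <- perm_diff_means(height, group, reps = 1000)

# One common two-sided convention: share of permuted differences
# at least as far from 0 as the observed difference
p_value <- mean(abs(null_diff) >= abs(obs_diff))
p_value
```

The permuted differences centre on 0 because the labels no longer carry information; the observed difference is then judged against that spread.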
ggplot(null_height_dist, aes(x = stat)) + geom_histogram() + geom_vline(xintercept = pull(obs_height_dist), color = "purple")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
null_height_dist %>% get_p_value(obs_stat = obs_height_dist, direction = "two_sided")
Warning: Please be cautious in reporting a p-value of 0. This result is an approximation
based on the number of `reps` chosen in the `generate()` step.
ℹ See `get_p_value()` (`?infer::get_p_value()`) for more information.
# A tibble: 1 × 1
p_value
<dbl>
1 0
Calculate the exact proportion of the sample that responded this way:
The proportion who received a callback in the dataset was 0.080.
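Computing the exact sample proportion is a one-liner once the response is coded as 0/1. A minimal base-R sketch on a made-up vector; the name `received_callback` and its values are assumptions for illustration, not the lab's data.

```r
# Toy response vector (hypothetical): 1 = received a callback, 0 = did not
received_callback <- c(0, 0, 1, 0, 0, 0, 0, 1, 0, 0)

# The mean of a logical vector is the proportion of TRUEs, i.e. of 1s here
p_hat <- mean(received_callback == 1)
p_hat  # 0.2

# Counts turned into proportions for each outcome
prop.table(table(received_callback))
```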
The 95% confidence interval can be approximated as the sample proportion plus or minus two standard errors of the sample proportion.
The bootstrap starts by specifying the variable of interest (the response, and which value counts as a success) with the function specify().
We then resample the data with replacement many times to create many bootstrap replicate data sets.
Do this with the function generate(), using type = "bootstrap".
Next, for each replicate, we calculate the sample statistic, in this case the proportion of applicants who received a callback.
Do this with the function calculate().
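The specify(), generate(), calculate() steps can be mimicked in base R to show what happens to the data. This is a sketch under assumptions: the helper `boot_props()` is hypothetical, the data are a made-up 0/1 vector, and it is not how infer is implemented internally.

```r
# Hypothetical helper: bootstrap distribution of a sample proportion.
boot_props <- function(x, reps = 1000) {
  n <- length(x)
  # generate(): resample n observations with replacement, reps times
  # calculate(): proportion of successes (1s) in each replicate
  replicate(reps, mean(sample(x, size = n, replace = TRUE) == 1))
}

# specify(): the response is a 0/1 vector where 1 counts as a success
set.seed(2024)
x <- rep(c(1, 0), times = c(40, 460))  # toy data: 40 successes out of 500

stats <- boot_props(x, reps = 1000)
length(stats)  # 1000 bootstrap proportions
mean(stats)    # close to the sample proportion 40/500 = 0.08
```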
The standard deviation of the stat variable in this data frame (the bootstrap distribution) is the bootstrap standard error, and it can be calculated using the summarize() function.
We can use this value, along with our point estimate, to roughly calculate a 95% confidence interval:
\(\hat{p} \pm z^{*} \times SE\), where \(z^{*} \approx 1.96\) for a 95% interval.
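Putting the standard-error and interval steps together, a minimal sketch: the bootstrap SE is the standard deviation of the bootstrap proportions, and the interval is \(\hat{p} \pm z^{*} \times SE\) with \(z^{*}\) taken from `qnorm()`. The function name `boot_se_ci()` and the toy data are hypothetical.

```r
# Hypothetical helper: CI of the form p_hat +/- z* x SE,
# where SE is the standard deviation of the bootstrap proportions.
boot_se_ci <- function(x, reps = 1000, level = 0.95) {
  p_hat <- mean(x == 1)                                          # point estimate
  stats <- replicate(reps, mean(sample(x, replace = TRUE) == 1)) # bootstrap proportions
  se    <- sd(stats)                                             # bootstrap standard error
  z     <- qnorm(1 - (1 - level) / 2)                            # about 1.96 for 95%
  c(lower = p_hat - z * se, upper = p_hat + z * se)
}

set.seed(7)
x  <- rep(c(1, 0), times = c(40, 460))  # toy data, not the lab's
ci <- boot_se_ci(x)
ci
```

Replacing `z` with 2 gives the rough "plus or minus two standard errors" version; `qnorm()` just makes the multiplier exact for the chosen level.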
We are 95% confident that the true proportion of applicants receiving callbacks is between 7.28% and 8.81%.
The normal distribution for confidence intervals
Another option for calculating the CI is to estimate it using the normal distribution (the bell curve).
If
1. the observations are independent, and
2. n is large (the success-failure (S-F) condition is met: \(n\hat{p} \ge 10\) and \(n(1-\hat{p}) \ge 10\))