Harold Nelson
2024-07-18
I want to show you how to create a bootstrap distribution without using the infer package. Going through this should give you a clearer picture of the process.
We’ll use the dataframe age_at_mar from the openintro package.
library(tidyverse)
library(openintro)
glimpse(age_at_mar)
## Rows: 5,534
## Columns: 1
## $ age <int> 32, 25, 24, 26, 32, 29, 23, 23, 29, 27, 23, 21, 29, 40, 22, 20, 31…
I’ll use the standard deviation as an example statistic.
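# How big a bootstrap distribution?
boot_size = 1000
# Create a vector to hold the results
statistics = numeric(boot_size)
for (i in seq_len(boot_size)) {
  # Resample the data with replacement, keeping the original sample size
  samp = sample(age_at_mar$age,
                size = length(age_at_mar$age),
                replace = TRUE)
  # Store the statistic computed on this resample
  statistics[i] = sd(samp)
}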
summary(statistics)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   4.573   4.689   4.729   4.726   4.761   4.903
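The same resampling loop can also be written without an explicit index. The following is a minimal sketch using base R's replicate(); the vector `ages` is a hypothetical stand-in generated here so the example runs on its own (it is not the real age_at_mar data), and the seed is an arbitrary choice made only for reproducibility.

```r
# Stand-in data: a hypothetical vector of ages, NOT the real age_at_mar data
set.seed(123)   # arbitrary seed, assumed here only for reproducibility
ages = sample(18:40, size = 200, replace = TRUE)

boot_size = 1000

# replicate() evaluates the expression boot_size times and collects the
# results in a numeric vector, one bootstrap statistic per resample
statistics = replicate(boot_size,
                       sd(sample(ages, size = length(ages), replace = TRUE)))

length(statistics)
summary(statistics)
```

With the real data, `ages` would be replaced by age_at_mar$age. The explicit for loop makes each step visible; replicate() is simply a more compact way to express the same idea.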
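# Standard-error interval: center of the bootstrap distribution plus or minus
# 1.96 bootstrap standard errors. This assumes the distribution is roughly normal.
lower = mean(statistics) - 1.96 * sd(statistics)
upper = mean(statistics) + 1.96 * sd(statistics)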
ci_ims = c(lower,upper)
ci_ims
## [1] 4.620330 4.831133
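An alternative is the percentile interval, which does not assume normality: it takes the 2.5th and 97.5th percentiles of the bootstrap distribution directly.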
lower = quantile(statistics,.025)
upper = quantile(statistics,.975)
ci_alternative = c(lower,upper)
ci_alternative
##     2.5%    97.5%
## 4.621390 4.834643
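To reuse both interval calculations with other statistics, the steps above can be wrapped in a function. This is only a sketch under stated assumptions: `boot_ci` is a hypothetical helper name (it is not part of infer or openintro), `ages` is made-up stand-in data, and the standard-error interval is centered on the mean of the bootstrap statistics to match the calculation above.

```r
# Hypothetical helper (not from any package): bootstrap a statistic and
# return both the standard-error interval and the percentile interval.
boot_ci = function(x, stat = sd, boot_size = 1000, level = 0.95) {
  # One statistic per resample, each resample drawn with replacement
  statistics = replicate(boot_size,
                         stat(sample(x, size = length(x), replace = TRUE)))
  alpha = 1 - level
  z = qnorm(1 - alpha / 2)   # about 1.96 when level = 0.95
  # Standard-error interval, centered on the mean of the bootstrap statistics
  se_ci = mean(statistics) + c(-1, 1) * z * sd(statistics)
  # Percentile interval, read straight off the bootstrap distribution
  pct_ci = unname(quantile(statistics, c(alpha / 2, 1 - alpha / 2)))
  rbind(standard_error = se_ci, percentile = pct_ci)
}

# Stand-in data: a hypothetical vector of ages, NOT the real age_at_mar data
set.seed(123)   # arbitrary seed, assumed here only for reproducibility
ages = sample(18:40, size = 200, replace = TRUE)

boot_ci(ages)                  # intervals for the standard deviation
boot_ci(ages, stat = median)   # same procedure, different statistic
```

When the bootstrap distribution is close to symmetric and bell-shaped, the two rows should be similar, as they are for the standard deviation of age_at_mar$age above.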