Harold Nelson
2024-04-12
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.0 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
## Loading required package: airports
## Loading required package: cherryblossom
## Loading required package: usdata
Compute the true mean of cdc$age and store it in true_mean.
## [1] 45.06825
Get a random sample of 500 cases from cdc. call it cdc500. Use a seed, 123.
## 'data.frame': 20000 obs. of 9 variables:
## $ genhlth : Factor w/ 5 levels "excellent","very good",..: 3 3 3 3 2 2 2 2 3 3 ...
## $ exerany : num 0 0 1 1 0 1 1 0 0 1 ...
## $ hlthplan: num 1 1 1 1 1 1 1 1 1 1 ...
## $ smoke100: num 0 1 1 0 0 0 0 0 1 0 ...
## $ height : num 70 64 60 66 61 64 71 67 65 70 ...
## $ weight : int 175 125 105 132 150 114 194 170 150 180 ...
## $ wtdesire: int 175 115 105 124 130 114 185 160 130 170 ...
## $ age : int 77 33 49 42 55 55 31 45 27 44 ...
## $ gender : Factor w/ 2 levels "m","f": 1 2 2 2 2 2 1 1 2 1 ...
Create a bootstrap distribution of the mean of cdc500$age. Use 1,000 reps in the infer package.
Create the upper and lower bounds of this distribution appropriate for a 95% CI.
lower = quantile(bootstrap_distribution$stat,.025)
upper = quantile(bootstrap_distribution$stat,.975)
lower
## 2.5%
## 43.3419
## 97.5%
## 46.37025
## # A tibble: 1 × 2
## lower_ci upper_ci
## <dbl> <dbl>
## 1 43.3 46.4
## 2.5%
## 43.3419
## 97.5%
## 46.37025
##
## One Sample t-test
##
## data: cdc500$age
## t = 58.863, df = 499, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 43.30081 46.29119
## sample estimates:
## mean of x
## 44.796
## 2.5%
## 43.3419
## 97.5%
## 46.37025
Use geom_density() and geom_vline(). Include the true mean value.