courtney casey 7.1.5

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.6
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.1     ✔ tibble    3.3.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.2
✔ purrr     1.2.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

# Simulate n rolls of a fair six-sided die and return the sample mean
sim_ybar_dice <- function(n) {
  mean(sample(1:6, size = n, replace = TRUE))
}

n_vals <- c(20, 40, 80, 160)

dice_sims <- tibble(
  n = rep(n_vals, each = 10000)
) %>%
  mutate(ybar = map_dbl(n, sim_ybar_dice))

head(dice_sims)

# A tibble: 6 × 2
      n  ybar
  <dbl> <dbl>
1    20  3.8 
2    20  3.45
3    20  3.35
4    20  3.65
5    20  3.3 
6    20  3.25

dice_sims

# A tibble: 40,000 × 2
       n  ybar
   <dbl> <dbl>
 1    20  3.8 
 2    20  3.45
 3    20  3.35
 4    20  3.65
 5    20  3.3 
 6    20  3.25
 7    20  4.45
 8    20  3.05
 9    20  3.1 
10    20  3.4 
# ℹ 39,990 more rows

Formal 5a.

ggplot(dice_sims, aes(x = ybar)) +
  geom_histogram(bins = 40) +
  facet_wrap(~n, scales = "free_y") +
  labs(
    title = "Sampling distribution of ybar for dice rolls",
    x = "ybar",
    y = "count"
  )

As you can see, the sampling distribution is centered around 3.5 for every n, as it should be. The variability decreases as n increases, and the shape becomes more normal. (Also, I'm so sorry: my computer won't install purrrfect, so I'm trying to do this without it, and I will be visiting you sometime next week to figure this out. Bear with me, please.)
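To put a number on how fast the histograms narrow, here is a rough base-R sketch. It assumes the standard result that a single fair-die roll has variance 35/12, so the standard deviation of ybar is sqrt(35/(12n)). The names sd_theory and shrink are just made up for this illustration.

```r
# Theoretical standard deviation of the mean of n fair-die rolls,
# assuming a single roll has variance 35/12
n_vals <- c(20, 40, 80, 160)
sd_theory <- sqrt(35 / (12 * n_vals))

# Ratio of each sd to the one before it: every doubling of n
# should shrink the spread by a factor of 1/sqrt(2)
shrink <- sd_theory[-1] / sd_theory[-length(sd_theory)]

print(data.frame(n = n_vals, sd_theory = round(sd_theory, 4)))
print(shrink)
```

If that assumption holds, each panel in the plot should be roughly 71% as wide as the one before it, which seems to line up with what the histograms show.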

5b.

dice_summary <- dice_sims %>%
  group_by(n) %>%
  summarize(
    mean_hat = mean(ybar),
    var_hat  = var(ybar),
    .groups = "drop"
  ) %>%
  mutate(
    mean_theory = 3.5,
    var_theory  = 35 / (12 * n)
  )

dice_summary

# A tibble: 4 × 5
      n mean_hat var_hat mean_theory var_theory
  <dbl>    <dbl>   <dbl>       <dbl>      <dbl>
1    20     3.50  0.143          3.5     0.146 
2    40     3.50  0.0728         3.5     0.0729
3    80     3.50  0.0356         3.5     0.0365
4   160     3.50  0.0178         3.5     0.0182

We only need one input (n) to simulate each ybar value, so map_dbl() is enough; pmap() is for functions that need multiple inputs per row. Every simulated mean is almost exactly 3.5, as it should be. The theoretical variance is 35/(12n), which works out to n = 20: 0.14583, n = 40: 0.07292, n = 80: 0.03646, n = 160: 0.01823, and every simulated variance lands within a few percent of its theoretical value.
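For reference, here is a small base-R sketch of where 35/(12n) comes from. It assumes the rolls are independent and the die is fair, and the names faces, mu, and sigma2 are only for this illustration.

```r
# Variance of one fair-die roll from the definition E[Y^2] - (E[Y])^2
faces <- 1:6
mu <- mean(faces)                 # expected value of one roll
sigma2 <- mean(faces^2) - mu^2    # variance of one roll, equal to 35/12

# For independent rolls, Var(ybar) = sigma^2 / n
n_vals <- c(20, 40, 80, 160)
var_theory <- sigma2 / n_vals

print(mu)
print(sigma2)
print(round(var_theory, 5))
```

The values in var_theory should match the var_theory column of dice_summary, since both compute 35/(12n).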