Submit a link to a rendered simulation study on RPubs for this problem. Consider Y_1, …, Y_n, an i.i.d. sample from each of several non-normal distributions:
Population 1: a UNIF(0,8) population.
Population 2: a GAM(α=2,β=2) population.
Population 3: a POI(4) population.
For each population, and for each n ∈ {5, 10, 20, 40, 80, 160}, simulate 10,000 sample means. For each distribution, plot the simulated densities with the normal approximations superimposed, and plot the empirical CDFs against the normal CDFs. For which distribution does normality kick in “fastest”? Is there an n for which Ȳ appears “normal” regardless of the population?
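Every normal approximation in this study has the form N(μ, σ²/n), so it helps to pin down μ and σ for each population before simulating. The sketch below is a minimal base-R version. It assumes that GAM(α=2, β=2) uses β as a scale parameter, which gives all three populations the same mean of 4. If β were a rate instead, the gamma row would change.

```r
# Population moments used for the normal approximation N(mu, sigma^2 / n).
# Assumption: GAM(alpha = 2, beta = 2) treats beta as a SCALE parameter,
# so E[Y] = alpha * beta and Var[Y] = alpha * beta^2.
pop_moments <- data.frame(
  population = c("UNIF(0,8)", "GAM(2,2)", "POI(4)"),
  mu         = c((0 + 8) / 2, 2 * 2, 4),
  sigma      = c(sqrt((8 - 0)^2 / 12), sqrt(2 * 2^2), sqrt(4))
)

n_values <- c(5, 10, 20, 40, 80, 160)

# Standard deviation of Ybar under the CLT approximation: sigma / sqrt(n).
# outer() divides each population sigma (rows) by each sqrt(n) (columns).
se_table <- outer(pop_moments$sigma, sqrt(n_values), "/")
dimnames(se_table) <- list(pop_moments$population, paste0("n=", n_values))

print(pop_moments)
print(round(se_table, 3))
```

The uniform row reproduces the `mu <- 4` and `sigma <- sqrt(64/12)` values used in the simulation code. Gamma has the largest σ (√8), so it also has the widest normal approximation at every n.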
Uniform (0,8)
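library(tidyverse)  # dplyr, purrr, and ggplot2 (mutate, map, cume_dist, ggplot)
library(purrrfect)  # parameters() and add_trials() for the simulation grid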
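N <- 10000             # number of simulated sample means per n
mu <- 4                # mean of UNIF(0,8): (0 + 8) / 2
sigma <- sqrt(64 / 12) # sd of UNIF(0,8): sqrt((8 - 0)^2 / 12)

clt_sims_unif <- parameters(~n, c(5, 10, 20, 40, 80, 160)) %>%
  add_trials(N) %>%
  # draw one sample of size n per trial
  mutate(Y_sample = map(n, \(n) runif(n, min = 0, max = 8))) %>%
  mutate(Ybar = map_dbl(Y_sample, mean)) %>%
  mutate(
    fU = dnorm(Ybar, mean = mu, sd = sigma / sqrt(n)),  # normal approximation density
    FU = pnorm(Ybar, mean = mu, sd = sigma / sqrt(n)),  # normal approximation CDF
    Fhat = cume_dist(Ybar),                             # empirical CDF within each n
    .by = n
  )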
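The `fU` column above holds the normal-approximation density at each simulated mean. The assignment's first plot overlays that density on the simulated density of Ȳ. The following is a self-contained sketch of that plot for the uniform population. It swaps in base R (`replicate`, `density`, `curve`) for the purrrfect/ggplot pipeline so it runs without extra packages, and the seed is an arbitrary choice.

```r
# Base-R sketch: simulated density of Ybar vs. the normal approximation
# for the UNIF(0,8) population. Uses only the stats and graphics packages.
set.seed(1)  # arbitrary seed, for reproducibility only

N        <- 10000
n_values <- c(5, 10, 20, 40, 80, 160)
mu       <- 4
sigma    <- sqrt(64 / 12)

# For each n, simulate N sample means of n UNIF(0,8) draws.
ybar_unif <- lapply(n_values, function(n) {
  replicate(N, mean(runif(n, min = 0, max = 8)))
})
names(ybar_unif) <- paste0("n=", n_values)

op <- par(mfrow = c(3, 2), mar = c(4, 4, 2, 1))
for (i in seq_along(n_values)) {
  n <- n_values[i]
  # Kernel density estimate of the simulated means
  plot(density(ybar_unif[[i]]), main = paste("n =", n),
       xlab = expression(bar(Y)))
  # Normal approximation N(mu, sigma^2 / n), drawn as a dashed red curve
  curve(dnorm(x, mean = mu, sd = sigma / sqrt(n)),
        add = TRUE, col = "red", lty = 2)
}
par(op)
```

The same loop works for the other two populations once `runif(...)` and `mu`/`sigma` are replaced. For POI(4), however, Ȳ only takes values on the grid k/n, so a kernel density estimate smooths over that discreteness.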
ggplot(data = clt_sims_unif, aes(x = Ybar)) +
  geom_step(aes(y = FU, col = 'Analytic Normal CDF')) +
  geom_step(aes(y = Fhat, col = 'Empirical CDF')) +
  facet_grid(n ~ ., labeller = label_both, scales = 'free') +
  labs(color = '', x = expression(bar(Y)), y = 'CDF') +
  theme_classic()
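To put a number on how closely each empirical CDF tracks its normal CDF, one option is the largest vertical gap between the two curves, a Kolmogorov-Smirnov-type distance. Below is a base-R sketch that covers all three populations. `simulate_ybar()` and `max_cdf_gap()` are hypothetical helper names. The gamma sampler assumes β = 2 is a scale parameter, and the gap is evaluated only at the simulated points, so it approximates the true supremum.

```r
# Base-R sketch: largest gap between the empirical CDF of the simulated
# means and the normal CDF, for each population and each n.
set.seed(2)  # arbitrary seed

N        <- 10000
n_values <- c(5, 10, 20, 40, 80, 160)

# Each population: a sampler plus its mean and sd.
# Assumption: GAM(2, 2) uses beta = 2 as a scale parameter.
populations <- list(
  "UNIF(0,8)" = list(r = function(n) runif(n, 0, 8),
                     mu = 4, sigma = sqrt(64 / 12)),
  "GAM(2,2)"  = list(r = function(n) rgamma(n, shape = 2, scale = 2),
                     mu = 4, sigma = sqrt(8)),
  "POI(4)"    = list(r = function(n) rpois(n, lambda = 4),
                     mu = 4, sigma = 2)
)

# N sample means of size-n samples drawn with sampler r
simulate_ybar <- function(r, n, N) replicate(N, mean(r(n)))

# max |Fhat(y) - Phi(y; mu, sigma / sqrt(n))| over the distinct simulated
# values y, using the right-continuous value of the empirical CDF.
max_cdf_gap <- function(ybar, mu, sigma, n) {
  y    <- sort(unique(ybar))
  Fhat <- ecdf(ybar)(y)
  Fn   <- pnorm(y, mean = mu, sd = sigma / sqrt(n))
  max(abs(Fhat - Fn))
}

# Rows: sample sizes; columns: populations
gap_table <- sapply(populations, function(p) {
  sapply(n_values, function(n) {
    max_cdf_gap(simulate_ybar(p$r, n, N), p$mu, p$sigma, n)
  })
})
rownames(gap_table) <- paste0("n=", n_values)
print(round(gap_table, 3))
```

Simulation noise alone keeps each gap on the order of 1/√N ≈ 0.01, so differences smaller than that between entries should not be over-read.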
Normality appears to kick in fastest for the uniform population. Its empirical CDF is almost indistinguishable from the normal CDF, and its simulated density matches the normal approximation at smaller n than the other two populations. This is expected, because UNIF(0,8) is symmetric about its mean, while GAM(2,2) and POI(4) are right-skewed. By n = 40, Ȳ appears approximately normal for all three populations.
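One way to check this visual ranking is to compute the exact skewness of Ȳ. For an i.i.d. sample it equals the population skewness γ₁ divided by √n. The standard values are 0 for any uniform, 2/√α for a gamma with shape α (independent of the scale), and 1/√λ for a Poisson with mean λ. The sketch below tabulates γ₁/√n for the three populations. It is a theoretical benchmark for the plots, not a substitute for them.

```r
# Exact skewness of Ybar: skew(Ybar) = gamma1 / sqrt(n),
# where gamma1 is the population skewness.
#   UNIF(0,8):       symmetric, gamma1 = 0
#   GAM(alpha = 2):  gamma1 = 2 / sqrt(alpha)   (scale does not matter)
#   POI(lambda = 4): gamma1 = 1 / sqrt(lambda)
pop_skew <- c("UNIF(0,8)" = 0, "GAM(2,2)" = 2 / sqrt(2), "POI(4)" = 1 / sqrt(4))

n_values  <- c(5, 10, 20, 40, 80, 160)
skew_ybar <- outer(pop_skew, sqrt(n_values), "/")
rownames(skew_ybar) <- names(pop_skew)
colnames(skew_ybar) <- paste0("n=", n_values)
print(round(skew_ybar, 3))
```

Skewness predicts the ordering uniform, then Poisson, then gamma. The Poisson case also has a second departure from normality, because its Ȳ is discrete and the lattice only becomes fine as n grows.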