The Central Limit Theorem states that:
- For a population with an unknown mean \(\mu\) and a finite variance \(\sigma^2\), taking independent random samples of the same size n, where n is sufficiently large:
- the sampling distribution of the sample mean (our statistic of interest) will be approximately normally distributed
- the mean of the sampling distribution will equal the mean of the population:
- \[\mu_{\overline{x}} = \frac{\mu + \cdots + \mu}{n} = \frac{n\mu}{n} = \mu\]
- and, because the observations are independent, the variance of the sampling distribution will equal the population variance divided by n:
- \[\sigma^2_{\overline{x}} = \frac{\sigma^2 + \cdots + \sigma^2}{n^2} = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}\]
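The two formulas can be checked numerically. This R sketch simulates many sample means from a deliberately skewed population. The exponential population (rate 0.02, so \(\mu = 50\) and \(\sigma^2 = 2500\)), the 10,000 repetitions, and the `skewness()` helper are illustrative assumptions rather than part of the notes.

```r
# Sketch: check mu_xbar = mu and sigma^2_xbar = sigma^2 / n by simulation.
# ASSUMPTION: an exponential population with rate 1/50 (mean 50, variance 2500),
# chosen because it is strongly skewed, i.e. far from normal.
set.seed(1)

mu     <- 50        # population mean (1 / rate)
sigma2 <- 50^2      # population variance (1 / rate^2)
n      <- 200       # size of each sample
reps   <- 10000     # number of independent samples

# Draw `reps` samples of size n and keep each sample's mean
sample_means <- replicate(reps, mean(rexp(n, rate = 1 / mu)))

# Hypothetical helper: moment-based skewness (0 for a symmetric distribution)
skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^(3 / 2)
}

sim_mean   <- mean(sample_means)
sim_var    <- var(sample_means)
theory_var <- sigma2 / n                 # 2500 / 200 = 12.5
skew_means <- skewness(sample_means)

print(c(simulated_mean = sim_mean, theoretical_mean = mu))
print(c(simulated_var = sim_var, theoretical_var = theory_var))
print(c(skewness_of_sample_means = skew_means, population_skewness = 2))
```

The simulated mean should sit close to 50 and the simulated variance close to 12.5. The skewness of the sample means should be much smaller than the population's skewness of 2 (theory gives \(2/\sqrt{200} \approx 0.14\)), which is the "approximately normal" part of the theorem.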
Suppose the true population mean is 50, but we don't know it. If we
- take a random sample of 200 observations from our data
- repeat this process 100 times
we can build a sampling distribution of the sample mean from the 100 sample means.
- This process yielded a sampling distribution with a mean of 49.81, which is very close to the actual population mean of 50.
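# draw 100 samples of 200 observations (with replacement) and record each sample's mean
sample_avgs <- replicate(100, mean(sample(data, 200, replace = TRUE)))
# the mean of the sample means estimates the population mean
avg_sample_avg <- mean(sample_avgs)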
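Because `data` is never defined in these notes, the snippet cannot run on its own. This self-contained R sketch fills it with a hypothetical population of 5,000 values drawn uniformly from [0, 100] (mean close to 50). It repeats the same resampling and also compares the spread of the sample means with \(\sigma/\sqrt{n}\). The printed numbers depend on the random seed and will not reproduce the 49.81 reported above.

```r
# ASSUMPTION: hypothetical population standing in for the undefined `data`
set.seed(42)
data <- runif(5000, min = 0, max = 100)

n_obs  <- 200   # observations per sample
n_reps <- 100   # number of repeated samples

# Draw 100 samples of 200 observations (with replacement), keep each mean
sample_avgs <- replicate(n_reps, mean(sample(data, n_obs, replace = TRUE)))

# Centre of the sampling distribution: should be close to mean(data)
avg_sample_avg <- mean(sample_avgs)

# Spread of the sampling distribution: should be close to sigma / sqrt(n)
observed_se  <- sd(sample_avgs)
predicted_se <- sd(data) / sqrt(n_obs)

cat("population mean:     ", round(mean(data), 2), "\n")
cat("mean of sample means:", round(avg_sample_avg, 2), "\n")
cat("observed SE:", round(observed_se, 2),
    " predicted SE:", round(predicted_se, 2), "\n")
```

With only 100 repetitions the observed standard error is itself noisy (roughly 7% relative error), so increasing `n_reps` tightens the agreement with \(\sigma/\sqrt{n}\).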
