For the first week, we have a simple warm-up exercise for the discussion. Using R, generate 100 simulations of 30 samples each from a distribution (other than normal) of your choice. Graph the sampling distribution of means. Graph the sampling distribution of the minimum. Share your graphs and your R code. What did the simulation of the means demonstrate? What about the distribution of the minimum…?
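set.seed(1550)
# Pool of 300 draws from a hypergeometric distribution:
# 2000 white and 2000 black balls in the urn, 1000 drawn each time
myhypergeoDistribution = rhyper(300,2000,2000,1000)
simulations = 100
# Preallocate one slot per simulation instead of growing the vectors
means = numeric(simulations)
mins = numeric(simulations)
maxs = numeric(simulations)
for(x in 1:simulations){
  # Each simulation is 30 values sampled (without replacement) from the pool
  particular_sample = sample(myhypergeoDistribution,30)
  means[x] = mean(particular_sample)
  mins[x] = min(particular_sample)
  maxs[x] = max(particular_sample)
}
hyper_data_frame = data.frame(means,mins,max = maxs)
# Compare the mean of the pool with the mean of the 100 sample means
print(c(mean(myhypergeoDistribution),
        mean(hyper_data_frame$means)))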
## [1] 500.0 499.9
I ran 100 simulations of n = 30 from a hypergeometric distribution. The mean of the sample means is 499.9, and the mean of the 300 values they were sampled from is 500.0.
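As a quick cross-check on the "actual mean", the theoretical mean and standard deviation of this hypergeometric distribution can be worked out from the parameters passed to rhyper. This is only a sketch: the names m, n, k, N, theoretical_mean and theoretical_sd are mine and are not part of the code above.

```r
# Parameters in the order rhyper(nn, m, n, k) expects them
m = 2000   # white balls in the urn
n = 2000   # black balls in the urn
k = 1000   # balls drawn per observation
N = m + n  # total balls in the urn

# Standard formulas for the hypergeometric mean and standard deviation
theoretical_mean = k * m / N
theoretical_sd = sqrt(k * (m / N) * (n / N) * (N - k) / (N - 1))

print(c(theoretical_mean, theoretical_sd))
```

The theoretical mean comes out to 500, which matches the pool mean printed above.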
summary(hyper_data_frame)
##      means          mins          max
##  Min.   :494   Min.   :469   Min.   :514
##  1st Qu.:498   1st Qu.:474   1st Qu.:521
##  Median :500   Median :477   Median :527
##  Mean   :500   Mean   :476   Mean   :528
##  3rd Qu.:501   3rd Qu.:479   3rd Qu.:539
##  Max.   :505   Max.   :484   Max.   :544
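hist(hyper_data_frame$means,breaks=30,
     main="Sampling distribution of the mean (n = 30)",xlab="Sample mean")
hist(hyper_data_frame$mins,
     main="Sampling distribution of the minimum (n = 30)",xlab="Sample minimum")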
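The simulation of the sample means illustrates the central limit theorem. Even though there's a right skew, the sample means cluster tightly around the population mean: they range only from 494 to 505, with a mean of 500. The central limit theorem adds that, as the sample size grows, the distribution of these means should look approximately normal.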
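One way to check the central limit theorem claim a bit more directly is to compare the spread of the sample means with the standard error it predicts, sd / sqrt(n), and to look at a normal Q-Q plot. The sketch below rebuilds the simulation so it runs on its own; pool, sample_means, observed_se and expected_se are names I made up for illustration. Because the 30 values are sampled without replacement from only 300, I would expect the observed value to land slightly below sd / sqrt(n).

```r
set.seed(1550)

# Same setup as the original post: a pool of 300 hypergeometric draws
pool = rhyper(300, 2000, 2000, 1000)

# 100 simulations, each the mean of 30 values sampled from the pool
sample_means = replicate(100, mean(sample(pool, 30)))

# Spread of the sample means versus the standard error predicted by sd / sqrt(n)
observed_se = sd(sample_means)
expected_se = sd(pool) / sqrt(30)
print(c(observed_se, expected_se))

# Points close to the reference line suggest approximate normality
qqnorm(sample_means)
qqline(sample_means)
```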
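The sampled minimums, by contrast, do not settle around the population mean. They range from 469 to 484 with a mean of 476, well below 500, so they only give a very rough lower bound on where we can expect the sample means to fall.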
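To see the contrast between the two statistics side by side, here is a small sketch that reruns the simulation and plots both histograms together. It also expresses the gap between the average mean and the average minimum in units of the pool's standard deviation. The names pool, sims and gap are hypothetical and not from the code above.

```r
set.seed(1550)

# Same setup as the original post: a pool of 300 hypergeometric draws
pool = rhyper(300, 2000, 2000, 1000)

# For each of 100 simulations, keep both the mean and the minimum of the same sample
sims = replicate(100, {
  s = sample(pool, 30)
  c(mean = mean(s), min = min(s))
})
# sims is a 2 x 100 matrix with rows named "mean" and "min"

# How far below the typical sample mean the typical minimum sits, in pool SDs
gap = mean(sims["mean", ]) - mean(sims["min", ])
print(gap / sd(pool))

# Side-by-side histograms of the two sampling distributions
par(mfrow = c(1, 2))
hist(sims["mean", ], main = "Sample means", xlab = "mean")
hist(sims["min", ], main = "Sample minimums", xlab = "minimum")
```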