Distributions

For the first week, we have a simple warm-up exercise for the discussion. Using R, generate 100 simulations of 30 samples each from a distribution (other than normal) of your choice. Graph the sampling distribution of means. Graph the sampling distribution of the minimum. Share your graphs and your R code. What did the simulation of the means demonstrate? What about the distribution of the minimum…?

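# Fix the seed so the results are reproducible
set.seed(1550)
# Pool of 300 hypergeometric values: 2000 white balls, 2000 black balls, 1000 drawn
myhypergeoDistribution = rhyper(300,2000,2000,1000)
simulations = 100
means = c()
mins = c()
maxs = c()

# Each simulation draws 30 values from the pool (without replacement)
# and records that sample's mean, minimum, and maximum
for(x in 1:simulations){
  particular_sample = sample(myhypergeoDistribution,30)
  means = c(means,mean(particular_sample))
  mins = c(mins,min(particular_sample))
  maxs = c(maxs,max(particular_sample))
}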

hyper_data_frame = data.frame(means,mins,max = maxs)

print(c(mean(myhypergeoDistribution),
mean(hyper_data_frame$means)))

## [1] 500.0 499.9

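This runs 100 simulations of n = 30 values each from a hypergeometric distribution. The mean of the sample means is 499.9, against an actual mean of 500 (the mean of the 300 generated values, which also matches the theoretical hypergeometric mean of 1000 × 2000 / 4000 = 500).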
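As a variation on the loop above, here is a minimal sketch that draws each sample of 30 directly from rhyper() using replicate(), rather than resampling a pool of 300 values. The names n_sims, n_per_sample, samples, sample_means and sample_mins are my own choices for illustration. Because the random draws happen in a different order, the numbers will not match the output above exactly.

```r
set.seed(1550)
n_sims <- 100
n_per_sample <- 30

# Each column is one simulated sample of 30 hypergeometric values:
# 2000 white balls, 2000 black balls, 1000 drawn per value
samples <- replicate(n_sims, rhyper(n_per_sample, 2000, 2000, 1000))

# One mean and one minimum per simulated sample (per column)
sample_means <- colMeans(samples)
sample_mins  <- apply(samples, 2, min)

# Theoretical mean of the hypergeometric: k * m / (m + n)
theoretical_mean <- 1000 * 2000 / (2000 + 2000)

print(c(theoretical_mean, mean(sample_means)))
```

Storing the samples as a 30 x 100 matrix avoids growing vectors inside a loop and keeps every raw draw available for other statistics, such as the maximum.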

summary(hyper_data_frame)

##      means          mins          max     
##  Min.   :494   Min.   :469   Min.   :514  
##  1st Qu.:498   1st Qu.:474   1st Qu.:521  
##  Median :500   Median :477   Median :527  
##  Mean   :500   Mean   :476   Mean   :528  
##  3rd Qu.:501   3rd Qu.:479   3rd Qu.:539  
##  Max.   :505   Max.   :484   Max.   :544

hist(hyper_data_frame$means,breaks=30)

hist(hyper_data_frame$mins)

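The simulation of the sampled means demonstrates the central limit theorem: the 100 sample means pile up in a narrow, roughly bell-shaped band around the population mean (494 to 505 here, against a population mean of 500). Even though there's a slight right skew in the histogram, the mean of the sample means (499.9) settles close to the mean of the population.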
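The central limit theorem also predicts how spread out the sample means should be: roughly the standard deviation of a single value divided by the square root of the sample size. The sketch below checks that prediction. It assumes samples are drawn directly from rhyper() (not from the 300-value pool used earlier), and the names mu, sigma, predicted_se and sim_means are mine.

```r
# Hypergeometric parameters used in this post (m white, n black, k drawn)
m <- 2000; n <- 2000; k <- 1000
N <- m + n
sample_size <- 30

# Theoretical mean and standard deviation of a single hypergeometric value
mu    <- k * m / N
sigma <- sqrt(k * (m / N) * (n / N) * (N - k) / (N - 1))

# CLT: sample means are approximately Normal(mu, sigma / sqrt(sample_size))
predicted_se <- sigma / sqrt(sample_size)

set.seed(1550)
sim_means <- replicate(100, mean(rhyper(sample_size, m, n, k)))

print(c(mu = mu, predicted_se = predicted_se, observed_sd = sd(sim_means)))
```

With these parameters the predicted standard error is about 2.5, which is consistent with the sample means in the summary table ranging from 494 to 505, roughly two standard errors either side of 500.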

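The sampled minimums give only a very rough indication of where the means will fall. They range from 469 to 484, well below the population mean of 500, so they mark the lower tail of the distribution rather than its center.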
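To see why the minimums sit in the lower tail, it helps to write down their distribution. For sample_size independent values with CDF F, the minimum has CDF 1 - (1 - F(x))^sample_size. The sketch below evaluates this with phyper(). It assumes independent draws straight from the hypergeometric distribution, which is not exactly what resampling a fixed pool of 300 values does, so treat it as an approximation to the simulation above. The names cdf_single, cdf_min and median_of_min are mine.

```r
# Hypergeometric parameters used in this post (m white, n black, k drawn)
m <- 2000; n <- 2000; k <- 1000
sample_size <- 30

# All possible values of a single hypergeometric draw
x <- 0:k

# CDF of one value, then CDF of the minimum of sample_size independent values
cdf_single <- phyper(x, m, n, k)
cdf_min    <- 1 - (1 - cdf_single)^sample_size

# Smallest x at which the minimum's CDF reaches 0.5 (its theoretical median)
median_of_min <- x[which(cdf_min >= 0.5)[1]]
print(median_of_min)
```

The minimum's CDF rises to 0.5 long before the single-value CDF does, which is why the histogram of minimums is centered well below 500.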

Michael Muller

September 3rd, 2017