Purpose: We build a simple loop to demonstrate how the Central Limit Theorem works.

The dataset and the markdown (.Rmd) files can be found here

Load Data

The data comes from Econ 210, Spring 2021, where students drew with replacement from a deck of cards (1-10). The sample size is n=5: each student drew a card, recorded the number, replaced the card, and repeated until 5 cards had been drawn. The students then calculated the mean of the 5 cards. The data is the result of this exercise.
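One student's part of the exercise can be sketched in R as follows. This is only an illustration, not the code used in class: the names `deck_cards` and `one_student` are made up here, and the seed is arbitrary and only there so the result can be reproduced.

```r
# Hypothetical sketch of a single student's exercise (names are illustrative)
set.seed(210)        # arbitrary seed, for reproducibility only
deck_cards <- 1:10   # card values 1 through 10
n <- 5               # sample size

# Draw 5 cards with replacement, so every draw has the same 10 possible values
one_student <- sample(deck_cards, size = n, replace = TRUE)

# The sample mean is the number each student recorded
mean(one_student)
```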

Distribution

We can look at the distribution of these sample means using the data. Notice that it doesn’t look like the uniform distribution (which is the distribution that a single draw from the deck of cards follows). Notice also that the tails are thin, and there’s a tendency to bunch around the center.

deck %>% ggplot(aes(x = Mean)) + geom_dotplot(fill = "blue", color = "orange",stackratio=1) + labs(title = "Distribution of the Mean of 5 draws", x = "Mean") + scale_y_continuous(NULL, breaks = NULL)
## `stat_bindot()` using `bins = 30`. Pick better value with `binwidth`.
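For reference, the center and spread we should expect from the mean of 5 draws can be computed directly. This is a sketch that assumes each of the 10 card values is equally likely; the variable names are illustrative and not part of the original analysis.

```r
# Population of card values, each assumed equally likely
cards <- 1:10

pop_mean <- mean(cards)                  # 5.5
pop_var  <- mean((cards - pop_mean)^2)   # 8.25 (population variance: divide by N, not N - 1)

# Standard deviation of the mean of n independent draws: sqrt(variance / n)
n <- 5
se_mean <- sqrt(pop_var / n)             # about 1.28

c(pop_mean = pop_mean, pop_var = pop_var, se_mean = se_mean)
```

A single card has a standard deviation of about 2.87, so averaging 5 cards cuts the spread by more than half, which is why the sample means bunch around 5.5.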

Simulation using a loop

What we can do now is simulate 10,000 samples (sim<-10000), each of size 5 (draws<-5). We’ll use a simple loop to do this.

sim<-10000
draws<-5

## Create a vector that will capture the results 
result_sim<-rep(NA,sim)

## Draw 5 values from a uniform distribution on 1-10 (standing in for the deck of cards), find the mean, record the mean, repeat

for (i in 1:sim) {
  # Draw from a continuous uniform distribution on [1, 10] (approximates the deck of cards)
  draw_sim <- runif(draws, min = 1, max = 10)
  # Get the mean
  mean_sim <- mean(draw_sim)
  # Record the mean in our vector
  result_sim[i] <- mean_sim
}

## Convert the vector to a dataframe so we can use ggplot or other functions
result_sim<-data.frame(result_sim)
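Because runif() returns non-integer values, a variant closer to the classroom exercise would draw whole card numbers instead. The sketch below swaps in sample() for runif() and replicate() for the explicit loop; it is an alternative, not the method used above, and the name `result_cards` is made up for this example.

```r
set.seed(2021)   # arbitrary seed, for reproducibility only
sim <- 10000
draws <- 5

# Each replication draws 5 whole card values (1-10) with replacement and returns their mean
result_cards <- replicate(sim, mean(sample(1:10, size = draws, replace = TRUE)))

# Convert to a data frame so it can be passed to ggplot
result_cards <- data.frame(result_cards)
head(result_cards)
```

Both versions are centered at 5.5. The discrete version has a slightly larger spread, since the variance of the card values (8.25) is larger than that of the continuous uniform on [1, 10] (6.75).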

Now we can use ggplot to look at the distribution of the sample means.

result_sim %>% ggplot(aes(x = result_sim)) + geom_histogram(binwidth = .3, color="black", fill="blue")
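Alongside the histogram, a quick numeric check can compare the simulated means with what theory predicts for a continuous uniform distribution on [1, 10]. The block below is a self-contained sketch that reruns a simulation of the same kind under a made-up name (`means_check`) rather than reusing the objects above.

```r
set.seed(1)   # arbitrary seed, for reproducibility only

# 10,000 sample means, each from 5 continuous uniform draws on [1, 10]
means_check <- replicate(10000, mean(runif(5, min = 1, max = 10)))

# Theory: Uniform(a, b) has mean (a + b) / 2 and variance (b - a)^2 / 12
theory_mean <- (1 + 10) / 2
theory_sd   <- sqrt((10 - 1)^2 / 12 / 5)   # sd of the mean of 5 draws

c(sim_mean = mean(means_check), theory_mean = theory_mean,
  sim_sd = sd(means_check), theory_sd = theory_sd)
```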

Compare Sample Distribution to the Population Distribution

## What does the uniform distribution look like?
result_uniform <- runif(10000,min=1,max=10)
result_uniform<-data.frame(result_uniform)
mean_plot <- result_sim %>% ggplot(aes(x = result_sim)) + geom_histogram(binwidth = .3, color="black", fill="blue")

uniform_plot<-result_uniform %>% ggplot(aes(x = result_uniform)) + geom_histogram(binwidth = .3, color="black", fill="orange")

## plot_grid() is assumed here to come from the cowplot package (library(cowplot))
plot_grid(mean_plot, uniform_plot)
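The contrast between the two panels depends on the sample size. As a rough sketch, the spread of the sample mean can be simulated for a few sample sizes and compared with the theoretical value sqrt(6.75 / n), where 6.75 is the variance of a continuous uniform on [1, 10]. The sample sizes and the names `sample_sizes` and `sd_by_n` are chosen here for illustration only.

```r
set.seed(42)   # arbitrary seed, for reproducibility only
sample_sizes <- c(1, 2, 5, 30)

# For each sample size, simulate 5,000 sample means and record their standard deviation
sd_by_n <- sapply(sample_sizes, function(n) {
  sd(replicate(5000, mean(runif(n, min = 1, max = 10))))
})

data.frame(n = sample_sizes,
           simulated_sd = sd_by_n,
           theory_sd = sqrt(6.75 / sample_sizes))
```

With n = 1 the "mean" is just a single draw, so it reproduces the flat population distribution; as n grows, the sample means concentrate around 5.5.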