Central Limit Theorem:

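The central limit theorem describes how the mean of a random sample behaves. Two important results go with it:

Mean of the sampling distribution of the sample mean = Mean of the population
Standard deviation of the sample mean (the standard error) = Standard deviation of population/sqrt(sample size). This holds for any sample size; a sample size greater than 30 is only a common rule of thumb for when the distribution of the sample mean is close to normal.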
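For reference, the two statements above can be written more formally. This is a sketch of the standard textbook form, assuming independent, identically distributed observations with a finite variance; the symbols $\mu$, $\sigma$, $n$ and $\bar{X}_n$ are notation introduced here, not taken from the original post.

```latex
% X_1, ..., X_n are i.i.d. with mean \mu and finite variance \sigma^2
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad
\mathrm{E}\left[\bar{X}_n\right] = \mu, \qquad
\mathrm{SD}\left(\bar{X}_n\right) = \frac{\sigma}{\sqrt{n}}

% Central limit theorem: the standardized sample mean converges in distribution
\frac{\sqrt{n}\,\left(\bar{X}_n - \mu\right)}{\sigma} \xrightarrow{\;d\;} \mathcal{N}(0, 1)
\quad \text{as } n \to \infty
```

The first line holds exactly for every $n$; only the normal shape in the second line is a large-sample approximation.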

In other words, as we increase the sample size, the sample mean gets closer to the mean of the population, and the distribution of the sample mean approaches a normal distribution irrespective of the population’s distribution (provided the population has a finite variance). The variance of the sample mean shrinks, so the sample mean becomes more concentrated about the mean of the population.
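A quick numerical check of these two properties might look like the sketch below. The seed, the sample size of 35 and the variable names are assumptions chosen for illustration; the population is the same gamma distribution (shape 5, rate 1) used in the simulations later in this post.

```r
set.seed(1)                      # assumed seed, only for reproducibility
n_samp <- 35                     # sample size (illustrative choice)
n_sim  <- 5000                   # number of simulated samples

# Each column is one sample of size n_samp from Gamma(shape = 5, rate = 1)
samples <- matrix(rgamma(n_samp * n_sim, shape = 5, rate = 1), nrow = n_samp)
xbar    <- colMeans(samples)     # one sample mean per simulated sample

pop_mean <- 5                    # shape / rate
pop_sd   <- sqrt(5)              # sqrt(shape) / rate

# The average of the sample means should be close to the population mean
c(mean_of_means = mean(xbar), population_mean = pop_mean)

# The spread of the sample means should be close to sigma / sqrt(n)
c(sd_of_means = sd(xbar), theoretical_se = pop_sd / sqrt(n_samp))
```

The two values in each printed pair should agree closely, though not exactly, because the simulation itself is random.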

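Using this theorem we can treat the sample mean as approximately normally distributed, whatever the shape of the population’s distribution. It does not change the distribution of the data themselves; only the distribution of the sample mean becomes approximately normal. This is very useful since in reality we usually do not know the distribution of a population, and using this theorem we can still make inferences about a population based on a sample. Understanding how the sample mean behaves will help us understand how to use it to create confidence intervals and hypothesis tests.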
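As one illustration of that use, the sketch below builds an approximate 95% confidence interval for the population mean from a single sample, and then checks by simulation how often such intervals contain the true mean. The sample size of 50, the seed and the variable names are assumptions; the interval uses the normal quantile from qnorm, which is the large-sample approximation the theorem justifies.

```r
set.seed(2)                                  # assumed seed, only for reproducibility

# One sample from a skewed population (gamma with shape 5, rate 1; true mean 5)
x  <- rgamma(50, shape = 5, rate = 1)
se <- sd(x) / sqrt(length(x))                # estimated standard error of the mean
ci <- mean(x) + c(-1, 1) * qnorm(0.975) * se # approximate 95% confidence interval
ci

# Repeat the procedure many times and record whether the interval covers 5
covers <- replicate(2000, {
  s    <- rgamma(50, shape = 5, rate = 1)
  half <- qnorm(0.975) * sd(s) / sqrt(length(s))
  abs(mean(s) - 5) <= half
})
mean(covers)                                 # proportion of intervals covering the true mean
```

The coverage proportion should land near 0.95 even though the population is not normal.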

To illustrate the Central Limit Theorem with a simulation:

I’m drawing samples from a gamma distribution and running 5000 simulations for each of several sample sizes.

(The gamma distribution is a continuous distribution skewed towards the right. With shape 5 and rate 1, as used here, its mean is 5 and its variance is 5.)
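To see what the population looks like before any averaging, the sketch below draws a large number of values from the gamma distribution with shape 5 and rate 1 and plots its density. The seed, the number of draws and the names draws and skew are assumptions for illustration; the skewness is computed by hand so that no extra package is needed.

```r
set.seed(3)                                   # assumed seed, only for reproducibility
draws <- rgamma(100000, shape = 5, rate = 1)  # large sample from the population

mean(draws)                                   # theoretical value: shape / rate = 5
var(draws)                                    # theoretical value: shape / rate^2 = 5

# Moment-based skewness; positive values indicate a longer right tail
skew <- mean((draws - mean(draws))^3) / sd(draws)^3
skew                                          # theoretical value: 2 / sqrt(5)

# Density of the population itself, clearly asymmetric
curve(dgamma(x, shape = 5, rate = 1), from = 0, to = 15,
      xlab = 'x', ylab = 'Density', main = 'Gamma(shape = 5, rate = 1) density')
```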

Sample size = 1

nSamp <- 1                              # Sample size
nSim  <- 5000                           # Number of simulations

# Creating a container to store the sample means
xmean <- rep( 0, nSim )                 # Vector of nSim zeros, one slot per simulation
for(i in 1:nSim)
  {     x1       <- rgamma(nSamp, 5, 1) # Drawing from a gamma population (shape 5, rate 1) with a mean of 5
        xmean[i] <- mean(x1)
  }
# Creating a histogram of the simulated sample means
hist(xmean, xlim = c(0,15),             # Limits for the x-axis
     col  = 'lightblue',
     xlab = 'Mean',
     main = 'Histogram of sample means (sample size = 1)')

Sample size = 5

nSamp <- 5                              # Sample size
nSim  <- 5000                           # Number of simulations

# Creating a container to store the sample means
xmean <- rep( 0, nSim )                 # Vector of nSim zeros, one slot per simulation
for(i in 1:nSim)
  {     x1       <- rgamma(nSamp, 5, 1) # Drawing from a gamma population (shape 5, rate 1) with a mean of 5
        xmean[i] <- mean(x1)
  }
# Creating a histogram of the simulated sample means
hist(xmean, xlim = c(0,15),             # Limits for the x-axis
     col  = 'lightblue',
     xlab = 'Mean',
     main = 'Histogram of sample means (sample size = 5)')

Sample size = 20

nSamp <- 20                             # Sample size
nSim  <- 5000                           # Number of simulations

# Creating a container to store the sample means
xmean <- rep( 0, nSim )                 # Vector of nSim zeros, one slot per simulation
for(i in 1:nSim)
  {     x1       <- rgamma(nSamp, 5, 1) # Drawing from a gamma population (shape 5, rate 1) with a mean of 5
        xmean[i] <- mean(x1)
  }
# Creating a histogram of the simulated sample means
hist(xmean, xlim = c(0,15),             # Limits for the x-axis
     col  = 'lightblue',
     xlab = 'Mean',
     main = 'Histogram of sample means (sample size = 20)')

Sample size = 35

nSamp <- 35                             # Sample size
nSim  <- 5000                           # Number of simulations

# Creating a container to store the sample means
xmean <- rep( 0, nSim )                 # Vector of nSim zeros, one slot per simulation
for(i in 1:nSim)
  {     x1       <- rgamma(nSamp, 5, 1) # Drawing from a gamma population (shape 5, rate 1) with a mean of 5
        xmean[i] <- mean(x1)
  }
# Creating a histogram of the simulated sample means
hist(xmean, xlim = c(0,15),             # Limits for the x-axis
     col  = 'lightblue',
     xlab = 'Mean',
     main = 'Histogram of sample means (sample size = 35)')

Sample size = 100

nSamp <- 100                            # Sample size
nSim  <- 5000                           # Number of simulations

# Creating a container to store the sample means
xmean <- rep( 0, nSim )                 # Vector of nSim zeros, one slot per simulation
for(i in 1:nSim)
  {     x1       <- rgamma(nSamp, 5, 1) # Drawing from a gamma population (shape 5, rate 1) with a mean of 5
        xmean[i] <- mean(x1)
  }
# Creating a histogram of the simulated sample means
hist(xmean, xlim = c(0,15),             # Limits for the x-axis
     col  = 'lightblue',
     xlab = 'Mean',
     main = 'Histogram of sample means (sample size = 100)')
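The five simulations above differ only in the sample size, so they could also be written once and looped over. The sketch below is one possible way to do that; the helper name sim_means and the seed are hypothetical, and the overlaid curve is the normal density the theorem predicts for each sample size (mean 5, standard deviation sqrt(5 / nSamp)), drawn on a density-scaled histogram so the two are comparable.

```r
# Hypothetical helper: simulate nSim sample means for samples of size nSamp
sim_means <- function(nSamp, nSim = 5000) {
  replicate(nSim, mean(rgamma(nSamp, 5, 1)))   # same gamma population as above
}

set.seed(5)                                    # assumed seed, only for reproducibility
for (nSamp in c(1, 5, 20, 35, 100)) {
  xmean <- sim_means(nSamp)
  # freq = FALSE puts the histogram on the density scale
  hist(xmean, xlim = c(0, 15), freq = FALSE,
       col  = 'lightblue',
       xlab = 'Mean',
       main = paste('Sample means, sample size =', nSamp))
  # Normal curve predicted by the central limit theorem
  curve(dnorm(x, mean = 5, sd = sqrt(5 / nSamp)), add = TRUE, lwd = 2)
}
```

For small sample sizes the histogram should sit visibly off the curve (it is skewed to the right), and the fit should improve as the sample size grows.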

Conclusion:

I estimated the sample mean for varying sample sizes (1, 5, 20, 35, 100) drawn from the population and created a histogram of the simulated sample means for each sample size. The histograms are consistent with the Central Limit Theorem: as the sample size increases, the distribution of the sample mean loses the right skew of the gamma population, looks more and more like a normal distribution, and becomes more tightly concentrated around 5 (the mean of the population), as expected. Hence, for a large enough sample we can treat the sample mean as approximately normal irrespective of the population’s distribution, which is key for inferential statistics.
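The visual impression from the histograms can also be summarised in numbers. The sketch below is an illustration under assumed names (sample_sizes, skewness, results) and an assumed seed: for each sample size it reports the mean, standard deviation and skewness of the simulated sample means, next to the theoretical standard error sqrt(5 / n).

```r
set.seed(4)                                    # assumed seed, only for reproducibility
n_sim        <- 5000
sample_sizes <- c(1, 5, 20, 35, 100)

# Moment-based skewness; values near 0 indicate a symmetric distribution
skewness <- function(v) mean((v - mean(v))^3) / sd(v)^3

results <- t(sapply(sample_sizes, function(n) {
  xbar <- replicate(n_sim, mean(rgamma(n, shape = 5, rate = 1)))
  c(n              = n,
    mean           = mean(xbar),               # should stay near 5
    sd             = sd(xbar),                 # should track theoretical_se
    theoretical_se = sqrt(5 / n),              # sigma / sqrt(n) with sigma^2 = 5
    skewness       = skewness(xbar))           # should shrink towards 0
}))
round(results, 3)
```

The mean column should stay near 5 for every sample size, while the standard deviation and the skewness should both fall as the sample size grows.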

Week - 5 Discussion

2023-02-23