Part 1: Law of Large Numbers

  1. The Law of Large Numbers (LLN) is a theorem in probability that describes the long-run behavior of a repeated random experiment. It states that if an experiment is repeated independently many times, the mean of the results converges to the expected value of that experiment. There are two versions of the LLN: weak and strong.

    Weak: The Weak Law of Large Numbers states that as the sample size increases, the sample mean converges in probability to the expected value of the random variable. That is, for any fixed margin, the probability that the sample mean falls outside that margin of the expected value shrinks to zero. For example, take the experiment of flipping a fair coin with heads = 1 and tails = 0. As the number of observations increases, the sample mean is increasingly likely to be close to the expected value of 0.5 (50% probability of heads).

    Strong: The Strong Law of Large Numbers states that as the sample size increases, the sample mean converges almost surely to the expected value. In other words, with probability 1 the sequence of sample means settles at the expected value and stays arbitrarily close to it. In our example, as the number of coin flips grows, the sample mean (the proportion of heads) will almost surely converge to 0.5.

  2. Central Limit Theorem:

    The Central Limit Theorem (CLT) states that the sampling distribution of the sample mean is approximately normal, and that the approximation improves as the sample size increases. This holds regardless of the distribution of the parent data (the original data do not have to be normally distributed), provided the observations are independent and the population has finite variance. Strictly, the theorem concerns sample means (and sums). Similar large-sample normality results exist for some other statistics, such as the median or the 80th percentile, but not for all of them: extremes such as the max and min generally do not have normal limiting distributions. This theorem is extremely powerful in statistics. It allows a good estimate of the population mean from a sample much smaller than the population, and it gives a standard error that can be used to build confidence intervals. This is very useful in practice, since it is almost never possible to obtain census data or sample the whole population. When only one sample is available, repeated resamples can be drawn from it with replacement to approximate the sampling distribution, as long as conditions such as a sufficient sample size are met. The larger the sample size, the better the normal approximation and the smaller the standard error; taking more repeated samples only gives a clearer picture of the sampling distribution.

  3. The Law of Large Numbers and the Central Limit Theorem share some similarities. Both describe how the sample mean behaves as an estimate of a true population parameter, and in both cases a larger sample size gives a better estimate.

    But there are also fundamental differences:

    The Law of Large Numbers says where the sample mean ends up: it converges to the expected value. The Central Limit Theorem says how the sample mean is distributed around that value along the way: approximately normally, with a spread that shrinks as the sample size grows. The LLN requires only that the expected value exists (it does not have to be known in advance), while the CLT additionally requires finite variance. Both apply to probability experiments and to sampling from a population.

    Another important difference is that the Law of Large Numbers gives only a point of convergence for the sample mean. For a 0/1 variable, that expected value is the probability of the event (think of the coin example). The CLT also describes the shape and spread of the sampling distribution, which is what makes standard errors and confidence intervals possible. Analogous normal approximations exist for some other statistics, such as the median and other percentiles, although these rely on separate results rather than the CLT itself.
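    The contrast between the two results can be sketched with the coin example. The following is a minimal illustration, not part of the original analysis; the variable names (`flips`, `running_mean`, `sample_means`) and the sample sizes are assumptions chosen for the demonstration.

```r
# Illustrative sketch (assumed names and sizes): coin flips with heads = 1, tails = 0
set.seed(37)

# Law of Large Numbers: the running mean of one long sequence settles near 0.5
flips <- rbinom(n = 10000, size = 1, prob = 0.5)
running_mean <- cumsum(flips) / seq_along(flips)
running_mean[c(10, 100, 1000, 10000)]

# Central Limit Theorem: means of many samples of size 100 are roughly normal,
# centred at 0.5 with standard deviation close to sqrt(0.25 / 100) = 0.05
sample_means <- replicate(10000, mean(rbinom(n = 100, size = 1, prob = 0.5)))
mean(sample_means)
sd(sample_means)
```

    The first part follows a single sequence as it grows, while the second looks at the spread of many sample means at a fixed sample size.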

library(psych)
  1. Setting the seed. Setting the seed makes random number generation reproducible. It fixes the starting state of the random number generator, so the same sequence of random numbers is produced each time the code is run.

    set.seed(seed = 37)
  2. The Law of Large Numbers is

Part 2: Apply CLT on a distribution

  1. I am choosing the Exponential Distribution. Its most common application is modeling the amount of time waited until an event occurs, for example the waiting time between customers, natural disasters, or car accidents. It is closely related to the Poisson distribution (it describes the time between events in a Poisson process) and is a special case of the gamma distribution.

    The probability density function is \(f(x; \lambda) = \lambda e^{-\lambda x}\) for \(x \ge 0\).

    The cumulative distribution function is \(F(x; \lambda) = 1 - e^{-\lambda x}\) for \(x \ge 0\).

The mean of this distribution is \(\mu = 1 / \lambda\)

The variance is calculated as \(\sigma^{2} = 1/ \lambda^{2}\)

The standard deviation is equal to the mean: \(\sigma = 1/\lambda\).

The only parameter needed for this distribution is the rate parameter ( \(\lambda\) ). It determines the scale of the distribution and is the inverse of the mean: \(\lambda = 1/\mu\). It plays the same role as the rate parameter of the Poisson distribution.

For example, if we were using an exponential distribution to model how long we have to wait to see a home run in MLB, and we know that on average there is a home run every 100 at-bats, the rate parameter would be 1/100.
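The formulas above can be checked against R's built-in exponential functions. This is a small sketch using the home run example's rate of 1/100; the evaluation point `x = 50` is an arbitrary choice made here for illustration.

```r
# Sketch: compare the closed-form expressions with R's dexp() and pexp()
lambda <- 1 / 100   # rate from the home run example
x <- 50             # arbitrary evaluation point (assumption for illustration)

pdf_formula <- lambda * exp(-lambda * x)
cdf_formula <- 1 - exp(-lambda * x)

all.equal(pdf_formula, dexp(x, rate = lambda))
all.equal(cdf_formula, pexp(x, rate = lambda))

# Mean and standard deviation are both 1 / lambda
mu_theory <- 1 / lambda
sigma_theory <- sqrt(1 / lambda^2)
c(mean = mu_theory, sd = sigma_theory)
```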

5A.

#create our data using random generation following exponential distribution
set.seed(37)
mydata <- rexp(n = 10000,
               rate = .25)

head(mydata)
## [1] 0.3971268 9.3633440 1.1903758 3.8385803 4.4062375 8.0647421
#Plot our created data
hist(x = mydata,
     main = "Histogram of the Exponential Distribution (lambda = .25)",
     xlab = "")

#calculate the mean and standard deviation of the created data - these serve as the population ("true") values
mu <- mean(mydata)
mu
## [1] 4.046159
sigma <- sd(mydata)
sigma
## [1] 4.016235

Step 2: Create an empty matrix of 10000 rows and 1 column where the sample means will be stored.

#create empty matrix of 10000 rows and 1 column where the sample means will be stored
mean_matrix <- matrix(data = rep(x = 0,
                                 times = 10000
                                 ),
                      nrow = 10000,
                      ncol = 1)
head(mean_matrix)
##      [,1]
## [1,]    0
## [2,]    0
## [3,]    0
## [4,]    0
## [5,]    0
## [6,]    0
describe(mean_matrix)
##    vars     n mean sd median trimmed mad min max range skew kurtosis se
## X1    1 10000    0  0      0       0   0   0   0     0  NaN      NaN  0

Step 3: Take a random sample of 100 observations from the distribution and store the mean of each sample in the matrix we just created.

#creating the for loop that will take a sample of 100 points (with replacement) from mydata, calculate the mean, and store it in the empty matrix. Do this 10000 times.

for (i in 1:10000){
  mean_matrix[i,] <- mean(sample(x    = mydata,
                                 size = 100,
                                 replace = TRUE))
}
#summary stats for the sample means
describe(mean_matrix)
##    vars     n mean  sd median trimmed mad  min  max range skew kurtosis se
## X1    1 10000 4.04 0.4   4.03    4.04 0.4 2.75 5.54  2.79 0.18    -0.07  0

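As we can see here, the mean of the sample means is 4.04. The true mean of our population is 4.046. These two values are extremely close. The standard deviation of the sample means (0.4) also matches the value the Central Limit Theorem predicts, \(\sigma/\sqrt{n} = 4.016/\sqrt{100} \approx 0.40\), which supports the theorem.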
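The spread of the sample means can be compared with \(\sigma / \sqrt{n}\) directly in code. The sketch below does this and uses `replicate()` as a more compact alternative to the pre-allocated matrix and `for` loop. It regenerates the data rather than reusing the objects above, so its numbers will not match the printed output exactly; `sample_means` and `se_theory` are names introduced here for illustration.

```r
set.seed(37)
mydata <- rexp(n = 10000, rate = 0.25)

# Draw 10000 samples of size 100 (with replacement) and keep each sample mean
sample_means <- replicate(10000, mean(sample(mydata, size = 100, replace = TRUE)))

# Standard error predicted by the CLT: population sd divided by sqrt(n)
se_theory <- sd(mydata) / sqrt(100)

c(mean_of_means   = mean(sample_means),
  population_mean = mean(mydata),
  sd_of_means     = sd(sample_means),
  se_theory       = se_theory)
```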

#plot the sample means on a histogram

hist(mean_matrix,
     main = "Histogram of the Sample Means (n = 100)",
     xlab = ""
)

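The histogram of the sample means is approximately bell-shaped and symmetric, and the skew (0.18) and kurtosis (-0.07) reported above are both close to 0. This indicates that the sample means approximately follow a normal distribution, even though the parent distribution is strongly right-skewed.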
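A histogram alone can hide mild departures from normality. One way to check more closely, sketched here under the assumption that the simulation is re-run from scratch, is to compare the sample means with normal quantiles. `qq` and `skewness` are names introduced for this illustration.

```r
set.seed(37)
mydata <- rexp(n = 10000, rate = 0.25)
sample_means <- replicate(10000, mean(sample(mydata, size = 100, replace = TRUE)))

# Normal Q-Q coordinates; drop plot.it = FALSE to draw the plot itself
qq <- qqnorm(sample_means, plot.it = FALSE)

# A correlation near 1 indicates the points lie close to a straight line
cor(qq$x, qq$y)

# Sample skewness: 0 for a perfectly symmetric distribution
skewness <- mean((sample_means - mean(sample_means))^3) / sd(sample_means)^3
skewness
```

A small positive skew is expected here because the exponential parent distribution is right-skewed; it shrinks as the sample size grows.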

5B.

Now we will use the same strategy to see whether a similar result holds for other statistics. We will use the 25th percentile.

#calculate the true 25th percentile

percentile_25 <- quantile(mydata, .25)
print(percentile_25)
##      25% 
## 1.188489
percentile25_matrix <- matrix(data = rep(x = 0,
                                 times = 10000
                                 ),
                      nrow = 10000,
                      ncol = 1)
head(percentile25_matrix)
##      [,1]
## [1,]    0
## [2,]    0
## [3,]    0
## [4,]    0
## [5,]    0
## [6,]    0
describe(percentile25_matrix)
##    vars     n mean sd median trimmed mad min max range skew kurtosis se
## X1    1 10000    0  0      0       0   0   0   0     0  NaN      NaN  0
#creating the for loop that will take a sample of 100 points (with replacement) from mydata, calculate the 25th percentile, and store it in the empty matrix. Do this 10000 times.

for (i in 1:10000){
  percentile25_matrix[i,] <- quantile(sample(x    = mydata,
                                 size = 100,
                                 replace = TRUE), .25)
}
describe(percentile25_matrix)
##    vars     n mean   sd median trimmed  mad  min  max range skew kurtosis se
## X1    1 10000 1.23 0.24   1.21    1.22 0.24 0.54 2.34   1.8 0.32     0.07  0
#plot the sample 25th percentiles on a histogram

hist(percentile25_matrix,
     main = "Histogram of the Sample 25th Percentiles (n = 100)",
     xlab = ""
)

The true 25th percentile calculated from the population is 1.188.

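The mean of the 10,000 sample 25th percentiles (according to the describe function) is 1.23, with a median of 1.21. Both are close to the true value. The skew (0.32) is modest, so the sampling distribution of the 25th percentile is roughly normal, though a little less so than for the mean (skew 0.18).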
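For sample quantiles, the relevant result is the asymptotic normality of sample quantiles rather than the classical CLT: for large n, the sample p-th quantile is approximately normal around the true quantile \(q_p\) with standard error \(\sqrt{p(1-p)/n} / f(q_p)\), where f is the density. The sketch below compares that approximation with a simulation. It draws directly from the theoretical exponential distribution rather than from `mydata`, which is a simplification made here; the names `q_true`, `f_q`, `se_theory`, and `sample_q25` are illustrative.

```r
set.seed(37)
lambda <- 0.25
p <- 0.25
n <- 100

# Theoretical 25th percentile and density at that point
q_true <- qexp(p, rate = lambda)
f_q <- dexp(q_true, rate = lambda)

# Asymptotic standard error of the sample quantile
se_theory <- sqrt(p * (1 - p) / n) / f_q

# Simulated sampling distribution of the 25th percentile
sample_q25 <- replicate(10000, quantile(rexp(n, rate = lambda), probs = p))

c(q_true    = q_true,
  mean_sim  = mean(sample_q25),
  se_theory = se_theory,
  sd_sim    = sd(sample_q25))
```

Because the approximation is a large-sample one, a small gap between the simulated mean and the true quantile at n = 100 is not surprising.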