Week 9 Discussion - Central Limit Theorem

Haiding Luo

2023-11-06

1. Please Google and describe Law of large numbers in your own words.

The Law of Large Numbers says that when a random experiment is repeated independently many times, the relative frequency of an event settles toward a stable value, namely the probability of that event. More generally, the average of the observed outcomes converges to the expected value. In other words, when the number of trials is sufficiently large, the observed frequency of an event is very close to its true probability.
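
A minimal sketch of this idea, assuming a fair coin (probability 0.5) simulated with rbinom. The seed, the number of flips, and the variable names are illustrative choices, not part of the assignment. It tracks the running relative frequency of heads, which should drift toward 0.5 as the flips accumulate:

```r
# Sketch: running relative frequency of heads in simulated fair-coin flips.
set.seed(1)                      # hypothetical seed, for reproducibility only
p_true  <- 0.5                   # assumed probability of heads
n_flips <- 100000                # assumed number of flips
flips   <- rbinom(n_flips, size = 1, prob = p_true)   # 1 = heads, 0 = tails

# Relative frequency of heads after 1, 2, ..., n_flips flips
running_freq <- cumsum(flips) / seq_len(n_flips)

# Compare the running frequency with p_true at a few checkpoints
checkpoints <- c(10, 100, 1000, 10000, 100000)
data.frame(n         = checkpoints,
           frequency = running_freq[checkpoints],
           abs_error = abs(running_freq[checkpoints] - p_true))
```

The abs_error column will not shrink at every single step, but it should be much smaller at the last checkpoint than at the first.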

2. Please explain CLT in your own words.

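The Central Limit Theorem (CLT) states that when many independent random variables with finite variance are added together, the distribution of their sum (or average), once centered and scaled, approaches a normal distribution, whatever the shape of the underlying distribution. The classical version assumes the variables are independent and identically distributed; extended versions allow the distributions to differ under additional conditions. In other words, even if a single observation is far from normal, the sum or mean of enough observations behaves approximately like a normal random variable.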
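
One way to see this in R is sketched below, assuming an Exponential(1) population (strongly right-skewed, with mean 1 and standard deviation 1). The sample size of 50, the 5000 repetitions, the seed, and all variable names are hypothetical choices made for illustration. Each sample mean is centered and scaled so that, if the CLT approximation is good, the results should look roughly standard normal:

```r
# Sketch: standardized sample means from a skewed (exponential) population.
set.seed(2)            # hypothetical seed
n     <- 50            # assumed observations per sample
n_rep <- 5000          # assumed number of repeated samples
rate  <- 1             # Exponential(rate): mean 1/rate, sd 1/rate

# One sample mean per repetition
means <- replicate(n_rep, mean(rexp(n, rate = rate)))

# Center by the population mean and scale by the standard error sd / sqrt(n)
z_scores <- (means - 1 / rate) / ((1 / rate) / sqrt(n))

# Should be close to mean 0 and sd 1
c(mean = mean(z_scores), sd = sd(z_scores))

# Histogram on the density scale with the standard normal curve overlaid
hist(z_scores, breaks = 40, freq = FALSE,
     xlab = "", main = "Standardized Sample Means")
curve(dnorm(x), add = TRUE)
```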

3. What are the similarities and differences between LLN and CLT? Write a few lines.

I believe the two share two features: both describe the behavior of random samples, and both are statements about what happens as the sample size grows large. In their standard forms, both assume independent and identically distributed observations, and the Central Limit Theorem additionally requires a finite variance. They differ in what they describe. The Law of Large Numbers says where the sample mean ends up: it converges to the expected value. The Central Limit Theorem describes the fluctuations around that value: they are approximately normal, with a standard deviation that shrinks in proportion to 1/sqrt(n).
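
The contrast can be sketched numerically. The example below assumes a Binomial(size = 20, prob = 0.4) population; the sample sizes, repetition count, seed, and names are illustrative assumptions. The spread of the raw sample mean should shrink as n grows (the LLN side), while the spread of sqrt(n) * (mean - mu) should stay near the population standard deviation (the CLT side):

```r
# Sketch: LLN versus CLT for means of binomial draws.
set.seed(3)                    # hypothetical seed
size  <- 20
prob  <- 0.4
mu    <- size * prob                        # population mean
sigma <- sqrt(size * prob * (1 - prob))     # population standard deviation
n_rep <- 2000                               # assumed repetitions per sample size

compare <- sapply(c(10, 100, 1000), function(n) {
  means <- replicate(n_rep, mean(rbinom(n, size = size, prob = prob)))
  c(n            = n,
    sd_of_mean   = sd(means),                    # LLN: shrinks toward 0
    sd_of_scaled = sd(sqrt(n) * (means - mu)))   # CLT: stays near sigma
})

t(compare)   # one row per sample size
sigma        # reference value for the sd_of_scaled column
```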

4. Pick any distribution apart from normal, uniform, or Poisson. You can look the distribution up on Wikipedia and/or read how to implement it in R (what parameters are required to generate the distribution).

Please describe this distribution first in 5 lines.

I chose the binomial distribution. It describes the number of successes in a fixed number of independent Bernoulli trials that all share the same success probability. Each trial has only two possible outcomes: success (typically coded as 1) or failure (typically coded as 0), and the probability of success remains constant from trial to trial. In R, rbinom(n, size, prob) generates n draws, where size is the number of trials per draw and prob is the probability of success. The distribution has mean size * prob and variance size * prob * (1 - prob).
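
The base R functions for this distribution can be sketched as follows, using the same parameters as the analysis below (size = 20, prob = 0.4). The seed, the number of simulated draws, and the variable names are assumptions made for illustration:

```r
# Sketch: the d/p/q/r family for the binomial distribution.
size <- 20     # number of Bernoulli trials per draw
prob <- 0.4    # probability of success on each trial

dbinom(8, size = size, prob = prob)      # P(X = 8)
pbinom(8, size = size, prob = prob)      # P(X <= 8)
qbinom(0.25, size = size, prob = prob)   # 25th percentile of the distribution

# Compare simulated moments with the theoretical mean and variance
set.seed(4)                              # hypothetical seed
draws <- rbinom(100000, size = size, prob = prob)
c(theoretical_mean = size * prob,
  simulated_mean   = mean(draws),
  theoretical_var  = size * prob * (1 - prob),
  simulated_var    = var(draws))
```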

5A. Then, apply the CLT to the sample mean of this chosen distribution.

rm(list = ls()) 
gc()  
##          used (Mb) gc trigger (Mb) max used (Mb)
## Ncells 517433 27.7    1148188 61.4   660382 35.3
## Vcells 939519  7.2    8388608 64.0  1769617 13.6
cat("\f")
set.seed(40)
# Draw 20 values from Binomial(size = 20, prob = 0.4); these serve as the population below
mydata <- rbinom(20, size = 20, prob = 0.4)
head(mydata)  
## [1]  9 11  9  5  6  8
library("psych")
describe(mydata)
##    vars  n mean  sd median trimmed  mad min max range skew kurtosis   se
## X1    1 20  7.7 2.3    7.5     7.5 2.22   5  12     7 0.47    -1.05 0.51
hist(x = mydata,
     main = "Histogram of the Binomial Distribution",
     xlab = "")

mu <- mean(mydata)
mu
## [1] 7.7
sigma <- sd(mydata)    
sigma
## [1] 2.29645
?matrix    
## starting httpd help server ... done
?rep
# Preallocate a 10000 x 1 matrix of zeros to hold the resampled means
z <- matrix(data = rep(x = 0, times = 10000),
            nrow = 10000,
            ncol = 1)

z[1:16]
##  [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
describe(z)
##    vars     n mean sd median trimmed mad min max range skew kurtosis se
## X1    1 10000    0  0      0       0   0   0   0     0  NaN      NaN  0
# 10000 times: resample 100 values from mydata with replacement and store their mean
for (i in 1:10000) {
  z[i, ] <- mean(sample(x       = mydata,
                        size    = 100,
                        replace = TRUE))
}

z[1:16]
##  [1] 7.49 7.58 7.65 8.16 7.63 7.72 8.18 7.56 7.78 7.76 7.77 7.68 7.62 8.00 7.10
## [16] 7.64
describe(z)
##    vars     n mean   sd median trimmed  mad min  max range skew kurtosis se
## X1    1 10000  7.7 0.22    7.7     7.7 0.22 6.8 8.54  1.74 0.07    -0.01  0
hist(z, xlab = "", main = "Histogram of Sample Means")
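
The spread of these sample means can be checked against the usual standard error formula, sd / sqrt(n). The sketch below copies the numbers reported above (20 values in mydata, sd(mydata) = 2.29645, resamples of size 100); the variable names are hypothetical. It assumes that, because sample() with replacement draws from the empirical distribution of mydata, the relevant standard deviation uses the n denominator rather than the n - 1 denominator that sd() uses:

```r
# Sketch: expected standard error of the resampled means.
n_pop  <- 20        # number of values in mydata (from the output above)
sd_pop <- 2.29645   # sd(mydata) from the output above, n - 1 denominator
n_draw <- 100       # size of each resample

# Standard deviation of the empirical distribution of mydata (n denominator)
sd_empirical <- sd_pop * sqrt((n_pop - 1) / n_pop)

# Standard error of the mean of n_draw independent draws
se_expected <- sd_empirical / sqrt(n_draw)
round(se_expected, 3)   # about 0.224, in line with the sd of 0.22 from describe(z)
```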

5B.

percent25 <- quantile(mydata, .25)
print(percent25)
## 25% 
##   6
percent25_matrix <- matrix(data = rep(x = 0,
                                 times = 10000
                                 ),
                      nrow = 10000,
                      ncol = 1)

describe(percent25_matrix)
##    vars     n mean sd median trimmed mad min max range skew kurtosis se
## X1    1 10000    0  0      0       0   0   0   0     0  NaN      NaN  0
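# Note: the second positional argument of mean() is trim, so each iteration stores a
# 25% trimmed mean of the resample, not its 25th percentile
# (quantile(x, .25) would return the percentile)
for (i in 1:10000) {
  percent25_matrix[i, ] <- mean(sample(x       = mydata,
                                       size    = 100,
                                       replace = TRUE),
                                trim = .25)
}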
describe(percent25_matrix)
##    vars     n mean   sd median trimmed  mad  min  max range skew kurtosis se
## X1    1 10000  7.4 0.28    7.4    7.39 0.27 6.34 8.54   2.2  0.1     0.15  0
hist(percent25_matrix,
     xlab = "",
     main = "Histogram of the Trimmed Sample Means")

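The 25th percentile of this population (mydata) is 6.

The resampling loop averages 7.4, but that figure is not a 25th percentile. Because mean(x, .25) computes a 25% trimmed mean, 7.4 sits close to the population mean of 7.7 rather than near 6. The CLT describes the behavior of sample means; estimating the 25th percentile by resampling would require quantile() inside the loop.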
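
A percentile version of the resampling loop can be sketched by replacing mean(..., .25) with quantile(..., probs = 0.25). The block below regenerates mydata with the same call and seed used above, on the assumption that this reproduces the same 20 values under R's default random number settings; the name q25 is hypothetical. No output is shown because it has not been run as part of this write-up:

```r
# Sketch: resampling distribution of the 25th percentile.
set.seed(40)
mydata <- rbinom(20, size = 20, prob = 0.4)   # same call as in 5A

# Preallocate a vector for the 10000 resampled percentiles
q25 <- numeric(10000)

for (i in seq_along(q25)) {
  resample <- sample(x = mydata, size = 100, replace = TRUE)
  q25[i]   <- quantile(resample, probs = 0.25, names = FALSE)
}

quantile(mydata, probs = 0.25)   # 25th percentile of mydata itself
mean(q25)                        # average of the resampled 25th percentiles
table(q25)                       # the values are discrete, so a table is informative
```

Because the data are integer counts, the resampled percentiles take only a few distinct values, so their histogram may look lumpy rather than bell-shaped. The CLT guarantee applies to means, not directly to percentiles.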