The Law of Large Numbers states that as the sample size increases, the sample mean converges to the mean of the population the sample is drawn from.
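A minimal sketch of this convergence, assuming a fair six-sided die as the population (a hypothetical example, not part of the assignment data). The population mean is 3.5, and the running sample mean should drift toward it as more rolls are included.

```r
# Law of Large Numbers sketch: running mean of fair die rolls.
# The die, the seed, and the number of rolls are illustrative assumptions.
set.seed(1)
rolls <- sample(1:6, size = 100000, replace = TRUE)

# Running mean after each additional roll
running_mean <- cumsum(rolls) / seq_along(rolls)

# Sample mean after 10, 1,000, and 100,000 rolls, next to the population mean
running_mean[c(10, 1000, 100000)]
mean(1:6)  # population mean of a fair die
```

Plotting `running_mean` against the roll index with `plot(running_mean, type = "l")` shows the same behavior visually.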
The Central Limit Theorem states that the distribution of sample means from almost any type of distribution can be modeled by a normal distribution (aspect 2 from this video: https://www.youtube.com/watch?v=_YOr_yYPytM ), assuming the sample size is not too small. I’ve seen suggestions that a sample size greater than or equal to 30 is sufficient for sample means. For sample proportions, the prevailing rule seems to be that the CLT is appropriate when n * p >= 10 and n * (1-p) >= 10, where n is the sample size and p is the sample proportion (# of successes/# of trials).
They both state that larger samples tend to represent the population more accurately. The Central Limit Theorem is about the sampling distribution of the sample mean being modeled by a normal distribution, whereas the Law of Large Numbers is about the mean of a sample converging to the mean of the population. Because the CLT describes how the sample mean is distributed, it makes it possible to create Confidence Intervals.
Reference used: https://medium.com/analytics-vidhya/law-of-large-numbers-vs-central-limit-theorem-7819f32c67b2
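To make the link between the CLT and Confidence Intervals concrete, here is a small sketch of a normal-approximation 95% interval for a mean. The uniform sample, the seed, and the variable names are assumptions chosen only for illustration.

```r
# CLT-based 95% confidence interval for a population mean (sketch).
# The data below are a hypothetical sample, not data from this assignment.
set.seed(2)
x <- runif(100, min = 0, max = 10)  # n = 100 draws from a non-normal population

n <- length(x)
se <- sd(x) / sqrt(n)      # estimated standard error of the sample mean
z_crit <- qnorm(0.975)     # about 1.96 for a two-sided 95% interval

ci <- mean(x) + c(-1, 1) * z_crit * se
ci                         # lower and upper bounds
```

For small samples, `qt(0.975, df = n - 1)` is the more common choice of critical value than `qnorm(0.975)`.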
Exponential Distribution - This distribution models the time between events in a Poisson process, where events occur continuously and independently at a constant average rate. In R, the density, distribution, and quantile functions are dexp, pexp, and qexp. The rate is represented by \(\lambda\), the mean is \(\mu = 1/\lambda\), the variance is \(\sigma^2 = 1/\lambda^2\), and the standard deviation is \(\sigma = 1/\lambda\).
Great Reference used: https://r-coder.com/exponential-distribution-r/
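A short sketch of the three functions, plus a simulation check that the mean and standard deviation of an exponential distribution are both 1/rate. The rate of 0.3 matches the simulation that follows; the evaluation point x = 2 and the seed are arbitrary assumptions.

```r
# Exponential distribution helpers in base R (stats package).
rate <- 0.3

dexp(2, rate = rate)    # density at x = 2
pexp(2, rate = rate)    # P(X <= 2), equal to 1 - exp(-rate * 2)
qexp(0.5, rate = rate)  # median, equal to log(2) / rate

# Compare simulated moments with the theoretical values
set.seed(3)
x <- rexp(100000, rate = rate)
c(sim_mean = mean(x), theory_mean = 1 / rate)
c(sim_sd = sd(x), theory_sd = 1 / rate)
```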
library("psych")
set.seed(77)
expDist1 <-rexp(10000,.3) # generated exponential distribution
hist(expDist1) # show histogram, demonstrates an exponential distribution
# Pre-allocate a 10000 x 1 matrix of zeros to hold the sample means
z <- matrix(data = 0, nrow = 10000, ncol = 1)
# Draw 10000 independent samples of size 100 from Exp(rate = 0.3)
# and store the mean of each sample
for (i in 1:10000) { z[i, ] <- mean(rexp(100, rate = 0.3)) }
#plot histogram of means, looks like a Normal Distribution
hist(z, xlab = "", main = "Histogram of Sample Mean (n = 100)")
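As a rough numerical check on the histogram, the CLT predicts that means of samples of size n from an exponential distribution are centered at 1/rate with standard deviation (1/rate)/sqrt(n). The sketch below assumes each sample is drawn directly with rexp, and the names `sample_means`, `n`, and `rate` are introduced here for illustration.

```r
# Compare simulated sample means with the CLT prediction (sketch).
set.seed(77)
n <- 100
rate <- 0.3

# 10000 sample means, each from an independent sample of size n
sample_means <- replicate(10000, mean(rexp(n, rate = rate)))

c(sim_mean = mean(sample_means), clt_mean = 1 / rate)
c(sim_sd = sd(sample_means), clt_sd = (1 / rate) / sqrt(n))
```

A normal Q-Q plot, `qqnorm(sample_means); qqline(sample_means)`, is another way to judge how close the sampling distribution is to normal.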
Now let’s repeat the simulation using the 25th percentile of each sample instead of the mean.
set.seed(77) # use same seed
# Pre-allocate a 10000 x 1 matrix of zeros to hold the sample 25th percentiles
z <- matrix(data = 0, nrow = 10000, ncol = 1)
# Draw 10000 independent samples of size 100 from Exp(rate = 0.3)
# and store the 25th percentile of each sample
for (i in 1:10000) { z[i, ] <- quantile(rexp(100, rate = 0.3), probs = 0.25) }
# Plot histogram of the sample 25th percentiles
hist(z, xlab = "", main = "Histogram of Sample 25th Percentile (n = 100)")
Does the central limit theorem hold as expected? Please elaborate
(at least 3 points).
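I don’t think so, at least not in the classical sense, because the CLT
is a statement about sample means and the statistic here is the 25th
percentile. First, the histogram of the sample 25th percentiles looks
somewhat normally distributed, but it is skewed, with the bulk of the
values toward the left and a longer right tail. Second, the sample size
(n = 100) is big enough, so sample size is not the issue. Third, the
values converge to the population 25th percentile (about 0.96 for a
rate of 0.3), not to the population mean (about 3.33).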
You can post a few pictures to substantiate your claim while answering the CLT part above. Make sure there are comments in your code to explain and walk the reader through your logic.
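One way to substantiate the claim numerically is to compare the center of the simulated 25th percentiles with the population 25th percentile from qexp and with the population mean. This is a sketch under the assumption that each sample is drawn directly with rexp; the name `q25` is introduced here for illustration.

```r
# Where do the sample 25th percentiles center? (sketch)
set.seed(77)
n <- 100
rate <- 0.3

# 25th percentile of each of 10000 independent samples of size n
q25 <- replicate(10000, quantile(rexp(n, rate = rate), probs = 0.25, names = FALSE))

c(sim_center = mean(q25),
  pop_q25 = qexp(0.25, rate = rate),  # -log(0.75) / rate, about 0.96
  pop_mean = 1 / rate)                # about 3.33

# Histogram with the population 25th percentile marked
hist(q25, xlab = "", main = "Sample 25th Percentile (n = 100)")
abline(v = qexp(0.25, rate = rate), lty = 2)
```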