Week 4 Discussion - Central Limit Theorem

Please Google and describe the Law of Large Numbers in your own words.

The Law of Large Numbers states that as the sample size increases, the sample mean converges to the mean of the population from which the sample is drawn.
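A minimal R sketch of this idea, assuming an exponential population with rate 0.3 (so the population mean is 1/0.3); the seed and variable names are illustrative choices only:

```r
set.seed(1)                               # arbitrary seed, only for reproducibility
rate <- 0.3                               # assumed rate; population mean is 1 / rate
x <- rexp(100000, rate = rate)            # one long sequence of independent draws
running_mean <- cumsum(x) / seq_along(x)  # sample mean after 1, 2, ..., 100000 draws

# plot the running mean against the sample size (log scale on the x axis)
plot(running_mean, type = "l", log = "x",
     xlab = "Sample size n", ylab = "Running sample mean")
abline(h = 1 / rate, lty = 2)             # dashed line at the population mean
```

The running mean wanders for small n and should settle near the dashed line as n grows.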

Please explain CLT in your own words. You can, and should, read your textbook and/or online references to understand what CLT is, its uses, et cetera. Furthermore, if you find any useful resource, include it in your post so that the rest of the class can have a look at it too. E.g., Aspect 3 in the section Overview of four aspects (1:16) in the YouTube video substantiates my claim more formally that xxx…, Josh Starmer at StatQuest…, JB Statistics Intro to CLT, significance/uses of CLT, Wikipedia, …

The Central Limit Theorem states that the distribution of sample means from almost any type of distribution can be modeled by a normal distribution (aspect 2 from this https://www.youtube.com/watch?v=_YOr_yYPytM ), assuming the sample size is not too small. I’ve seen suggestions that a sample size greater than or equal to 30 is sufficient for sample means. For sample proportions, the prevailing rule seems to be that the CLT is appropriate when n * p >= 10 and n * (1-p) >= 10, where n is the sample size and p is the sample proportion (# of successes/# of trials).
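Stated a little more formally, for independent draws \(X_1, \dots, X_n\) from one distribution with mean \(\mu\) and finite variance \(\sigma^2\) (the finite-variance requirement is the standard textbook condition, not something taken from the video):

```latex
\[
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i,
\qquad
\sqrt{n}\,\frac{\bar{X}_n - \mu}{\sigma} \xrightarrow{d} N(0, 1)
\quad \text{as } n \to \infty .
\]
```

In practice this means that for large n the sample mean is approximately \(N(\mu, \sigma^2/n)\), so its standard deviation shrinks like \(\sigma/\sqrt{n}\).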

What are the similarities and differences between LLN and CLT? Write a few lines.

They both state that larger samples tend to represent the population more accurately. The Central Limit Theorem is about the distribution of the sample mean being modeled by a normal distribution, in contrast to the Law of Large Numbers, which is about the sample mean converging to the population mean. Because the CLT describes the shape and spread of the sampling distribution, it makes it possible to construct confidence intervals.
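As a rough illustration of the confidence-interval point, here is a sketch of an approximate 95% interval for a mean, assuming a hypothetical sample of 100 exponential draws with rate 0.3 and relying on the normal approximation from the CLT:

```r
set.seed(2)                          # arbitrary seed, only for reproducibility
x <- rexp(100, rate = 0.3)           # hypothetical sample of size 100
n <- length(x)
se <- sd(x) / sqrt(n)                # estimated standard error of the sample mean

# normal-approximation 95% interval: mean +/- z * standard error
ci <- mean(x) + c(-1, 1) * qnorm(0.975) * se
ci
```

With a small or heavily skewed sample, a t-based interval (qt instead of qnorm) would be the more cautious choice.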

Reference used: https://medium.com/analytics-vidhya/law-of-large-numbers-vs-central-limit-theorem-7819f32c67b2

Pick up any distribution apart from normal, uniform or poisson. You can Wikipedia about the distribution and/or read how to implement the distribution in R (what parameters are required to generate the distribution). Please describe this distribution first in 5 lines.

Exponential Distribution - This models the time between events in a Poisson process, where events occur independently and continuously at a constant average rate. In R, the density, distribution, quantile, and random generation functions are dexp, pexp, qexp, and rexp. The rate is represented by \(\lambda\), the mean \(\mu = 1/\lambda\), the variance \(\sigma^2 = 1/\lambda^2\), and the standard deviation \(\sigma = 1/\lambda\).
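A quick sketch to sanity-check these quantities in R, assuming a rate of 0.3 (the sample size and seed are arbitrary illustrative choices):

```r
set.seed(3)                              # arbitrary seed, only for reproducibility
lambda <- 0.3                            # assumed rate
x <- rexp(100000, rate = lambda)         # large sample from the exponential

# for an exponential, both the mean and the standard deviation equal 1 / lambda
c(sample_mean = mean(x), theory_mean = 1 / lambda)
c(sample_sd = sd(x), theory_sd = 1 / lambda)

# pexp matches the closed-form CDF, P(X <= t) = 1 - exp(-lambda * t)
all.equal(pexp(2, rate = lambda), 1 - exp(-lambda * 2))

# qexp inverts pexp
all.equal(qexp(pexp(2, rate = lambda), rate = lambda), 2)
```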

Great Reference used: https://r-coder.com/exponential-distribution-r/

1. Then, apply the CLT on the sample mean of this chosen distribution in R (adapt our class R code, or you can find an alternative code on the web too).

library("psych")

set.seed(77)
expDist1 <- rexp(10000, rate = 0.3) # generate 10,000 draws from an exponential distribution with rate 0.3
hist(expDist1) # histogram shows the right-skewed exponential shape

# create a 10000 x 1 matrix filled with 0s to hold the sample means
z <- matrix(data = rep(x = 0, times = 10000), nrow = 10000, ncol = 1)

# Repeat 10000 times: draw a fresh sample of 100 exponential values and store its mean
for (i in 1:10000) {
  z[i, ] <- mean(rexp(n = 100, rate = 0.3))
}

#plot histogram of means, looks like a Normal Distribution
hist(z, xlab = "", main = "Histogram of Sample Mean (n = 100)")
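To go beyond eyeballing the histogram, the simulated means can be compared with what the CLT predicts. This is a sketch under the assumption that each sample is drawn directly with rexp (n = 100, rate 0.3); the names `means`, `theory_mean`, and `theory_sd` are illustrative:

```r
set.seed(77)
rate <- 0.3
n <- 100
means <- replicate(10000, mean(rexp(n, rate = rate)))  # 10000 sample means

theory_mean <- 1 / rate                # population mean of the exponential
theory_sd <- (1 / rate) / sqrt(n)      # sigma / sqrt(n), with sigma = 1 / rate

c(mean_of_means = mean(means), theory = theory_mean)
c(sd_of_means = sd(means), theory = theory_sd)

# density-scaled histogram with the CLT normal curve overlaid
hist(means, freq = FALSE, xlab = "", main = "Sample means vs. CLT normal curve")
curve(dnorm(x, mean = theory_mean, sd = theory_sd), add = TRUE, lty = 2)
```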

2. Alternatively, apply the CLT on any other sample statistic like say the sample median, sample 25th percentile or even the sample 80th percentile. This may be marginally harder than the last part, but you can try to submit both.

Let’s try to sample the 25th percentile

set.seed(77) # use same seed

# create a 10000 x 1 matrix filled with 0s to hold the sample 25th percentiles
z <- matrix(data = rep(x = 0, times = 10000), nrow = 10000, ncol = 1)

# Repeat 10000 times: draw a fresh sample of 100 exponential values and store its 25th percentile
for (i in 1:10000) {
  z[i, ] <- quantile(rexp(n = 100, rate = 0.3), probs = 0.25)
}

# plot histogram of the sample 25th percentiles
hist(z, xlab = "", main = "Histogram of Sample 25th Percentile (n = 100)")

Does the central limit theorem hold as expected? Please elaborate (at-least 3 points).
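I don’t think so, at least not in the same way it does for the sample mean. (1) The histogram of the sample 25th percentile looks somewhat normally distributed, but it is more skewed to the left. (2) The sample size (n = 100) is big enough, so sample size is not the issue. (3) The values converge toward the population 25th percentile, not the population mean.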
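One way to put numbers on these observations is to compare the center of the simulated percentiles with the population 25th percentile and the population mean, and to compute a simple moment-based skewness. This is only a sketch, assuming samples drawn directly with rexp (n = 100, rate 0.3); the names `q25`, `pop_q25`, and `skew` are illustrative:

```r
set.seed(77)
rate <- 0.3
n <- 100

# 10000 sample 25th percentiles, each from a fresh sample of size n
q25 <- replicate(10000, quantile(rexp(n, rate = rate), probs = 0.25, names = FALSE))

pop_q25 <- qexp(0.25, rate = rate)   # population 25th percentile
pop_mean <- 1 / rate                 # population mean, for comparison

c(center = mean(q25), pop_q25 = pop_q25, pop_mean = pop_mean)

# moment-based skewness: near 0 for a symmetric shape, sign shows the direction
skew <- mean((q25 - mean(q25))^3) / sd(q25)^3
skew
```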

You can post a few pictures to substantiate your claim while answering the CLT part above. Make sure there are comments in your code to explain and walk the reader through your logic.

Chris Valcourt

2024-02-01