Assignment 4: Central Limit Theorem

Author

Allison Shrivastava

Describe the Law of Large Numbers (LLN)

The law of large numbers (LLN) is a theorem in probability stating that as the size of a random sample grows, the sample mean gets closer to the population mean. That is, for independent draws from a population with a finite mean, the sample mean converges to the population mean as the sample size goes to infinity.
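The convergence described above can be illustrated with a short simulation. This is only a sketch and not part of the assignment code: the seed, the number of draws, and the names `df_demo`, `draws`, `running_mean`, and `checkpoints` are assumptions chosen for illustration. It uses chi-squared draws, whose population mean equals the degrees of freedom.

```r
# Illustrative sketch: the running mean of chi-squared draws
# settles near the population mean as more draws are included.
set.seed(1)                          # arbitrary seed, for reproducibility only
df_demo <- 55                        # degrees of freedom; population mean is df_demo
draws <- rchisq(100000, df = df_demo)

# mean of the first 1, 2, 3, ... draws
running_mean <- cumsum(draws) / seq_along(draws)

# compare the running mean with the population mean at a few sample sizes
checkpoints <- c(10, 100, 1000, 10000, 100000)
data.frame(n = checkpoints,
           running_mean = running_mean[checkpoints],
           abs_error = abs(running_mean[checkpoints] - df_demo))
```

The error at any single checkpoint is random, so it need not shrink at every step, but it should typically be much smaller at the largest sample sizes than at the smallest.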
Describe the Central Limit Theorem (CLT)

The central limit theorem (CLT) is a theorem in probability stating that the sampling distribution of the sample mean approaches a normal distribution as the sample size increases, regardless of the shape of the population distribution, provided the observations are independent and the population has a finite variance.
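One way to see the "regardless of shape" part is to start from a clearly skewed population and watch the skewness of the sample means shrink as the sample size grows. The sketch below is an illustration under assumed settings: the degrees of freedom (3), the sample sizes (2 and 50), the seed, and the hand-written `skewness` helper are all hypothetical choices, since base R does not ship a skewness function.

```r
# Illustrative sketch: sample means of a skewed population
# become more symmetric as the sample size increases.
set.seed(2)                                   # arbitrary seed

# simple moment-based skewness (hypothetical helper)
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

# 5000 sample means each, from a right-skewed chi-squared with 3 df
means_small <- replicate(5000, mean(rchisq(2, df = 3)))    # n = 2
means_large <- replicate(5000, mean(rchisq(50, df = 3)))   # n = 50

# skewness near 0 indicates a roughly symmetric distribution
c(n_2 = skewness(means_small), n_50 = skewness(means_large))
```

A normal distribution has skewness 0, so the value for n = 50 should sit much closer to 0 than the value for n = 2.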

What are the similarities and differences between the two?

Both the LLN and the CLT are based on the idea that larger sample sizes provide better information about a population. The LLN states that as the sample size increases, the sample mean approaches the population mean. The CLT, on the other hand, states that as the sample size becomes large, the sampling distribution of the sample mean becomes approximately normal. In other words, the LLN says where the sample mean ends up, while the CLT describes how the sample means are distributed around the population mean.

Distribution chosen and 5-line description

I’ve chosen the chi-squared distribution, which is the distribution of the sum of the squares of k independent standard normal variables. It is commonly used in tests on categorical data, such as goodness-of-fit tests, to determine whether observed counts match expected counts. Its shape is determined by the degrees of freedom k (the number of independent squared variables being summed): the mean is k and the variance is 2k. The distribution is right-skewed and becomes more symmetric as the degrees of freedom increase.
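The definition above can be checked directly by building chi-squared draws from squared standard normals. This is a minimal sketch with assumed names (`k_demo`, `z`, `chisq_manual`), an arbitrary seed, and an arbitrary number of draws. It compares the simulated mean and variance with the theoretical values k and 2k.

```r
# Illustrative sketch: a chi-squared variable with k degrees of freedom
# is the sum of k squared independent standard normal variables.
set.seed(3)                                   # arbitrary seed
k_demo <- 55                                  # degrees of freedom

# each row holds k_demo independent standard normal draws
z <- matrix(rnorm(20000 * k_demo), ncol = k_demo)

# summing the squares across each row gives one chi-squared draw per row
chisq_manual <- rowSums(z^2)

# theory: mean = k_demo = 55, variance = 2 * k_demo = 110
c(mean = mean(chisq_manual), var = var(chisq_manual))
```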

A. Apply the CLT on the sample mean of the chi-squared distribution

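# set up degrees of freedom (k), sample size (n), and number of simulated sample means (b)

set.seed(seed = 33)
k <- 55
n <- 40
b <- 10000

# get the sample means
sample_means <- replicate(b, mean(rchisq(n, df = k)))

# standardize with the CLT: a chi-squared variable has mean k and variance 2k,
# so the sample mean has mean k and standard deviation sqrt(2k/n)
z_mean <- (sample_means - k) / sqrt(2 * k / n)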

# checks
mean(z_mean)

[1] -0.003001

sd(z_mean)

[1] 0.9995

# visualize and overlay the standard normal density
hist(z_mean, probability = TRUE, breaks = 35,
     main = "Normal approximation of Chi-squared",
     col = "purple")
curve(dnorm(x), add = TRUE, lwd = 2)
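A numeric complement to the histogram is to compare a few empirical quantiles of the standardized means with the matching standard normal quantiles. The sketch below re-simulates with the same k and n but under its own assumed seed and hypothetical names (`k_chk`, `n_chk`, `means_chk`, `z_chk`, `probs`), so its numbers will not match the output above exactly.

```r
# Illustrative sketch: compare quantiles of the standardized sample means
# with the quantiles of the standard normal distribution.
set.seed(4)                                   # arbitrary seed
k_chk <- 55                                   # degrees of freedom
n_chk <- 40                                   # sample size

means_chk <- replicate(10000, mean(rchisq(n_chk, df = k_chk)))
z_chk <- (means_chk - k_chk) / sqrt(2 * k_chk / n_chk)

# rows should be close if the normal approximation is good
probs <- c(0.025, 0.25, 0.5, 0.75, 0.975)
rbind(empirical = quantile(z_chk, probs),
      normal = qnorm(probs))
```

Small gaps in the tails are expected, because the chi-squared distribution is right-skewed and n = 40 is finite.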

B. Apply the CLT on another sample statistic (median, percentile, etc.)

# sample medians using the same setup
sample_medians <- replicate(b, median(rchisq(n, df = k)))

# check
mean(sample_medians)

[1] 54.32

sd(sample_medians)

[1] 2.019

# plot
hist(sample_medians,
     probability = TRUE,
     breaks = 35,
     col = "purple",
     main = "Sample median of chi-squared")
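The simulated center and spread of the sample medians can be compared with the standard large-sample result for a sample median: it is approximately normal, centered at the population median m, with standard deviation 1 / (2 f(m) sqrt(n)), where f is the population density. The sketch below evaluates that formula for the same k and n. The names `k_med`, `n_med`, `m_pop`, and `sd_asym` are assumptions introduced for illustration.

```r
# Illustrative sketch: large-sample approximation for the sample median
# of a chi-squared distribution.
k_med <- 55                                   # degrees of freedom
n_med <- 40                                   # sample size

# population median of the chi-squared distribution
m_pop <- qchisq(0.5, df = k_med)

# asymptotic standard deviation: 1 / (2 * f(m) * sqrt(n))
sd_asym <- 1 / (2 * dchisq(m_pop, df = k_med) * sqrt(n_med))

c(population_median = m_pop, asymptotic_sd = sd_asym)
```

Both values should land near the simulated results reported above (54.32 and 2.019), with a small gap expected because the formula is a large-sample approximation and n = 40 is moderate.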

C. Does the central limit theorem hold? Include at least 3 points

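Yes, the theorem holds here. First, the observations in each sample are independent and identically distributed draws from a distribution with finite variance. Second, the histogram of the standardized sample means is approximately bell-shaped. Third, the mean of the standardized values is close to 0 (-0.003) and their standard deviation is close to 1 (0.9995), which is what a standard normal approximation predicts.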