1. Please explain CLT in your own words.

The central limit theorem (CLT) is a powerful tool. It shows that if we take large enough samples from a population (a common rule of thumb is a sample size of at least 30), the sample means will be approximately normally distributed, even if the population is not normally distributed or we do not know how the population is distributed. For the CLT to hold, three conditions have to be met. First, the observations are independent and identically distributed; second, the population has a finite variance; and lastly, the sample size is sufficiently large.

(https://www.youtube.com/watch?v=YAlJCEDH2uY)
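
To illustrate the point about non-normal populations, here is a minimal sketch in R. It assumes an exponential population with rate 1 (so the population mean and standard deviation are both 1), chosen only for illustration. The seed, the sample size of 30 and the 2000 repetitions are also assumptions, not values taken from the exercise.

```r
set.seed(42)  # assumed seed, only so the run is reproducible

n_obs  <- 30    # size of each sample (assumed)
n_reps <- 2000  # number of samples drawn (assumed)

# Exponential population with rate 1: skewed, mean 1, standard deviation 1
sample_means <- replicate(n_reps, mean(rexp(n_obs, rate = 1)))

# Histogram of the sample means with the CLT normal approximation on top
hist(sample_means, prob = TRUE, breaks = 30)
curve(dnorm(x, mean = 1, sd = 1 / sqrt(n_obs)), add = TRUE, lwd = 2)

mean(sample_means)  # should be close to the population mean, 1
sd(sample_means)    # should be close to 1 / sqrt(30)
```

Although the exponential distribution is strongly skewed, the histogram of the sample means should look roughly bell-shaped and centred near 1.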

2.

Normal distribution example

Using rnorm to draw 1000 numbers, we can create a sample from a normal distribution with a mean of 100 and a standard deviation of 10.

data <- rnorm(1000, mean = 100, sd = 10)
head(data)
## [1] 85.82197 93.48493 85.53977 91.27022 91.22991 92.89055
#Creating a graph

hist(data,prob=TRUE)

mean(data)
## [1] 100.6568

Example 2

data2 <- rnorm(500, mean = 100, sd = 10)
hist(data2, prob = TRUE)

mean(data2)
## [1] 99.78872

Example 3

Normal distribution graph

result4 <- curve(dnorm(x,mean=100,sd=10),70,130,lwd=2,col="pink")

Applying the CLT using the normal distribution from part 1

mu <- 100 
sigma <- 10 
n <- 30

Creating a place to store the values

xbar <- rep(0,500)

Creating a loop to store the results inside xbar

for (i in 1:500) { 
  xbar[i]=mean(rnorm(n,mean=mu,sd=sigma)) 
  }
hist(xbar,
     prob   =TRUE,
     breaks =12,
     xlim   =c(70,130),
     ylim   =c(0,0.3)
     )

#checking the mean value 

mean(xbar)
## [1] 100.2131

Applying the CLT with a bigger sample

n2 <- 80
xbar2 <- rep(0,500)
#creating the loop 
for (i in 1:500) { 
  xbar2[i]=mean(rnorm(n2,mean=mu,sd=sigma)) 
  }
hist(xbar2,
     prob   =TRUE,
     breaks =12,
     xlim   =c(70,130),
     ylim   =c(0,0.3)
     )

The CLT holds true. As we can see, when we draw samples from a normal distribution, the distribution of the sample means is also normal in shape. We can see this in the graphs, which show a bell curve. Also, the mean of the sample means is expected to be the same as the mean of the parent population. In my case they are slightly different, since I am using the rnorm command to draw random values, but even so we can see that the mean of the parent population and the mean of the sample means are close to each other.
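
The effect of the sample size can also be checked numerically. The sketch below assumes the same mean of 100 and standard deviation of 10 used above, and compares the spread of the sample means for n = 30 and n = 80 with the theoretical standard error sigma / sqrt(n). The helper name simulate_means and the seed are hypothetical choices made for this illustration.

```r
set.seed(123)  # assumed seed for reproducibility

mu    <- 100
sigma <- 10

# Hypothetical helper: draw `reps` samples of size n and return their means
simulate_means <- function(n, reps = 500) {
  replicate(reps, mean(rnorm(n, mean = mu, sd = sigma)))
}

means_30 <- simulate_means(30)
means_80 <- simulate_means(80)

# Observed spread of the sample means next to the theoretical standard error
c(observed = sd(means_30), theoretical = sigma / sqrt(30))
c(observed = sd(means_80), theoretical = sigma / sqrt(80))
```

The sample means for n = 80 should be less spread out than those for n = 30, which is why the histogram for the bigger sample is narrower around 100.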