Law of Large Numbers in your own words :
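The Law of Large Numbers states that as the sample size tends to infinity, the sample mean converges to the population mean. For any finite sample the two can still differ, but the difference tends to shrink as more observations are added.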
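A minimal sketch of this idea. It assumes a fair six-sided die as the population, which has a population mean of 3.5. The die, the seed and the checkpoints are illustrative choices and not part of the assignment. The running mean drifts toward 3.5 as n grows, although it is not guaranteed to equal 3.5 exactly at any finite n.

```r
# Sketch of the LLN: running mean of simulated fair-die rolls (illustrative setup)
set.seed(42)                                         # arbitrary seed, for reproducibility
rolls <- sample(1:6, size = 100000, replace = TRUE)  # independent rolls of a fair die
running_mean <- cumsum(rolls) / seq_along(rolls)     # mean of the first k rolls, k = 1..100000
population_mean <- mean(1:6)                         # 3.5 for a fair die

# Distance between the sample mean and the population mean at a few sample sizes
checkpoints <- c(10, 100, 1000, 10000, 100000)
print(data.frame(n = checkpoints,
                 sample_mean = running_mean[checkpoints],
                 abs_error = abs(running_mean[checkpoints] - population_mean)))
```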
Please explain CLT in your own words :
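The central limit theorem basically states two important things about the sampling distribution of the sample mean, i.e. the distribution of the means of many independent samples of size n:
Mean of the sample means = Mean of the population
Standard deviation of the sample means (the standard error) = Standard deviation of the population / sqrt(n)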
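To put it another way, as the sample size n increases, the distribution of the sample mean approaches a normal distribution regardless of the shape of the population distribution (as long as the population variance is finite). The variance of the sample mean (σ²/n) also shrinks, and the sample means concentrate around the population mean. Using this theorem, we can treat the sample mean as approximately normally distributed even when the data themselves are not. The theorem does not turn the original distribution into a normal one.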
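A hedged illustration with a clearly non-normal population. It assumes an Exponential(rate = 1) population, which is strongly right-skewed and has both mean and standard deviation equal to 1. The sample size 30, the 5000 repetitions and the seed are arbitrary choices. The means should centre near 1 with a spread near 1/sqrt(30). Their histogram should look far more symmetric than the exponential itself, though some right skew can remain at n = 30.

```r
# Sketch of the CLT with a skewed population: Exponential(rate = 1),
# whose mean and standard deviation are both 1
set.seed(1)                     # arbitrary seed
n <- 30                         # sample size (illustrative choice)
reps <- 5000                    # number of independent samples
sample_means <- replicate(reps, mean(rexp(n, rate = 1)))

mean(sample_means)              # should be close to the population mean, 1
sd(sample_means)                # should be close to 1 / sqrt(30), about 0.183
hist(sample_means, main = "Means of samples of size 30 from Exponential(1)", xlab = "")
```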
What are the similarities and differences between LLN and CLT?
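Both theorems describe how the mean of independent, identically distributed draws behaves as the sample size grows, and both assume the draws are independent. The LLN only says where the sample mean ends up: for a large enough random sample, its average approaches the population average. The CLT gives more information. It describes the shape and size of the sample mean's fluctuations around the population mean, which are approximately normal with standard deviation σ/sqrt(n). This additionally requires the population variance to be finite.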
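One way to see the difference in a simulation is sketched below, assuming Bernoulli(0.05) draws. The value of p, the sample sizes and the seed are illustrative. The LLN shows up as the average error of the sample mean shrinking toward 0. The CLT shows up as the error, once rescaled by sqrt(n)/σ, keeping a standard deviation near 1 at every sample size.

```r
# Sketch contrasting LLN and CLT for the mean of Bernoulli(p) draws (illustrative p)
set.seed(2)
p <- 0.05
mu <- p                          # population mean
sigma <- sqrt(p * (1 - p))       # population standard deviation
sizes <- c(100, 1000, 10000)
reps <- 2000                     # number of simulated samples per size

results <- data.frame(n = sizes, mean_abs_error = NA_real_, sd_scaled_error = NA_real_)
for (i in seq_along(sizes)) {
  n <- sizes[i]
  # A Binomial(n, p) count divided by n is the mean of n Bernoulli(p) draws
  means <- rbinom(reps, size = n, prob = p) / n
  results$mean_abs_error[i] <- mean(abs(means - mu))                # LLN: shrinks toward 0
  results$sd_scaled_error[i] <- sd(sqrt(n) * (means - mu) / sigma)  # CLT: stays near 1
}
print(results)
```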
Pick any distribution :
The binomial distribution models the number of successes in a fixed number n of independent trials, each with the same probability of success p. It is a discrete probability distribution used to study how often a desired outcome occurs. The model gives the probability of observing exactly x successes in those n trials.
Formula : $P(X = x) = \binom{n}{x} p^{x} q^{n-x}$, where $q = 1 - p$
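A quick sanity check of the formula. It evaluates nCx p^x q^(n-x) directly with choose() and compares the result with R's built-in dbinom(). The values n = 10 and p = 0.3 are arbitrary illustrative parameters.

```r
# Check the binomial formula against R's built-in pmf (illustrative n and p)
n <- 10
p <- 0.3
q <- 1 - p
x <- 0:n
by_formula <- choose(n, x) * p^x * q^(n - x)   # nCx * p^x * q^(n - x)
by_dbinom  <- dbinom(x, size = n, prob = p)    # built-in binomial probability mass function
all.equal(by_formula, by_dbinom)               # TRUE
sum(by_formula)                                # probabilities over x = 0..n add up to 1
```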
Step 0 :
#Cleaning workspace
rm(list = ls())
gc()
## used (Mb) gc trigger (Mb) max used (Mb)
## Ncells 521711 27.9 1160410 62 660385 35.3
## Vcells 948918 7.3 8388608 64 1769625 13.6
cat("\f") #clear the console
Step 1 : Draw 1000 values from a binomial distribution with 1000 trials and success probability 0.05
cltbinom <- rbinom(n = 1000,     # number of draws
                   size = 1000,  # number of trials per draw
                   prob = 0.05   # success probability per trial
                   )
cltbinom[1:16]
## [1] 47 55 61 42 50 43 37 44 54 56 60 43 48 34 44 41
mu <- mean(cltbinom) # sample mean of the 1000 draws (theoretical mean n*p = 50)
mu
## [1] 49.824
sigma <- sd(cltbinom) # sample sd of the 1000 draws (theoretical sd sqrt(n*p*q), about 6.89)
sigma
## [1] 6.996712
library("psych")
describe(cltbinom) #summary of statistics
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 1000 49.82 7 50 49.76 7.41 33 73 40 0.14 -0.12 0.22
hist(x = cltbinom,
main = "Histogram of Binomial Distribution (n=1000)",
xlab = ""
)
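Step 1 reports a sample mean of 49.824 and a sample sd of 6.996712. A binomial distribution with size n and success probability p has mean np and standard deviation sqrt(np(1 - p)). For these parameters the theoretical values are therefore 50 and sqrt(47.5) ≈ 6.89. The sketch computes both and compares them with a fresh set of draws. The seed is arbitrary and the draws only stand in for cltbinom, so the observed numbers will not match the output above exactly.

```r
# Theoretical moments of Binomial(size = 1000, prob = 0.05)
size <- 1000
prob <- 0.05
theoretical_mean <- size * prob                    # n * p = 50
theoretical_sd <- sqrt(size * prob * (1 - prob))   # sqrt(n * p * q) = sqrt(47.5), about 6.89

# Fresh draws with the same parameters as Step 1 (stand-in for cltbinom)
set.seed(3)
draws <- rbinom(1000, size = size, prob = prob)
rbind(mean = c(observed = mean(draws), theoretical = theoretical_mean),
      sd   = c(observed = sd(draws),   theoretical = theoretical_sd))
```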
Step 2 : Create a 1000-row, 1-column matrix filled with zeros
z <- matrix(data = 0,     # 0 is recycled to fill every cell
            nrow = 1000,
            ncol = 1)
z[1:16]
## [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
describe(z)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 1000 0 0 0 0 0 0 0 0 NaN NaN 0
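Step 3 : Fill each row of z with the mean of a random sample of size 100 drawn with replacement from cltbinom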
# Each row of z holds the mean of one resample of size 100 drawn from cltbinom
for (i in 1:1000){
  z[i, ] <- mean(sample(x = cltbinom,
                        size = 100,
                        replace = TRUE))
}
z[1:16]
## [1] 49.23 49.34 50.25 50.43 49.88 49.77 51.21 49.16 49.93 50.93 49.02 48.03
## [13] 49.80 50.14 50.89 48.83
describe(z)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 1000 49.78 0.69 49.79 49.78 0.7 47.69 51.75 4.06 0 -0.28 0.02
hist(z, xlab = "", main = "Histogram of Sample Mean (n = 100)")
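The loop in Step 3 can also be written with replicate(), which evaluates an expression repeatedly and collects the results in a vector. This is an alternative sketch, not the assignment's method, and it uses fresh draws (arbitrary seed) in place of cltbinom. It also checks the CLT prediction for the spread. With the numbers above, σ/sqrt(n) = 6.996712/sqrt(100) ≈ 0.70, which is in line with the sd of 0.69 that describe(z) reports for the 1000 sample means.

```r
# Step 3 without an explicit loop; fresh draws stand in for cltbinom
set.seed(4)
pop <- rbinom(1000, size = 1000, prob = 0.05)
means_100 <- replicate(1000, mean(sample(pop, size = 100, replace = TRUE)))

sd(means_100)          # empirical standard deviation of the sample means
sd(pop) / sqrt(100)    # CLT prediction: sigma / sqrt(n)
```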
Step 4 : a) Expand the zero matrix to four columns, one per sample size (2, 6, 30 and 100)
z <- matrix(data = 0,     # 1000 rows (resamples) x 4 columns (sample sizes)
            nrow = 1000,
            ncol = 4)
n <- c(2, 6, 30, 100)     # sample sizes, one per column of z
for (j in 1:4){           # indexes the columns (sample sizes)
  for (i in 1:1000){      # indexes the rows of the matrix
    z[i, j] <- mean(sample(x = cltbinom,   # compute the resample mean and assign it
                           size = n[j],
                           replace = TRUE))
  }
}
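The nested loops above can also be expressed without explicit indexing. In the sketch, vapply() runs one replicate() call per sample size and binds the resulting vectors of 1000 means as the columns of a matrix. This is an alternative formulation, not the assignment's code. The names pop and z_alt are hypothetical, and the fresh draws only stand in for cltbinom.

```r
# Step 4 with vapply() + replicate() instead of nested loops (sketch)
set.seed(5)
pop <- rbinom(1000, size = 1000, prob = 0.05)   # stand-in for cltbinom
n <- c(2, 6, 30, 100)                            # sample sizes, as in Step 4

# Each call returns 1000 resample means; vapply binds them as the columns of a matrix
z_alt <- vapply(n, function(k) {
  replicate(1000, mean(sample(pop, size = k, replace = TRUE)))
}, numeric(1000))
colnames(z_alt) <- paste0("Sample size=", n)
dim(z_alt)   # 1000 rows, 4 columns
```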
Step 5 : Summary Commands
colnames(z) = c("Sample size=2", "Sample size=6", "Sample size=30", "Sample size=100" )
summary(z)
## Sample size=2 Sample size=6 Sample size=30 Sample size=100
## Min. :34.00 Min. :40.50 Min. :45.37 Min. :47.38
## 1st Qu.:46.50 1st Qu.:47.83 1st Qu.:48.93 1st Qu.:49.35
## Median :49.50 Median :49.83 Median :49.83 Median :49.84
## Mean :49.75 Mean :49.77 Mean :49.85 Mean :49.85
## 3rd Qu.:53.00 3rd Qu.:51.71 3rd Qu.:50.80 3rd Qu.:50.34
## Max. :65.50 Max. :58.50 Max. :54.23 Max. :52.20
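The summary shows the interquartile range narrowing from 6.50 (53.00 - 46.50) at sample size 2 to 0.99 (50.34 - 49.35) at sample size 100. That is a factor of about 6.6, close to the sqrt(100/2) ≈ 7.1 that the CLT predicts. The sketch makes this comparison directly for the standard deviation of each column. It assumes fresh draws (arbitrary seed) in place of cltbinom and z, so its numbers will differ slightly from the table above.

```r
# Compare the spread of the sample means with the CLT prediction sigma / sqrt(n)
set.seed(6)
pop <- rbinom(1000, size = 1000, prob = 0.05)   # stand-in for cltbinom
n <- c(2, 6, 30, 100)
z_sim <- sapply(n, function(k) replicate(1000, mean(sample(pop, size = k, replace = TRUE))))

comparison <- data.frame(sample_size = n,
                         observed_sd = apply(z_sim, 2, sd),
                         predicted_sd = sd(pop) / sqrt(n))
print(comparison, digits = 3)
```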
par(mfrow=c(3,2))
length(cltbinom)
## [1] 1000
hist(x = cltbinom,
main = "Histogram of a Binomial Distribution, N=1000",
xlab = ""
)
for (k in 1:4){
  hist(x = z[, k],
       main = "Histogram of Sample Means",
       xlim = range(cltbinom),   # same x-axis range as the population histogram
       xlab = paste0("Sample Size ", n[k], " (Column ", k, " from matrix)")
       )
}
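Does the central limit theorem hold as expected? Yes, the CLT holds as expected. As the sample size increased from 2 to 100, the average of the sample means stayed at the population mean (about 49.8 in every column). Meanwhile, the sample means became much more tightly concentrated around it: the minimum and maximum moved from 34.00 and 65.50 at sample size 2 to 47.38 and 52.20 at sample size 100. The histograms narrow accordingly as the sample size grows (as shown in the graphs).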
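One hedged way to put a number on "more tightly concentrated" is sketched here. For each sample size, it computes the share of sample means that fall within 1 unit of the population mean. The 1-unit window, the seed and the fresh draws standing in for cltbinom are all illustrative assumptions. If the CLT behaviour described above holds, the share should rise steadily with the sample size.

```r
# Share of sample means within 1 unit of the population mean, by sample size
set.seed(7)
pop <- rbinom(1000, size = 1000, prob = 0.05)   # stand-in for cltbinom
n <- c(2, 6, 30, 100)
within_1 <- sapply(n, function(k) {
  means <- replicate(1000, mean(sample(pop, size = k, replace = TRUE)))
  mean(abs(means - mean(pop)) <= 1)             # proportion of means within +/- 1
})
names(within_1) <- paste0("Sample size=", n)
within_1                                        # rises as the sample size grows
```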
b) Calculate the 25th and 80th percentiles : Percentiles can be calculated in R with the quantile() function, using the syntax quantile(x, probs = c(p1, p2, ...)), where probs is a vector of probabilities between 0 and 1 (for example c(.25, .80)).
# 25th and 80th percentiles of the 1000 binomial draws in cltbinom
quantile(cltbinom, probs = c(.25,.80))
## 25% 80%
## 45 56
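The call above gives the empirical percentiles of the simulated draws. The sketch offers two hedged cross-checks. First, qbinom() returns the theoretical percentiles of Binomial(1000, 0.05), which should sit close to the empirical ones. Second, if the exercise instead intended the percentiles of the sample-mean distributions from Step 4, apply() can run quantile() on each column. Which reading the assignment intends is an assumption. The fresh simulation (arbitrary seed) only stands in for z.

```r
# Exact percentiles of the Binomial(1000, 0.05) distribution, for comparison
theoretical_q <- qbinom(c(0.25, 0.80), size = 1000, prob = 0.05)
theoretical_q

# Percentiles of each column of sample means (fresh simulation standing in for z)
set.seed(8)
pop <- rbinom(1000, size = 1000, prob = 0.05)
n <- c(2, 6, 30, 100)
z_sim <- sapply(n, function(k) replicate(1000, mean(sample(pop, size = k, replace = TRUE))))
colnames(z_sim) <- paste0("Sample size=", n)
pct <- apply(z_sim, 2, quantile, probs = c(0.25, 0.80))   # rows: 25%, 80%; one column per size
pct
```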
# I am not sure about this