Week_9

Law of Large Numbers in your own words :

The Law of Large Numbers states that as the sample size tends to infinity, the sample mean converges to the population mean.
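
A minimal sketch of this statement, assuming the same Binomial(1000, 0.05) population used in Step 1 of this report (the seed and the checkpoint sizes are arbitrary choices for illustration). It tracks the running mean of the draws and prints how far it is from the population mean n * p = 50.

```r
# Running mean of binomial draws: the gap to the population mean should tend to shrink
set.seed(1)                       # arbitrary seed, only so the sketch is repeatable
n_trials <- 1000                  # trials per draw (as in Step 1)
p        <- 0.05                  # success probability (as in Step 1)

draws        <- rbinom(n = 10000, size = n_trials, prob = p)
running_mean <- cumsum(draws) / seq_along(draws)   # mean of the first k draws

checkpoints <- c(10, 100, 1000, 10000)             # illustrative sample sizes
abs_error   <- abs(running_mean[checkpoints] - n_trials * p)
print(data.frame(sample_size = checkpoints, abs_error = abs_error))
```

The error does not have to fall at every single step, but it should be small by the last checkpoint.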

Please explain CLT in your own words :

The central limit theorem basically states two important things:

Mean of the sample means = Mean of the population
Standard deviation of the sample means = Standard deviation of the population / sqrt(n), where n is the sample size

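To put it another way, as the sample size increases, the distribution of the sample mean approaches a normal distribution regardless of the shape of the population distribution (as long as the population has a finite variance). The variance of the sample mean shrinks, so the sample means concentrate around the population mean. The theorem does not turn the data themselves into normal data; it lets us use the normal distribution to approximate the behaviour of sample means from almost any distribution.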
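
A small sketch of the "regardless of the population distribution" part, assuming an Exponential(rate = 1) population, which is not used elsewhere in this report and is chosen here only because it is strongly right-skewed with mean 1 and standard deviation 1. The sample size, number of repetitions and seed are arbitrary.

```r
# Sample means from a skewed population: centre near 1, spread near 1 / sqrt(n)
set.seed(2)     # arbitrary seed for repeatability
n    <- 50      # size of each sample (illustrative)
reps <- 5000    # number of sample means to simulate (illustrative)

sample_means <- replicate(reps, mean(rexp(n, rate = 1)))

mean(sample_means)   # compare with the population mean, 1
sd(sample_means)     # compare with the value predicted by the CLT below
1 / sqrt(n)          # population sd / sqrt(n)
```

Calling hist(sample_means) afterwards should show a roughly symmetric histogram even though the exponential population itself is skewed.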

What are the similarities and differences between LLN and CLT?

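Both describe how the sample mean behaves as the sample size grows, and both assume the observations are drawn independently from the same population. The LLN only says that the average of a large enough random sample will approach the population average. The CLT gives more information: it describes the shape and spread of the sample mean around the population mean, which is approximately normal with standard deviation equal to the population standard deviation divided by sqrt(n).

Pick any distribution :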

Binomial Distribution:

The binomial distribution is a model that gives the probability of a particular number of successes occurring within a fixed number of trials. It is a discrete probability distribution that is used for studying the occurrence of a desired outcome. The trials are independent, each trial has only two possible outcomes (success or failure), and the probability of success is the same on every trial.

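Formula : $P(X = x) = \binom{n}{x} p^x q^{n-x}$, where n is the number of trials, x is the number of successes, p is the probability of success on one trial and q = 1 - p.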
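
As a quick check of the formula, the sketch below evaluates it by hand with choose() and compares the result with R's built-in dbinom(). The values n = 10, p = 0.3 and x = 4 are hypothetical and chosen only for illustration.

```r
# Binomial probability by hand versus dbinom()
n <- 10     # number of trials (hypothetical)
p <- 0.3    # probability of success on one trial (hypothetical)
x <- 4      # number of successes of interest (hypothetical)
q <- 1 - p

manual  <- choose(n, x) * p^x * q^(n - x)   # nCx * p^x * q^(n - x)
builtin <- dbinom(x, size = n, prob = p)

all.equal(manual, builtin)                  # the two values should agree
```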

Step 0 :

#Cleaning workspace
rm(list = ls())
gc()

##          used (Mb) gc trigger (Mb) max used (Mb)
## Ncells 521711 27.9    1160410   62   660385 35.3
## Vcells 948918  7.3    8388608   64  1769625 13.6

cat("\f") #clear the console

Step 1 :

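cltbinom <- rbinom(n    = 1000,  # number of simulated values
                   size = 1000,  # number of trials behind each value
                   prob = 0.05   # probability of success on each trial
                   )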
cltbinom[1:16]

##  [1] 47 55 61 42 50 43 37 44 54 56 60 43 48 34 44 41

mu = mean(cltbinom) #Check mean of binomial distribution
mu

## [1] 49.824

sigma = sd(cltbinom) #check sd of binomial distribution
sigma

## [1] 6.996712
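
For comparison, the theoretical mean and standard deviation of a Binomial(n, p) variable are n * p and sqrt(n * p * (1 - p)). With n = 1000 and p = 0.05 that is 50 and sqrt(47.5), about 6.89, which is close to the simulated 49.824 and 6.996712 above. The sketch below repeats the comparison on a fresh simulation; the seed is arbitrary, so its simulated values will not match the ones printed in this report.

```r
# Theoretical versus simulated moments of Binomial(1000, 0.05)
set.seed(3)                       # arbitrary seed for repeatability
n_trials <- 1000
p        <- 0.05

theory_mean <- n_trials * p                      # n * p
theory_sd   <- sqrt(n_trials * p * (1 - p))      # sqrt(n * p * q)

sim <- rbinom(n = 1000, size = n_trials, prob = p)

c(theory_mean = theory_mean, simulated_mean = mean(sim))
c(theory_sd   = theory_sd,   simulated_sd   = sd(sim))
```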

library("psych")
describe(cltbinom) #summary of statistics

##    vars    n  mean sd median trimmed  mad min max range skew kurtosis   se
## X1    1 1000 49.82  7     50   49.76 7.41  33  73    40 0.14    -0.12 0.22

hist(x    = cltbinom, 
     main = "Histogram of Binomial Distribution (n=1000)",
     xlab = ""
     )

Step 2 : Create a 1000 row and 1 column matrix with 0 entries only

z = matrix(data = rep(x     = 0, 
                       times = 1000
                       ), 
            nrow = 1000, 
            ncol = 1)

z[1:16]

##  [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

describe(z)

##    vars    n mean sd median trimmed mad min max range skew kurtosis se
## X1    1 1000    0  0      0       0   0   0   0     0  NaN      NaN  0

Step 3 : Fill the matrix with the means of 1000 samples of size 100, drawn with replacement from cltbinom

for (i in 1:1000){ 
    z[i,] <- mean(sample( x        = cltbinom, 
                          size     = 100,    
                          replace  = TRUE    
                        )
                  )
  }
z[1:16]

##  [1] 49.23 49.34 50.25 50.43 49.88 49.77 51.21 49.16 49.93 50.93 49.02 48.03
## [13] 49.80 50.14 50.89 48.83

describe(z)

##    vars    n  mean   sd median trimmed mad   min   max range skew kurtosis   se
## X1    1 1000 49.78 0.69  49.79   49.78 0.7 47.69 51.75  4.06    0    -0.28 0.02

hist(z, xlab = "", main = "Histogram of Sample Mean (n = 100)")
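
The standard deviation of the sample means reported above (0.69) can be compared with the CLT prediction sigma / sqrt(n) = 6.996712 / sqrt(100), about 0.70. The sketch below repeats that check on a fresh simulation; `pop` is a stand-in for `cltbinom`, and the seed is arbitrary, so the numbers will differ slightly from the ones in this report.

```r
# Observed spread of the sample means versus sigma / sqrt(n)
set.seed(4)                                          # arbitrary seed
pop <- rbinom(n = 1000, size = 1000, prob = 0.05)    # stand-in for cltbinom

# 1000 means, each from 100 values resampled with replacement (as in Step 3)
means <- replicate(1000, mean(sample(pop, size = 100, replace = TRUE)))

predicted_se <- sd(pop) / sqrt(100)
c(observed = sd(means), predicted = predicted_se)
```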

Step 4 : a) Expand the zero matrix to four columns, one for each sample size

z <- matrix(data = rep(x     = 0, 
                       times = 4000
                       ), 
            nrow = 1000, 
            ncol = 4)

Replace the zeros with sample means for each sample size

n <- c(2, 6, 30, 100) 

for (j in 1:4){          # indexes the columns (one per sample size)
  for (i in 1:1000){     # indexes the rows of matrix
    z[i,j] <- mean(sample( x       = cltbinom,  # compute mean and assign
                           size    = n[j], 
                           replace = TRUE
                         )
                   )
  }
}
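
The same matrix can also be built without pre-allocating zeros or writing a double loop. The sketch below is one possible alternative using sapply() and replicate(), not the method used in this report; `pop` stands in for `cltbinom`, `z_alt` is a hypothetical name, and the seed is arbitrary.

```r
# Alternative to the double loop: one column of 1000 sample means per sample size
set.seed(5)                                          # arbitrary seed
pop <- rbinom(n = 1000, size = 1000, prob = 0.05)    # stand-in for cltbinom
n   <- c(2, 6, 30, 100)                              # sample sizes, as above

z_alt <- sapply(n, function(size) {
  replicate(1000, mean(sample(pop, size = size, replace = TRUE)))
})

dim(z_alt)             # rows = repetitions, columns = sample sizes
apply(z_alt, 2, sd)    # spread of the sample means in each column
sd(pop) / sqrt(n)      # spread predicted by sigma / sqrt(n)
```

The column standard deviations should fall as the sample size grows and stay close to the predicted values.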

Step 5 : Summary Commands

colnames(z) = c("Sample size=2", "Sample size=6", "Sample size=30", "Sample size=100" ) 
summary(z)

##  Sample size=2   Sample size=6   Sample size=30  Sample size=100
##  Min.   :34.00   Min.   :40.50   Min.   :45.37   Min.   :47.38  
##  1st Qu.:46.50   1st Qu.:47.83   1st Qu.:48.93   1st Qu.:49.35  
##  Median :49.50   Median :49.83   Median :49.83   Median :49.84  
##  Mean   :49.75   Mean   :49.77   Mean   :49.85   Mean   :49.85  
##  3rd Qu.:53.00   3rd Qu.:51.71   3rd Qu.:50.80   3rd Qu.:50.34  
##  Max.   :65.50   Max.   :58.50   Max.   :54.23   Max.   :52.20

Graphs

par(mfrow=c(3,2))
length(cltbinom)

## [1] 1000

hist(x    = cltbinom, 
     main = "Histogram of a Binomial Distribution, N=1000",
     xlab = ""
     )

for (k in 1:4){
    hist(x    = z[,k],     
         main = "Histogram of Sample Means",   # these columns hold sample means, not raw binomial draws
         xlim = c(1, 100),
         xlab = paste0("Sample Size ", n[k], " (Column ", k, " from matrix)") 
         )
}

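Does the central limit theorem hold as expected? Yes, the CLT does hold as expected. As the sample size increased from 2 to 100, the distribution of the sample mean became narrower and looked closer to a normal curve, while staying centred near the population mean of about 50 (as shown in the graphs). The summary table shows the same thing: the range of the sample means shrank from 34.00 to 65.50 at sample size 2 down to 47.38 to 52.20 at sample size 100, while the mean stayed near 49.8.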

b) Calculate the 25th percentile and the 80th percentile : We can easily calculate percentiles in R using the quantile() function, which uses the following syntax: quantile(x, probs = seq(0, 1, 0.25)), where x is a numeric vector and probs is a vector of probabilities between 0 and 1.

#Compute the 25th and 80th percentiles with quantile()

quantile(cltbinom, probs = c(.25,.80))

## 25% 80% 
##  45  56
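
One way to sanity-check these percentiles is to compare them with the theoretical quantiles of the Binomial(1000, 0.05) distribution from qbinom(). This is a sketch on a fresh simulation with an arbitrary seed (`sim` stands in for `cltbinom`), so the empirical values may differ slightly from the ones printed above.

```r
# Empirical percentiles of a simulated sample versus theoretical binomial quantiles
set.seed(6)                                          # arbitrary seed
sim <- rbinom(n = 1000, size = 1000, prob = 0.05)    # stand-in for cltbinom

empirical   <- quantile(sim, probs = c(0.25, 0.80))
theoretical <- qbinom(c(0.25, 0.80), size = 1000, prob = 0.05)

empirical
theoretical
```

If the two sets of values are close, the quantile() call is behaving as intended.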

#I am not sure about this


Ganesh Kumar

2023-11-03
