1. Law of Large Numbers

The Law of Large Numbers (LLN) states that as the sample size grows, the sample mean gets closer to the mean of the whole population.
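As a hedged formalization of the statement above, the weak form of the law can be written as follows. The symbols are assumptions introduced here, not notation from the original text: $X_1, \dots, X_n$ are independent, identically distributed draws with finite population mean $\mu$, and $\bar{X}_n$ is their sample mean.

```latex
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i,
\qquad
\lim_{n \to \infty} P\left( \left| \bar{X}_n - \mu \right| > \varepsilon \right) = 0
\quad \text{for every } \varepsilon > 0
```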

2. Central Limit Theorem

The Central Limit Theorem (CLT) states that the larger the sample size, the closer the distribution of the sample mean is to a normal distribution.
This lets us obtain an approximately normal distribution by taking sample means, even when the population itself is not normally distributed.
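One common way to state this precisely is the classical form below, offered as a sketch under stated assumptions: the draws are independent and identically distributed with population mean $\mu$ and finite, nonzero standard deviation $\sigma$ (symbols introduced here for illustration), and $\bar{X}_n$ is the mean of a sample of size $n$.

```latex
\frac{\sqrt{n}\,\left( \bar{X}_n - \mu \right)}{\sigma}
\;\xrightarrow{d}\;
\mathcal{N}(0, 1)
\quad \text{as } n \to \infty
```

Read informally, for large $n$ the sample mean is approximately normal with mean $\mu$ and standard deviation $\sigma / \sqrt{n}$.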

3. Difference & Similarity Between LLN and CLT

Both the Central Limit Theorem (CLT) and the Law of Large Numbers (LLN) describe how the sample mean behaves as the sample size tends to infinity. The LLN states that the sample mean converges to the population mean. The CLT describes the shape of the sampling distribution around that mean: the distribution of the sample mean approaches a normal distribution.

4. Distribution Selection

Weibull Distribution

The Weibull distribution is a continuous probability distribution commonly used to model the time-to-failure of various types of products, systems, or components. It is named after Waloddi Weibull, a Swedish engineer who introduced the distribution in the mid-20th century.
Parameters of the Weibull distribution:
- Threshold parameter (γ): Holding the other parameters constant, changing the threshold parameter shifts the distribution along the X-axis. The threshold can be negative, zero, or positive, and all values of the distribution must be greater than it. A negative threshold therefore lets the distribution cover both negative and positive values, while a threshold of 0 restricts it to positive values.
- Shape parameter (β or k): Holding the other parameters constant, changing the shape parameter changes the shape of the distribution.
  - When β < 1: Steadily decreasing density
  - When 1 < β < 2.6: Right-skewed
  - When β is near 3: Approximates a normal distribution
  - When β > 3.7: Left-skewed
- Scale parameter (η or λ): Holding the other parameters constant, changing the scale parameter changes how far the distribution stretches along the X-axis. As the scale increases, the distribution stretches further to the right and its peak height decreases. Decreasing the scale does the opposite.
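To make the roles of the three parameters concrete, the density of the three-parameter Weibull distribution can be written as below, using γ for the threshold, β for the shape, and η for the scale as in the list above. This is the standard textbook form, included here as a reference sketch. With γ = 0 it reduces to the two-parameter form that R's `dweibull` and `rweibull` use through their `shape` and `scale` arguments.

```latex
f(x;\, \gamma, \beta, \eta) =
\begin{cases}
\dfrac{\beta}{\eta}
\left( \dfrac{x - \gamma}{\eta} \right)^{\beta - 1}
\exp\!\left[ -\left( \dfrac{x - \gamma}{\eta} \right)^{\beta} \right],
& x > \gamma, \\[2ex]
0, & \text{otherwise}
\end{cases}
```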

5. Create Weibull Distribution & Central Limit Theorem

## We will create a 2-parameter Weibull distribution, which assumes the threshold parameter is 0.

# Load ggplot2, which provides ggplot() used below
library(ggplot2)

# Parameters setup
n <- 10000

shape_parameter <- 1.5
scale_parameter <- 7

# Our weibull distribution 
weibull_values <- rweibull(n, shape = shape_parameter, scale = scale_parameter)

# Set distribution as data frame
weibull_random <- data.frame(weibull_values)

# Graph the distribution 
ggplot(weibull_random, aes(x = weibull_values)) +
  geom_density(fill = "orange") +
  labs(title = "Weibull distribution",
       x = "Values",
       y = "Density") +
  theme_classic()

# Calculate basic stats
# (variable names avoid shadowing base R's mean() and median())
weibull_mean <- mean(weibull_values) |>
  round(3)
weibull_median <- median(weibull_values) |>
  round(3)
weibull_sd <- sd(weibull_values) |>
  round(3)

print(paste("Mean of this weibull distribution is", weibull_mean))

## [1] "Mean of this weibull distribution is 6.351"

print(paste("Median of this weibull distribution is", weibull_median))

## [1] "Median of this weibull distribution is 5.496"

print(paste("Standard deviation of this weibull distribution is", weibull_sd))

## [1] "Standard deviation of this weibull distribution is 4.331"

# Sample size 
sample_size <- c(2, 8, 20, 44, 92, 140)

# Generate a matrix that has 10,000 rows and 6 columns with all entries = 0 
matrixA <- matrix(0, nrow = 10000, ncol = length(sample_size))

matrixA[1:16]

##  [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

# Set the seed so the resampling below is reproducible
set.seed(seed = 42)

# Apply CLT: for each sample size, draw 10,000 resamples and store their means
for (j in seq_along(sample_size)) {      # columns: one per sample size
  for (i in seq_len(nrow(matrixA))) {    # rows: one resampled mean each
    matrixA[i, j] <- mean(sample(x       = weibull_values,
                                 size    = sample_size[j],
                                 replace = TRUE))
  }
}

# Change column names
colnames(matrixA) <- c("Sample size=2", "Sample size=8", "Sample size=20","Sample size=44", "Sample size=92", "Sample size=140") 
summary(matrixA)

##  Sample size=2     Sample size=8    Sample size=20  Sample size=44 
##  Min.   : 0.1339   Min.   : 1.948   Min.   :3.214   Min.   :3.835  
##  1st Qu.: 4.0782   1st Qu.: 5.267   1st Qu.:5.679   1st Qu.:5.898  
##  Median : 5.9417   Median : 6.245   Median :6.312   Median :6.333  
##  Mean   : 6.3460   Mean   : 6.346   Mean   :6.354   Mean   :6.346  
##  3rd Qu.: 8.2158   3rd Qu.: 7.326   3rd Qu.:6.996   3rd Qu.:6.783  
##  Max.   :24.4986   Max.   :16.077   Max.   :9.844   Max.   :9.221  
##  Sample size=92  Sample size=140
##  Min.   :4.674   Min.   :4.880  
##  1st Qu.:6.035   1st Qu.:6.093  
##  Median :6.338   Median :6.340  
##  Mean   :6.347   Mean   :6.344  
##  3rd Qu.:6.644   3rd Qu.:6.589  
##  Max.   :8.402   Max.   :7.613

par(mfrow = c(3, 3))

# Create a histogram for the original Weibull distribution
hist(x = weibull_values, 
     main = "Histogram of Weibull Distribution", 
     xlim = c(0, max(weibull_values)), 
     ylim = c(0, 400),
     xlab = "Values")

# Create histograms for the means of random samples from matrixA
for (k in 1:6) {
  hist(x = matrixA[, k], 
       main = paste0("Histogram of Mean of Weibull Dist (Sample Size = ", sample_size[k], ")"),
       xlim = c(0, max(matrixA)),
       ylim = c(0, 400),
       xlab = paste0("Column ", k, " from matrixA"))
}

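In our case, the CLT does hold. As the histograms above show, the distribution of the sample mean increasingly resembles a normal distribution as the sample size increases. The summary table tells the same story: the mean of the sample means stays near 6.35 for every sample size, while the median moves toward the mean (from 5.94 at sample size 2 to 6.34 at sample size 140) and the range between the minimum and maximum narrows.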
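A further numerical check, sketched here under assumptions rather than taken from the original analysis, is to compare the spread of the resampled means with the CLT prediction that the standard deviation of the sample mean is roughly the population standard deviation divided by the square root of the sample size. The snippet is self-contained and regenerates its own Weibull data with an assumed seed, so its numbers will not match the output above exactly. The names `population`, `observed_se`, and `predicted_se` are hypothetical.

```r
# Self-contained sketch: regenerate Weibull data (the seed is an assumption
# made for this sketch; the original data were generated without one)
set.seed(42)
population  <- rweibull(10000, shape = 1.5, scale = 7)
sample_size <- c(2, 8, 20, 44, 92, 140)

# Observed spread: SD of 10,000 resampled means for each sample size
observed_se <- sapply(sample_size, function(n) {
  sd(replicate(10000, mean(sample(population, size = n, replace = TRUE))))
})

# CLT prediction: population SD divided by sqrt(sample size)
predicted_se <- sd(population) / sqrt(sample_size)

# Side-by-side comparison
round(data.frame(sample_size, observed_se, predicted_se), 3)
```

If the two columns track each other closely, that supports reading the narrowing histograms as the behaviour the CLT describes rather than as an artifact of the plotting limits.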

DA_DIS#10_CLT

Pin Lyu

2023-11-02