The central limit theorem (CLT) states that the larger the sample size, the closer the distribution of the sample mean comes to a normal distribution.
This theorem lets us obtain an approximately normal distribution by taking sample means, even when the population itself is not normally distributed (provided its variance is finite).
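For reference, the classical statement can be written as follows. The symbols are assumptions introduced here for illustration: $X_1, \dots, X_n$ are independent draws from the same population with mean $\mu$ and finite standard deviation $\sigma$, and $\bar{X}_n$ is their sample mean.

```latex
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i,
\qquad
\frac{\sqrt{n}\,\left(\bar{X}_n - \mu\right)}{\sigma}
\;\xrightarrow{d}\; \mathcal{N}(0, 1)
\quad \text{as } n \to \infty
```

In practical terms, for a large enough $n$ the sample mean is approximately normal with mean $\mu$ and standard deviation $\sigma / \sqrt{n}$.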
The Weibull distribution is a continuous probability distribution commonly used to model the time-to-failure of various types of products, systems, or components. It is named after Waloddi Weibull, a Swedish engineer who introduced the distribution in the mid-20th century.
Parameters of the Weibull distribution
Threshold parameter (γ): Holding the other parameters constant, changing the threshold parameter, which may be negative, zero, or positive, shifts the distribution along the X-axis. All values must be greater than the threshold. A negative threshold therefore lets the distribution cover both negative and positive values, while a threshold of 0 restricts it to positive values.
Shape parameter (β or k): Holding other parameters constant, changing the shape parameter would change the shape of the distribution.
When β < 1: The density decreases steadily
When 1 < β < 2.6: Right-skewed
When β is near 3: Approximates a normal distribution
When β > 3.7: Left-skewed
Scale parameter (η or λ): Holding the other parameters constant, changing the scale parameter changes how far the distribution stretches along the X-axis. As the scale increases, the distribution stretches further to the right and its height decreases. Decreasing the scale does the opposite.
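The effect of the shape and threshold parameters can be checked numerically. The sketch below is illustrative only: `weibull_skewness()` and `dweibull3()` are hypothetical helpers written for this example (not part of base R), built on the standard gamma-function formula for Weibull skewness and on `stats::dweibull()`.

```r
# Hypothetical helper: skewness of a Weibull distribution for a given shape.
# Skewness does not depend on the scale or the threshold.
weibull_skewness <- function(shape) {
  g1 <- gamma(1 + 1 / shape)
  g2 <- gamma(1 + 2 / shape)
  g3 <- gamma(1 + 3 / shape)
  (g3 - 3 * g1 * g2 + 2 * g1^3) / (g2 - g1^2)^1.5
}

# Hypothetical helper: 3-parameter density, obtained by shifting x by the threshold.
dweibull3 <- function(x, shape, scale, threshold = 0) {
  dweibull(x - threshold, shape = shape, scale = scale)
}

# Positive skewness means right-skewed, negative means left-skewed
shapes <- c(0.8, 1.5, 3, 3.6, 5)
data.frame(shape = shapes,
           skewness = round(sapply(shapes, weibull_skewness), 3))

# A threshold of -2 shifts the curve 2 units to the left without reshaping it
all.equal(dweibull3(3, shape = 1.5, scale = 7, threshold = -2),
          dweibull(5, shape = 1.5, scale = 7))
```

The sign of the skewness should flip from positive to negative somewhere between shape 3 and shape 4, in line with the ranges listed above.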
## We will create a 2-parameter Weibull distribution, which assumes the threshold parameter is 0.
# Load ggplot2 for the density plot
library(ggplot2)
# Parameters setup
n <- 10000
shape_parameter <- 1.5
scale_parameter <- 7
# Our weibull distribution
weibull_values <- rweibull(n, shape = shape_parameter, scale = scale_parameter)
# Set distribution as data frame
weibull_random <- data.frame(weibull_values)
# Graph the distribution
ggplot(weibull_random, aes(x = weibull_values)) +
  geom_density(fill = "orange") +
  labs(title = "Weibull distribution",
       x = "Values",
       y = "Density") +
  theme_classic()
# Calculate basic stats (variable names chosen so they do not mask base R's mean() and median())
mean_value <- mean(weibull_values) |>
  round(3)
median_value <- median(weibull_values) |>
  round(3)
standard_d <- sd(weibull_values) |>
  round(3)
print(paste("Mean of this weibull distribution is", mean_value))
## [1] "Mean of this weibull distribution is 6.351"
print(paste("Median of this weibull distribution is", median_value))
## [1] "Median of this weibull distribution is 5.496"
print(paste("Standard deviation of this weibull distribution is", standard_d))
## [1] "Standard deviation of this weibull distribution is 4.331"
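As a sanity check, the simulated statistics can be compared with the closed-form values for a 2-parameter Weibull distribution. This is a minimal sketch that repeats the shape and scale used above. The `theoretical_*` names are introduced here for illustration only.

```r
shape_parameter <- 1.5
scale_parameter <- 7

# Closed-form mean, median, and standard deviation of a 2-parameter Weibull
theoretical_mean <- scale_parameter * gamma(1 + 1 / shape_parameter)
theoretical_median <- scale_parameter * log(2)^(1 / shape_parameter)
theoretical_sd <- scale_parameter *
  sqrt(gamma(1 + 2 / shape_parameter) - gamma(1 + 1 / shape_parameter)^2)

round(c(mean = theoretical_mean,
        median = theoretical_median,
        sd = theoretical_sd), 3)
```

These come out at about 6.32, 5.48, and 4.29, close to the simulated 6.351, 5.496, and 4.331. The small gaps are what one would expect from a single random draw of 10,000 values.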
# Sample size
sample_size <- c(2, 8, 20, 44, 92, 140)
# Generate a matrix that has 10,000 rows and 6 columns with all entries = 0
matrixA <- matrix(0, nrow = 10000, ncol = length(sample_size))
matrixA[1:16]
## [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# Basic setting
set.seed(seed = 42)
# Apply CLT: for each sample size, draw 10,000 samples and store each sample's mean
for (j in seq_along(sample_size)) { # columns of matrixA, one per sample size
  for (i in seq_len(nrow(matrixA))) { # rows of matrixA, one per repeated sample
    matrixA[i, j] <- mean(sample(x = weibull_values,
                                 size = sample_size[j],
                                 replace = TRUE))
  }
}
# Change column names
colnames(matrixA) <- c("Sample size=2", "Sample size=8", "Sample size=20","Sample size=44", "Sample size=92", "Sample size=140")
summary(matrixA)
## Sample size=2 Sample size=8 Sample size=20 Sample size=44
## Min. : 0.1339 Min. : 1.948 Min. :3.214 Min. :3.835
## 1st Qu.: 4.0782 1st Qu.: 5.267 1st Qu.:5.679 1st Qu.:5.898
## Median : 5.9417 Median : 6.245 Median :6.312 Median :6.333
## Mean : 6.3460 Mean : 6.346 Mean :6.354 Mean :6.346
## 3rd Qu.: 8.2158 3rd Qu.: 7.326 3rd Qu.:6.996 3rd Qu.:6.783
## Max. :24.4986 Max. :16.077 Max. :9.844 Max. :9.221
## Sample size=92 Sample size=140
## Min. :4.674 Min. :4.880
## 1st Qu.:6.035 1st Qu.:6.093
## Median :6.338 Median :6.340
## Mean :6.347 Mean :6.344
## 3rd Qu.:6.644 3rd Qu.:6.589
## Max. :8.402 Max. :7.613
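The narrowing spread in the summary is what the CLT predicts: the standard deviation of the sample mean should be close to the population standard deviation divided by the square root of the sample size. The sketch below checks this on a freshly simulated population so that it runs on its own. `n_draws`, `population`, and the use of 2,000 repetitions instead of 10,000 are assumptions made here to keep it quick.

```r
set.seed(42)
population <- rweibull(10000, shape = 1.5, scale = 7)
sample_size <- c(2, 8, 20, 44, 92, 140)
n_draws <- 2000 # repetitions per sample size (assumed, fewer than in the text)

# One column of sample means per sample size
sample_means <- sapply(sample_size, function(n) {
  replicate(n_draws, mean(sample(population, size = n, replace = TRUE)))
})

observed_se <- apply(sample_means, 2, sd)            # spread of the sample means
expected_se <- sd(population) / sqrt(sample_size)    # CLT prediction

data.frame(sample_size,
           observed_se = round(observed_se, 3),
           expected_se = round(expected_se, 3))
```

The two columns should agree to within simulation noise for every sample size.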
par(mfrow = c(3, 3))
# Create a histogram for the original Weibull distribution
# (no fixed ylim: with 10,000 values the bin counts go well above 400 and would be cut off)
hist(x = weibull_values,
     main = "Histogram of Weibull Distribution",
     xlim = c(0, max(weibull_values)),
     xlab = "Values")
# Create histograms for the means of random samples from matrixA
for (k in seq_along(sample_size)) {
  hist(x = matrixA[, k],
       main = paste0("Histogram of Mean of Weibull Dist (Sample Size = ", sample_size[k], ")"),
       xlim = c(0, max(matrixA)),
       xlab = paste0("Column ", k, " from matrixA"))
}
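In our case, the CLT does hold. As the charts above show, the distribution of the sample means looks more and more like a normal distribution as the sample size increases. The summary table shows the same pattern: the mean of the sample means stays near 6.35 for every sample size, while the median rises from 5.94 at sample size 2 to 6.34 at sample size 140, closing the gap with the mean as the right skew fades.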
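One way to put a number on "more normal" is the skewness of the sample means, which is 0 for a normal distribution. The following is a rough sketch on a freshly simulated population. `sample_skewness()` is a hypothetical helper defined here, and 2,000 repetitions per sample size is an assumption made to keep the run short.

```r
set.seed(42)
population <- rweibull(10000, shape = 1.5, scale = 7)
sample_size <- c(2, 8, 20, 44, 92, 140)
n_draws <- 2000 # repetitions per sample size (assumed)

# Hypothetical helper: moment-based sample skewness
sample_skewness <- function(x) {
  centered <- x - mean(x)
  mean(centered^3) / mean(centered^2)^1.5
}

# Skewness of the distribution of sample means, one value per sample size
skew_by_size <- sapply(sample_size, function(n) {
  means <- replicate(n_draws, mean(sample(population, size = n, replace = TRUE)))
  sample_skewness(means)
})
names(skew_by_size) <- paste0("n=", sample_size)
round(skew_by_size, 3)
```

The skewness should shrink toward 0 as the sample size grows, matching the visual impression from the histograms.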