Clean everything
rm(list = ls()) # Clear all objects from your environment
gc() # Clear unused memory
## used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
## Ncells 526468 28.2 1169480 62.5 NA 669428 35.8
## Vcells 971088 7.5 8388608 64.0 16384 1851821 14.2
cat("\f") # Clear the console
graphics.off() # Clear all graphs
Describe Law of Large Numbers
The law of large numbers states that as you average a larger number of random samples, the sample mean converges to the actual population mean. Essentially, the greater the sample size, the closer the sample mean gets to the true population mean.
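The convergence described above can be sketched with a running mean. This is a minimal illustration, not the class code: the seed, the rate of 0.2, and the names `draws`, `running_mean`, and `lln_check` are all assumptions chosen for the example.

```r
# Sketch: running mean of exponential draws approaching the population mean.
set.seed(1)                      # arbitrary seed, for reproducibility only
rate <- 0.2                      # assumed rate; population mean is 1 / rate = 5
draws <- rexp(100000, rate = rate)

# Mean of the first k draws, for every k from 1 to 100000
running_mean <- cumsum(draws) / seq_along(draws)

# Distance from the population mean at a few sample sizes
sizes <- c(10, 100, 1000, 10000, 100000)
lln_check <- data.frame(n = sizes,
                        running_mean = running_mean[sizes],
                        abs_error = abs(running_mean[sizes] - 1 / rate))
print(lln_check)
```

The `abs_error` column should generally shrink as `n` grows, although it need not decrease at every single step because each running mean is itself random.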
Please explain CLT in your own words.
The Central Limit Theorem states that the distribution of sample means taken from a population will approximate a normal distribution even if the population itself is not normally distributed (provided it has finite variance). As the sample size increases, the sample means become more normally distributed.
What are the similarities and differences between LLN and CLT?
Both the LLN and the CLT deal with samples and describe how sample statistics behave as the sample size increases. They differ in what they concentrate on: the LLN focuses on the value the sample mean converges to, while the CLT focuses on the shape of the distribution of the sample means.
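One way to see the difference numerically is to look at the same simulated sample means twice: once raw and once standardized. The sketch below assumes an exponential population with rate 0.2 (mean and standard deviation both 1/rate); the sample sizes, the number of repetitions, and all variable names are illustrative choices.

```r
# Sketch: raw sample means collapse toward mu (LLN), while the
# standardized means keep a spread near 1 (CLT).
set.seed(2)          # arbitrary seed
rate <- 0.2          # assumed rate
mu <- 1 / rate       # population mean of Exp(rate)
sigma <- 1 / rate    # population sd of Exp(rate)
sizes <- c(10, 100, 1000)
n_reps <- 2000

compare <- t(sapply(sizes, function(n) {
  means <- replicate(n_reps, mean(rexp(n, rate = rate)))
  c(n = n,
    sd_of_means = sd(means),                               # shrinks as n grows
    sd_standardized = sd(sqrt(n) * (means - mu) / sigma))  # stays close to 1
}))
print(compare)
```

The `sd_of_means` column reflects the LLN (the means concentrate around 5), while the roughly constant `sd_standardized` column reflects the scaling under which the CLT describes a stable, approximately normal shape.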
Pick any distribution apart from normal, uniform, or Poisson. You can read about the distribution on Wikipedia and/or read how to implement the distribution in R (what parameters are required to generate the distribution).
Exponential distribution: a continuous probability distribution used to model the time between events in a Poisson point process, in which events occur at a constant average rate and are independent of each other.
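Parameters in the distribution: a single rate parameter, lambda (the rate argument of rexp()), which must be positive; the mean of the distribution is 1/lambda.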
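As a quick reference for how the rate parameter maps onto the quantities used later, the sketch below computes the theoretical mean, standard deviation, and median for a rate of 0.2 (the value assigned to `lambda` in the code that follows). The `theoretical_*` and `r_median` names are introduced here for illustration only.

```r
# Sketch: theoretical summaries of an exponential distribution with rate lambda.
lambda <- 0.2                          # rate parameter

theoretical_mean <- 1 / lambda         # 5
theoretical_sd <- 1 / lambda           # the sd equals the mean
theoretical_median <- log(2) / lambda  # closed-form median

# qexp() is the quantile function; the 0.5 quantile is the median
r_median <- qexp(0.5, rate = lambda)

print(c(mean = theoretical_mean, sd = theoretical_sd,
        median = theoretical_median, qexp_median = r_median))
```

The median is smaller than the mean because the exponential distribution is right-skewed, which is worth keeping in mind when comparing the two histograms of sample statistics.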
Apply the CLT on the sample mean of this chosen distribution. (adapt our class R code, or you can find an alternative code on the web too).
Creating an exponential distribution
lambda <- 0.2
n_samples <- 1000
# Set the Seed
set.seed(55)
# Creating the Exponential Distribution
my_expo <- rexp(n_samples,
rate = lambda)
# Graphing the Exponential Distribution
hist(my_expo,
breaks = 20,
main = "Exponential Distribution",
xlab = "Time Between Events",
ylab = "Frequency",
col = "beige")
# Adding a red vertical line at the mean of the simulated data (the theoretical mean is 1/lambda = 5)
true_mean <- mean(my_expo)
abline(v = true_mean,
col = "red",
lwd = 2)
CLT of the exponential distribution
# creating the empty matrix of 10000 rows, 1 column for storage
sample_means <- matrix(data = rep(x = 0,
times = 10000
),
nrow = 10000,
ncol = 1)
# Use the my_expo data in order to fill in the empty matrix
for (i in 1:10000){
sample_means[i,] <- mean(sample(x = my_expo,
size = 100,
replace = TRUE))}
# Create a histogram of the sample means
hist(sample_means,
breaks = 40,
main = "Histogram of Sample Mean",
xlab = "Sample Mean",
ylab = "Frequency",
col = "lightgreen")
# Adding a red vertical line at the Mean
abline(v = true_mean,
col = "red",
lwd = 2)
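Because the loop resamples 100 values with replacement from the 1,000 simulated draws, the sample means should centre on mean(my_expo) with a spread close to sd(my_expo) / sqrt(100). The sketch below checks this numerically. It uses replicate() in place of the preallocated matrix, which is an alternative style rather than the class code, and the names `sample_size`, `centre_gap`, `expected_se`, and `observed_se` are assumptions made for the example.

```r
# Sketch: compare the spread of the resampled means with sd / sqrt(n).
set.seed(55)
lambda <- 0.2
my_expo <- rexp(1000, rate = lambda)

# 10000 resampled means, each from 100 values drawn with replacement
sample_size <- 100
sample_means <- replicate(10000,
                          mean(sample(my_expo, size = sample_size, replace = TRUE)))

centre_gap <- mean(sample_means) - mean(my_expo)   # should be near 0
expected_se <- sd(my_expo) / sqrt(sample_size)     # standard error of the mean
observed_se <- sd(sample_means)                    # spread of the simulated means
print(c(centre_gap = centre_gap,
        expected_se = expected_se,
        observed_se = observed_se))
```

If the two standard-error values are close, the histogram's width matches what the CLT predicts for samples of size 100.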
Alternatively, apply the CLT on any other sample statistic like say the sample median, sample 25th percentile or even the sample 80th percentile.
I chose the median, shown below:
# creating the empty matrix of 10000 rows, 1 column for storage
sample_median <- matrix(data = rep(x = 0,
times = 10000
),
nrow = 10000,
ncol = 1)
# Use the my_expo data in order to fill in the empty matrix
for (i in 1:10000){
sample_median[i,] <- median(sample(x = my_expo,
size = 100,
replace = TRUE))}
# Create a histogram of the sample medians
hist(sample_median,
breaks = 40,
main = "Histogram of Sample Median",
xlab = "Sample Median",
ylab = "Frequency",
col = "lightblue")
# Adding a red vertical line at the Median
true_median <- median(my_expo)
abline(v = true_median,
col = "red",
lwd = 2)
The CLT does hold: for both the mean and the median, the sampling distribution takes the shape of an approximately normal distribution.
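The visual impression from the histograms can be backed with a rough number. A normal distribution has skewness 0, so the skewness of the simulated statistics should move toward 0 as the sample size grows. The sketch below is an illustration under stated assumptions: it draws fresh samples with rexp() rather than resampling my_expo, and `skewness()` and `simulate_skew()` are helpers defined here, since base R has no built-in skewness function.

```r
# Sketch: skewness of simulated sample means and medians at two sample sizes.
set.seed(3)        # arbitrary seed
rate <- 0.2        # assumed rate
n_reps <- 10000

# Sample skewness (third standardized moment)
skewness <- function(x) mean((x - mean(x))^3) / mean((x - mean(x))^2)^1.5

# Skewness of n_reps simulated values of stat() on samples of size n
simulate_skew <- function(n, stat) {
  skewness(replicate(n_reps, stat(rexp(n, rate = rate))))
}

skew_table <- data.frame(
  n = c(5, 100),
  mean_skew = c(simulate_skew(5, mean), simulate_skew(100, mean)),
  median_skew = c(simulate_skew(5, median), simulate_skew(100, median))
)
print(skew_table)
```

Both columns should be noticeably closer to 0 at n = 100 than at n = 5, which is consistent with the sampling distributions becoming more symmetric and bell-shaped as the sample size increases.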