Project instructions

In this project we investigate the exponential distribution in R and compare the distribution of its sample means with the predictions of the Central Limit Theorem. The exponential distribution can be simulated in R with rexp(n, lambda), where lambda is the rate parameter. The mean of the exponential distribution is 1/lambda, and its standard deviation is also 1/lambda. We set lambda = 0.2 for all of the simulations. The investigation uses the averages of 40 exponentials, repeated over 1000 simulations, which generates 40000 random exponential numbers with rate parameter 0.2.
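As a quick sanity check of the stated facts (mean and standard deviation both equal to 1/lambda), a single large sample from Exp(0.2) can be compared with 1/lambda = 5. This is a minimal sketch; the sample size of 100000 and the variable names are illustrative choices, not part of the original project.

```r
# Illustrative check: a large Exp(0.2) sample should have
# mean and standard deviation both close to 1/lambda = 5.
set.seed(0317)
lambda <- 0.2
x <- rexp(1e5, rate = lambda)
c(theoretical = 1/lambda, sample_mean = mean(x), sample_sd = sd(x))
```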

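library(ggplot2)
set.seed(0317)
# 1000 simulations (rows) x 40 exponentials (columns) with rate 0.2.
# byrow = TRUE fills each row with 40 consecutive draws, as a loop of
# rexp(40, rate = 0.2) calls would, without growing a data frame with rbind().
exp_dist <- matrix(rexp(1000 * 40, rate = 0.2), nrow = 1000, byrow = TRUE)
colnames(exp_dist) <- paste("simul", 1:40, sep = ".")
# Mean of the 40 exponentials in each simulation
dist_means <- data.frame(exp_mean = rowMeans(exp_dist))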

Comparison of sample mean with theoretical mean

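# Theoretical mean of the exponential distribution: 1/lambda
theo_mean <- 1/0.2
# Sample mean: average of the 1000 simulated means
sample_mean <- round(mean(dist_means$exp_mean), 3)
# Absolute difference (named diff_mean to avoid masking base::diff)
diff_mean <- abs(theo_mean - sample_mean)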

The theoretical mean is 5.
The sample mean is 4.97.
The difference between them is 0.03.

The sample mean and the theoretical mean are very close: they differ by only 0.03. A further exercise would be to increase the number of simulations and check whether the gap narrows.
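That follow-up exercise could be sketched as below: repeat the experiment with more simulations and record how far the overall mean lands from 1/0.2 = 5. The simulation counts in `n_sims` are arbitrary, illustrative choices. Because each gap is random, it need not shrink monotonically, but its typical size scales like 5/sqrt(40 * n_sim), the standard deviation of the mean of all 40 * n_sim draws.

```r
# Illustrative follow-up: distance between the overall sample mean and the
# theoretical mean 1/0.2 = 5 for increasing numbers of simulations.
set.seed(0317)
n_sims <- c(100, 1000, 10000, 100000)
gaps <- sapply(n_sims, function(n_sim) {
  # n_sim rows, each holding 40 Exp(0.2) draws
  draws <- matrix(rexp(n_sim * 40, rate = 0.2), nrow = n_sim)
  abs(mean(rowMeans(draws)) - 1/0.2)
})
data.frame(n_sim = n_sims, gap = round(gaps, 4))
```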

Comparison of sample variance with theoretical variance

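# Theoretical variance of the mean of 40 exponentials: (1/lambda)^2 / 40
theo_var <- (1/0.2)^2/40
# Sample variance of the 1000 simulated means
sample_var <- round(var(dist_means$exp_mean), 3)
# Absolute difference between the two
diff_var <- abs(theo_var - sample_var)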

The theoretical variance is 0.625.
The sample variance is 0.642.
The difference between them is 0.017.

The sample variance and the theoretical variance differ by only 0.017.
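For reference, the theoretical value 0.625 follows from the variance of a mean of n independent draws, each with variance sigma^2 = (1/lambda)^2, here with lambda = 0.2 and n = 40:

```latex
\operatorname{Var}(\bar{X})
  = \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
  = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i)
  = \frac{\sigma^2}{n}
  = \frac{(1/\lambda)^2}{n}
  = \frac{(1/0.2)^2}{40}
  = \frac{25}{40}
  = 0.625
```

The corresponding standard deviation of the mean is sqrt(0.625), about 0.79.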

Plotting the distribution of the means

# Map the exp_mean column (not the whole data frame) to x.
# after_stat() and linewidth require ggplot2 >= 3.4.0.
g <- ggplot(dist_means, aes(x = exp_mean))
g <- g + geom_histogram(aes(y = after_stat(density)), bins = 30,
                        fill = "grey", colour = "black")
g <- g + geom_density(linewidth = 2, colour = "black")
g <- g + geom_vline(xintercept = theo_mean, linewidth = 1, colour = "blue")
g <- g + geom_vline(xintercept = sample_mean, linewidth = 1, colour = "red")
g <- g + labs(title = "Distribution of the means of 40 exponentials",
              x = "Mean of 40 exponentials", y = "Density")
g <- g + annotate("text", x = 4, y = 0.6, label = paste("Sample mean:", sample_mean),
                  family = "serif", fontface = "italic", colour = "red", size = 5) +
         annotate("text", x = 6, y = 0.6, label = paste("Theoretical mean:", theo_mean),
                  family = "serif", fontface = "italic", colour = "blue", size = 5)
g

As the plot shows, the theoretical and sample means are very close, and the distribution of the simulated means is bell-shaped, closely resembling a normal distribution. This is what the Central Limit Theorem predicts: the mean of 40 exponentials should be approximately normally distributed, centred at 1/lambda = 5 with standard deviation (1/lambda)/sqrt(40), about 0.79, and the approximation improves as the number of exponentials averaged grows.
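One way to make this comparison explicit is to overlay, on the histogram, the normal density the Central Limit Theorem predicts, with mean 5 and standard deviation 5/sqrt(40). This is a minimal sketch, assuming ggplot2 >= 3.4.0 (for `after_stat()` and `linewidth`); it regenerates the simulated means with the same seed so it runs on its own.

```r
library(ggplot2)

# Regenerate the 1000 means of 40 Exp(0.2) draws
set.seed(0317)
draws <- matrix(rexp(1000 * 40, rate = 0.2), nrow = 1000, byrow = TRUE)
dist_means <- data.frame(exp_mean = rowMeans(draws))

# Normal approximation from the CLT: mean 1/lambda, sd (1/lambda)/sqrt(n)
clt_mean <- 1/0.2
clt_sd   <- (1/0.2) / sqrt(40)

p <- ggplot(dist_means, aes(x = exp_mean)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30,
                 fill = "grey", colour = "black") +
  # Empirical density of the simulated means (red)
  geom_density(colour = "red", linewidth = 1) +
  # Normal density predicted by the CLT (blue)
  stat_function(fun = dnorm, args = list(mean = clt_mean, sd = clt_sd),
                colour = "blue", linewidth = 1) +
  labs(title = "Simulated means vs. CLT normal approximation",
       x = "Mean of 40 exponentials", y = "Density")
p
```

If the two curves nearly coincide, the simulated means are well described by the normal approximation.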


As seen in the plot above, the simulated distribution of the means is approximately normal, as the Central Limit Theorem predicts.
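A normal Q-Q plot gives a complementary check that does not depend on histogram bin choices. The sketch below standardises the simulated means with the CLT parameters (mean 5, standard deviation 5/sqrt(40)) and compares a few empirical quantiles with standard normal quantiles; the chosen probabilities are illustrative. Some right skew may remain in the upper tail, since the skewness of a mean of 40 exponentials is 2/sqrt(40), about 0.32.

```r
# Regenerate the simulated means
set.seed(0317)
draws <- matrix(rexp(1000 * 40, rate = 0.2), nrow = 1000, byrow = TRUE)
exp_mean <- rowMeans(draws)

# Standardise with the CLT parameters: mean 5, sd 5 / sqrt(40)
z <- (exp_mean - 1/0.2) / ((1/0.2) / sqrt(40))

# Normal Q-Q plot: points near the y = x line indicate approximate normality
qqnorm(z, main = "Normal Q-Q plot of standardised simulated means")
abline(0, 1, col = "blue")

# Numeric check: empirical quantiles of z vs. standard normal quantiles
probs <- c(0.025, 0.25, 0.5, 0.75, 0.975)
quantile_check <- data.frame(prob = probs,
                             empirical = round(unname(quantile(z, probs)), 3),
                             normal = round(qnorm(probs), 3))
quantile_check
```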