Overview
In this report we explore the properties of the exponential distribution, whose probability density function is \(f(x) = \lambda e^{-\lambda x}\) for \(x \ge 0\).
The mean and the standard deviation of this distribution are both \(1/\lambda\).
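For reference, both values follow from integrating against the density \(\lambda e^{-\lambda x}\) on \(x \ge 0\). This is the standard textbook derivation, added here as a supplement rather than taken from the original analysis:

\[ E[X] = \int_0^\infty x \, \lambda e^{-\lambda x} \, dx = \frac{1}{\lambda}, \qquad E[X^2] = \int_0^\infty x^2 \, \lambda e^{-\lambda x} \, dx = \frac{2}{\lambda^2}, \]

\[ \mathrm{Var}(X) = E[X^2] - (E[X])^2 = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2}, \qquad \mathrm{sd}(X) = \sqrt{\mathrm{Var}(X)} = \frac{1}{\lambda}. \]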
Simulation Exercise
In this project you will investigate the exponential distribution in R and compare it with the Central Limit Theorem. The exponential distribution can be simulated in R with rexp(n, lambda) where lambda is the rate parameter. The mean of the exponential distribution is 1/lambda and the standard deviation is also 1/lambda. Set lambda = 0.2 for all of the simulations. You will investigate the distribution of averages of 40 exponentials. Note that you will need to do a thousand simulations.
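As a quick sanity check of the claim that the mean and the standard deviation both equal 1/lambda, the following minimal sketch draws a large sample with rexp and compares the two statistics with 5. The seed, the sample size of 100000 and the names rate_check, draws, check_mean and check_sd are illustrative assumptions, not part of the original exercise.

```r
# Hypothetical sanity check, not part of the original analysis.
set.seed(1)                            # arbitrary seed, used only for reproducibility
rate_check <- 0.2                      # same rate as lambda in the exercise
draws <- rexp(1e5, rate = rate_check)  # one large sample of exponentials

check_mean <- mean(draws)              # expected to be near 1 / rate_check = 5
check_sd <- sd(draws)                  # expected to be near 5 as well
print(c(mean = check_mean, sd = check_sd))
```

With a sample this large, both numbers should land very close to 5, which is the behaviour the rest of the report relies on.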
Let’s first review the shape of the distribution using ggplot2:
library(ggplot2)
ggplot(data.frame(x = c(0, 20)), aes(x = x)) + stat_function(fun = dexp, args = list(rate = 0.2)) + labs(y = 'density')
Now let’s run the simulation:
lambda = 0.2
n = 40
simul_n = 1000
Illustrate via simulation and associated explanatory text the properties of the distribution of the mean of 40 exponentials. You should
1 - Show the sample mean and compare it to the theoretical mean of the distribution.
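set.seed(1994102)
theoratical_mean <- 1/lambda
# Preallocate one slot per simulation instead of growing the vectors
sim_arr <- numeric(simul_n)  # mean of each sample of n exponentials
sim_var <- numeric(simul_n)  # variance of each sample of n exponentials
for (i in seq_len(simul_n)) {
  temp <- rexp(n = n, rate = lambda)
  sim_arr[i] <- mean(temp)
  sim_var[i] <- var(temp)
}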
library(ggplot2)
ggplot(data.frame(x = sim_arr), aes(x = x)) +
  geom_histogram(binwidth = 0.2) +
  geom_vline(data = data.frame(sim_arr), aes(xintercept = mean(sim_arr), color = 'simulation mean'), size = 1.2) +
  geom_vline(aes(xintercept = theoratical_mean, color = 'theoretical mean'), size = 1.2) +
  labs(x = 'mean', y = 'Frequency')
As the plot shows, the average of the simulated means is close to the theoretical mean, but how close? The theoretical mean and the simulated mean are:
## [1] 5
## [1] 5.021273
So the simulated mean differs from the theoretical mean by about 0.43 percent.
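The percentage quoted above can be reproduced from the two printed values. This is a minimal sketch in which the numbers are typed in from the output rather than recomputed, and the names theoretical, simulated and pct_diff are illustrative.

```r
# Values copied from the printed output in the report
theoretical <- 5          # 1 / lambda with lambda = 0.2
simulated <- 5.021273     # mean of the 1000 simulated sample means

# Relative difference, expressed in percent
pct_diff <- 100 * abs(simulated - theoretical) / theoretical
print(round(pct_diff, 2))
```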
theoratical_var <- (1/lambda)^2
#ggplot(data.frame(x = sim_sd),aes(x = x)) + geom_histogram(binwidth = 0.2) + geom_vline(aes(xintercept = theoratical_sd,color = 'theoratical sd'),size = 1.2)
myVar <- mean(sim_var)
myVar
## [1] 25.13219
theoratical_var
## [1] 25
The mean of the sample variances is 25.1321918, while theory gives \(1/\lambda^2 = 25\), a difference of about 0.53 percent, so the two agree closely.
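The comparison above concerns the variance of the individual observations. A related quantity is the variance of the sample mean itself, which theory puts at \(\sigma^2/n = 25/40 = 0.625\). The sketch below re-runs the simulation on its own so that it is self-contained; the names means, var_of_means and theory_var_of_mean are illustrative, and reusing the report's seed is an assumption that does not guarantee identical draws to the report.

```r
# Self-contained sketch: variance of the mean of 40 exponentials
set.seed(1994102)          # seed reused from the report for convenience
lambda <- 0.2
n <- 40
simul_n <- 1000

# One mean per simulated sample of n exponentials
means <- replicate(simul_n, mean(rexp(n, rate = lambda)))

var_of_means <- var(means)                 # empirical variance of the sample means
theory_var_of_mean <- (1 / lambda)^2 / n   # sigma^2 / n = 25 / 40 = 0.625

print(c(simulated = var_of_means, theoretical = theory_var_of_mean))
```

The empirical value should sit near 0.625, much smaller than 25, because averaging 40 observations shrinks the variance by a factor of 40.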
Law of Large Numbers
Let’s see how the running average of our simulated sample means converges to the theoretical mean as the number of samples grows.
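The running average plotted in this section can be written explicitly. The symbols \(m_i\) (the i-th simulated sample mean, i.e. sim_arr[i]) and \(\bar{m}_k\) are notation assumed here for illustration:

\[ \bar{m}_k = \frac{1}{k} \sum_{i=1}^{k} m_i, \qquad k = 1, \dots, 1000. \]

By the law of large numbers, \(\bar{m}_k\) approaches \(1/\lambda = 5\) as \(k\) grows. The expression cumsum(sim_arr)/(1:simul_n) computes all 1000 values of \(\bar{m}_k\) at once.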
means_vec <- cumsum(sim_arr)/(1:simul_n)
g <- ggplot(data.frame(x = c(1:simul_n),y = means_vec),aes(x = x ,y = y))
g <- g + geom_hline(yintercept = theoratical_mean,color = 'darkred',size = 1.5)
g <- g + geom_line(size = 1.2)
g <- g + ggtitle('means of samples compared to theoretical mean')
g <- g + labs(x = 'number of samples', y = 'mean')
g
In the last step we compare the distribution of our sample means with a normal distribution, as suggested by the CLT.
g <- ggplot(data.frame(x = sim_arr),aes(x))
g <- g + geom_histogram(aes(y = ..density..), colour = 'white',fill = 'salmon' , binwidth = 0.1)
g <- g + stat_function(fun = dnorm, args = list(mean = mean(sim_arr), sd = sd(sim_arr)), colour = 'black', size = 1.5)
g <- g + ggtitle('distribution of 1000 means of 40 exponential random variables') + labs(x = 'means')
g
So in three steps we showed that the means of our samples of exponential random variables are approximately normally distributed. First, we compared the average of the 1000 sample means (each from a sample of size 40) with the theoretical mean of the distribution. Second, we compared the sample variances with the theoretical variance. Third, we plotted a normal bell curve with the mean and standard deviation of our sample means and compared it to the density of our simulated data.
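A numerical complement to the visual comparison is to standardize the sample means and check how often they fall inside the usual normal limits. The following is a minimal sketch under stated assumptions: it re-runs the simulation so that it is self-contained, and the names means, z, coverage, probs and comparison are illustrative rather than taken from the report.

```r
# Sketch only: a rough numerical check of approximate normality
set.seed(1994102)          # seed reused from the report for convenience
lambda <- 0.2
n <- 40
simul_n <- 1000

# 1000 means, each from a sample of 40 exponentials
means <- replicate(simul_n, mean(rexp(n, rate = lambda)))

# Standardize with the theoretical mean (1/lambda) and
# the theoretical standard error ((1/lambda) / sqrt(n))
z <- (means - 1 / lambda) / ((1 / lambda) / sqrt(n))

# Share of standardized means within +/- 1.96;
# a value near 0.95 is what a normal distribution would give
coverage <- mean(abs(z) <= 1.96)

# Empirical quantiles next to standard normal quantiles
probs <- c(0.05, 0.25, 0.5, 0.75, 0.95)
comparison <- rbind(simulated = quantile(z, probs), normal = qnorm(probs))

print(coverage)
print(round(comparison, 2))
```

Small gaps between the two rows are expected, since the mean of 40 exponentials is still slightly right-skewed; the CLT describes an approximation that improves as the sample size grows.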