The exponential distribution is a probability distribution that describes the time between events in a Poisson process. In this document, we simulate means of samples drawn from an exponential distribution and examine the properties of their distribution.
In each simulation, we calculate the mean of 40 random values drawn from an exponential distribution with rate parameter lambda equal to 0.2. We run 100, 1000, 10000 and 100000 simulations and compare the resulting distributions.
set.seed(100)
# Each replicate() call repeats "draw 40 exponential values and take their mean"
# the given number of times and returns the means as a numeric vector.
means1 <- replicate(100, mean(rexp(40, 0.2)))     # 100 simulations
means2 <- replicate(1000, mean(rexp(40, 0.2)))    # 1000 simulations
means3 <- replicate(10000, mean(rexp(40, 0.2)))   # 10000 simulations
means4 <- replicate(100000, mean(rexp(40, 0.2)))  # 100000 simulations
# plot distributions
par(mfrow=c(2,2))
hist(means1)
hist(means2)
hist(means3)
hist(means4)
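As the plots show, increasing the number of simulations makes the histogram smoother and its bell shape clearer. The number of simulations does not change the underlying distribution of the means; by the central limit theorem, that distribution is approximately normal because each mean averages 40 values. It is centered near 5 rather than 0, so it is a normal distribution, not the standard normal distribution.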
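One way to check the bell shape visually is to overlay a normal density on a density-scaled histogram. The following is a minimal sketch, not part of the original analysis: the name `sim_means`, the number of bins and the choice of 10000 simulations are assumptions made for illustration. Because each mean averages only 40 values, a slight right skew is still expected, so the match is approximate.

```r
# Sketch: compare simulated means with the normal curve suggested by the CLT.
set.seed(100)    # same seed value as the document, but a fresh random stream
lambda <- 0.2    # rate parameter used in the document
n <- 40          # number of values averaged in each simulation

# sim_means is a hypothetical name for a fresh set of 10000 simulated means
sim_means <- replicate(10000, mean(rexp(n, lambda)))

# freq = FALSE scales the bars to densities so the curve shares the y axis
hist(sim_means, breaks = 50, freq = FALSE,
     main = "Means of 40 exponential values", xlab = "Sample mean")

# Normal density with mean 1/lambda and standard deviation (1/lambda)/sqrt(n)
curve(dnorm(x, mean = 1 / lambda, sd = (1 / lambda) / sqrt(n)),
      add = TRUE, lwd = 2)
```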
The theoretical center of an exponential distribution is 1/lambda, which is 5 for lambda=0.2, and the mean of 40 such values has the same expected value. To find the center of the distribution for each simulation, we take the mean of its simulated means:
mean(means1)
## [1] 4.995
mean(means2)
## [1] 4.994
mean(means3)
## [1] 4.995
mean(means4)
## [1] 5.001
As the R output shows, each mean is approximately 5, which matches the theoretical center of the distribution. The estimate from the largest simulation (5.001) is also the closest to the theoretical value, as expected: larger simulations estimate the center more precisely.
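The gain in precision can be quantified. The average of N simulated means has a standard error equal to the standard deviation of a single mean divided by sqrt(N). The sketch below computes that theoretical standard error for the four simulation sizes used here; the variable names are hypothetical and the values are derived from lambda = 0.2 and 40 values per mean, not from the simulated data.

```r
# Sketch: theoretical standard error of the estimated center
# for each simulation size used in the document.
lambda <- 0.2
n <- 40                                  # values averaged per simulation
n_sims <- c(100, 1000, 10000, 100000)    # simulation sizes from the document

sd_of_mean <- (1 / lambda) / sqrt(n)     # SD of one mean of 40 values
std_error <- sd_of_mean / sqrt(n_sims)   # SE of the average of n_sims means

print(data.frame(n_sims = n_sims, std_error = round(std_error, 4)))
```

With 100000 simulations the standard error is roughly 0.0025, so an estimate such as 5.001 is well within the expected range.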
To calculate the variance and standard deviation of the simulated means, we use the var and sd functions from R. We use only the last simulation, because it contains the highest number of simulated means (100000).
var(means4)
## [1] 0.6218
sd(means4)
## [1] 0.7886
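The theoretical standard deviation of the exponential distribution itself is 1/lambda = 5. For the mean of 40 independent values it is (1/lambda)/sqrt(40), approximately 0.79, and the theoretical variance is (1/lambda)^2/40 = 0.625. Our calculated standard deviation (0.7886) and variance (0.6218) agree closely with these values.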
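The relevant theoretical figures are those for the mean of n = 40 independent exponential values, not for a single value. A short worked derivation, using the symbols \bar{X} for that mean and \sigma^2 = 1/\lambda^2 for the variance of one exponential value (notation introduced here for illustration):

```latex
\operatorname{Var}(\bar{X})
  = \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
  = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i)
  = \frac{\sigma^2}{n}
  = \frac{1/\lambda^2}{n}
  = \frac{25}{40}
  = 0.625,
\qquad
\operatorname{SD}(\bar{X})
  = \frac{1/\lambda}{\sqrt{n}}
  = \frac{5}{\sqrt{40}}
  \approx 0.791
```

The second equality relies on the 40 values being independent, which holds for values drawn with rexp.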
Looking at the histograms, we can see that the distribution of the means resembles a normal distribution. To check this numerically, we use the following property of the normal distribution: about 68% of values lie within 1 standard deviation of the mean, about 95% within 2 standard deviations, and about 99.7% within 3 standard deviations.
Again, we will use only the last simulation with highest number of tries:
center <- mean(means4) # center of distribution
sdev <- sd(means4) #standard deviation
# select all values between center and +/- 1 SD
x <- means4[means4 >= (center - sdev) & means4 <= (center + sdev)]
# select all values between center and +/- 2 SD
y <- means4[means4 >= (center - 2 * sdev) & means4 <= (center + 2 * sdev)]
# select all values between center and +/- 3 SD
z <- means4[means4 >= (center - 3 * sdev) & means4 <= (center + 3 * sdev)]
# percent of values in specified intervals
round(length(x)/length(means4),2)
## [1] 0.68
round(length(y)/length(means4),2)
## [1] 0.96
round(length(z)/length(means4),2)
## [1] 1
As the output shows, the observed proportions (0.68, 0.96 and 1 after rounding) are close to the values expected for a normal distribution (0.68, 0.95 and 0.997).
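The reference proportions for a normal distribution can be computed directly with pnorm rather than quoted from memory. The sketch below does that and compares them with a fresh, smaller simulation; `prop_within` and `sim_means` are hypothetical names introduced for this example, and the 10000 simulations are an assumption chosen to keep the run short.

```r
# Sketch: theoretical vs. simulated proportions within k standard deviations.
k <- 1:3

# Probability that a normal variable falls within k SDs of its mean
theoretical <- pnorm(k) - pnorm(-k)
print(round(theoretical, 4))   # 0.6827 0.9545 0.9973

# Hypothetical helper: share of values within k sample SDs of the sample mean
prop_within <- function(v, k) mean(abs(v - mean(v)) <= k * sd(v))

set.seed(100)
sim_means <- replicate(10000, mean(rexp(40, 0.2)))
empirical <- sapply(k, function(kk) prop_within(sim_means, kk))

# Side-by-side comparison, one column per k
print(round(rbind(theoretical, empirical), 4))
```

Agreement on these three proportions is a quick sanity check, not a formal test of normality.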
To calculate a 95% t confidence interval for the center of the distribution, we use the following R code:
error <- qt(0.975,df=length(means4)-1)*sdev/sqrt(length(means4))
lower <- center - error
upper <- center + error
lower
## [1] 4.996
upper
## [1] 5.006
In this calculation, we again used the simulation with the highest number of tries. This gives the 95% t confidence interval (4.996, 5.006), which contains the theoretical center of 5.
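As a cross-check, the same interval can be obtained from R's built-in t.test function, whose conf.int component holds the confidence interval for the mean. The sketch below compares the manual formula with t.test on a fresh, smaller simulation; `sim_means`, `manual_ci` and `builtin_ci` are hypothetical names, and the 10000 simulations are an assumption, so the numbers differ from those reported above.

```r
# Sketch: manual 95% t interval vs. the interval reported by t.test().
set.seed(100)
sim_means <- replicate(10000, mean(rexp(40, 0.2)))

# Manual interval, following the same formula as in the document
n_means <- length(sim_means)
margin <- qt(0.975, df = n_means - 1) * sd(sim_means) / sqrt(n_means)
manual_ci <- mean(sim_means) + c(-1, 1) * margin

# Built-in interval; as.numeric() drops the conf.level attribute
builtin_ci <- as.numeric(t.test(sim_means, conf.level = 0.95)$conf.int)

print(manual_ci)
print(builtin_ci)
print(all.equal(manual_ci, builtin_ci))
```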