Simulation and examination of exponential distribution properties

The exponential distribution is a probability distribution that describes the time between events in a Poisson process. In this document, we simulate means of exponential samples and examine the properties of their distribution.

Simulation

In the simulation, we calculate the mean of 40 random values drawn from an exponential distribution with rate parameter lambda equal to 0.2. We repeat this 100, 1000, 10000 and 100000 times and compare the resulting distributions.

set.seed(100)

# each simulation stores the mean of 40 exponential values (rate = 0.2)

# 100 simulations
means1 <- c()
for (i in 1:100) {
  m <- mean(rexp(40, 0.2))
  means1 <- append(means1, m)
}

# 1000 simulations
means2 <- c()
for (i in 1:1000) {
  m <- mean(rexp(40, 0.2))
  means2 <- append(means2, m)
}

# 10000 simulations
means3 <- c()
for (i in 1:10000) {
  m <- mean(rexp(40, 0.2))
  means3 <- append(means3, m)
}

# 100000 simulations
means4 <- c()
for (i in 1:100000) {
  m <- mean(rexp(40, 0.2))
  means4 <- append(means4, m)
}

# plot distributions
par(mfrow=c(2,2))
hist(means1)
hist(means2)
hist(means3)
hist(means4)

[Figure: histograms of means1, means2, means3 and means4 in a 2 x 2 grid]

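As we can see from the plots, as we increase the number of simulations the histogram becomes smoother and looks more and more like a normal distribution centered near 5 (not the standard normal, which has mean 0 and standard deviation 1).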
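
The four loops differ only in the number of iterations, so the same simulation can be sketched more compactly with replicate. The helper name simulate_means and the list all_means are hypothetical, introduced here only for illustration. Because the sketch makes the same rexp calls in the same order, it should reproduce the loop results for a given seed.

```r
# Hypothetical helper: means of `n_sims` samples, each of size `n`,
# drawn from an exponential distribution with the given rate.
simulate_means <- function(n_sims, n = 40, rate = 0.2) {
  replicate(n_sims, mean(rexp(n, rate)))
}

set.seed(100)
sim_sizes <- c(100L, 1000L, 10000L, 100000L)
all_means <- lapply(sim_sizes, simulate_means)

# one histogram per simulation size, in a 2 x 2 grid
par(mfrow = c(2, 2))
for (k in seq_along(sim_sizes)) {
  hist(all_means[[k]],
       main = paste(sim_sizes[k], "simulations"),
       xlab = "Mean of 40 exponential values")
}
```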

Center of distribution

The theoretical center of an exponential distribution is 1/lambda, which is 5 for lambda = 0.2; the mean of 40 such values has the same expected value. To find the center of the distribution for each simulation, we take the mean of its simulated means:

mean(means1)
## [1] 4.995
mean(means2)
## [1] 4.994
mean(means3)
## [1] 4.995
mean(means4)
## [1] 5.001

As we can see from the R output, each mean is approximately 5, which matches the theoretical center. The estimate from the largest simulation (5.001) is also the closest to the theoretical value, as we would expect when the number of simulations increases.
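
One way to see why the estimate tends to settle near 5: the average of N independent sample means, each based on 40 values, has standard deviation (1/lambda)/sqrt(40 * N). A minimal sketch of that calculation follows; the variable names are chosen here for illustration.

```r
lambda <- 0.2
n <- 40
n_sims <- c(100, 1000, 10000, 100000)

# standard deviation of the average of n_sims sample means
se_center <- (1 / lambda) / sqrt(n * n_sims)
round(se_center, 4)
## [1] 0.0791 0.0250 0.0079 0.0025
```

The typical distance from 5 therefore shrinks from about 0.08 with 100 simulations to about 0.0025 with 100000.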

Distribution variance

To calculate the variance and standard deviation of the simulated means, we use the var and sd functions from R. We use only the last simulation, because it contains the highest number of simulated means (100000).

var(means4)
## [1] 0.6218
sd(means4)
## [1] 0.7886

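The theoretical standard deviation of the exponential distribution is 1/lambda = 5. The mean of 40 values therefore has a theoretical standard deviation of (1/lambda)/sqrt(40), approximately 0.79, and a theoretical variance of (1/lambda)^2/40 = 0.625. Our calculated values, 0.7886 and 0.6218, are close to both.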
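
Because each simulated value is the mean of 40 draws, the quantity to compare against is the standard deviation of that mean rather than of a single draw. A minimal sketch of the theoretical values, with variable names chosen here for illustration:

```r
lambda <- 0.2
n <- 40

theo_var <- (1 / lambda)^2 / n  # variance of the mean of n values: 0.625
theo_sd <- sqrt(theo_var)       # standard deviation: about 0.7906

theo_var
theo_sd
```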

Normality of distribution

Looking at the histograms, the distribution of means looks approximately normal, and this becomes easier to see as the number of simulations increases. To check normality numerically, we use the following properties of the normal distribution:

  1. about 68% of the distribution lies within 1 SD of the center
  2. about 95% of the distribution lies within 2 SD of the center
  3. about 99.7% of the distribution lies within 3 SD of the center
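
These reference percentages can be reproduced from the normal cumulative distribution function with pnorm. The sketch below computes the share of a normal distribution within 1, 2 and 3 standard deviations of its center.

```r
k <- 1:3

# probability that a normal value lies within k SD of the center
coverage <- pnorm(k) - pnorm(-k)
round(coverage, 4)
## [1] 0.6827 0.9545 0.9973
```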

Again, we use only the last simulation, which has the highest number of simulated means:

center <- mean(means4) # center of distribution
sdev <- sd(means4) #standard deviation

# select all values between center and +/- 1 SD
x <- means4[means4 >= (center - sdev) & means4 <= (center + sdev)]
# select all values between center and +/- 2 SD
y <- means4[means4 >= (center - 2 * sdev) & means4 <= (center + 2 * sdev)]
# select all values between center and +/- 3 SD
z <- means4[means4 >= (center - 3 * sdev) & means4 <= (center + 3 * sdev)]

# percent of values in specified intervals
round(length(x)/length(means4),2) 
## [1] 0.68
round(length(y)/length(means4),2)
## [1] 0.96
round(length(z)/length(means4),2)
## [1] 1

As we can see from the output, the observed proportions (0.68, 0.96 and 1.00 after rounding) are close to the values expected for a normal distribution.
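
As a cross-check that does not depend on simulation, the proportions can also be computed exactly. This relies on a standard result that is not stated in the text above: the mean of n independent exponential values with rate lambda follows a gamma distribution with shape n and rate n * lambda. A sketch under that assumption, with variable names chosen here for illustration:

```r
lambda <- 0.2
n <- 40
mu <- 1 / lambda        # theoretical center
sigma <- mu / sqrt(n)   # theoretical SD of the mean of n values

# exact probability that the mean of n exponentials lies within k SD of mu
exact_coverage <- sapply(1:3, function(k) {
  pgamma(mu + k * sigma, shape = n, rate = n * lambda) -
    pgamma(mu - k * sigma, shape = n, rate = n * lambda)
})
round(exact_coverage, 3)
```

Small differences from the normal percentages are expected, because the distribution of the mean of 40 exponential values is only approximately normal.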

Confidence intervals

To calculate the 95% t confidence interval for the center of the distribution, we use the following R code:

error <- qt(0.975,df=length(means4)-1)*sdev/sqrt(length(means4))
lower <- center - error
upper <- center + error
lower
## [1] 4.996
upper
## [1] 5.006

In this calculation, we again used the simulation with the highest number of simulated means. This gives the 95% t confidence interval (4.996, 5.006), which contains the theoretical center of 5.
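
The same kind of interval can be obtained from R's built-in t.test function. The sketch below uses a smaller, freshly simulated vector, sample_means (a name introduced here for illustration), so its numbers will differ from the interval reported above; it only shows that the manual formula and t.test agree.

```r
set.seed(100)
sample_means <- replicate(10000, mean(rexp(40, 0.2)))

# manual 95% t interval, as in the calculation above
m <- mean(sample_means)
err <- qt(0.975, df = length(sample_means) - 1) *
  sd(sample_means) / sqrt(length(sample_means))
manual_ci <- c(m - err, m + err)

# built-in equivalent
builtin_ci <- t.test(sample_means, conf.level = 0.95)$conf.int

# TRUE when both approaches give the same interval
all.equal(manual_ci, as.numeric(builtin_ci))
```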