Project summary

In this project we will investigate the exponential distribution in R and compare the behaviour of its sample means and variances with what the Central Limit Theorem (CLT) predicts.

Project description

The exponential distribution can be simulated in R with rexp(n, λ), where \(\lambda\) is the rate parameter. The mean and the standard deviation of the exponential distribution are both \(1/\lambda\). For this project, we will simulate samples of \(40\) exponential values with \(\lambda=0.2\).
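As a quick check of the statement above, the following sketch draws one sample of \(40\) exponential values and prints its mean and standard deviation next to \(1/\lambda\). The variable names and the seed are assumptions made for illustration; they are not part of the original analysis.

```r
# Illustrative sketch; names and seed are assumptions, not the original code
set.seed(1)
lambda <- 0.2                 # rate parameter
n <- 40                       # number of exponential values
x <- rexp(n, rate = lambda)   # one random sample

theoretical <- 1 / lambda     # mean and standard deviation are both 1/lambda
c(sample_mean = mean(x), sample_sd = sd(x), theoretical = theoretical)
```

With only \(40\) values, the sample statistics can differ noticeably from \(1/\lambda\); the sections below repeat the draw many times to see the overall pattern.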

Comparison of sample and theoretical mean

We know that the theoretical mean, \(\mu\), of the exponential distribution is \(1/\lambda\), or \(1/0.2=5.0\). From the Law of Large Numbers, we also know that the sample mean, \(\bar X\), converges to the population mean \(\mu\) as the sample size grows.

Therefore, to establish this comparison, we will simulate the following:

  1. Take a random sample of \(40\) exponential values, with rate, \(\lambda=0.2\).
  2. Calculate the mean, \(\bar X\) of this sample.
  3. Repeat steps 1 and 2 \(1000\) times.
  4. Plot a histogram for all the mean values as calculated above.

We will see that the most frequently occurring mean values lie close to the theoretical mean.
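The steps above can be sketched in a few lines of R. This is a minimal illustration, not the original code; the names `k`, `n_sim` and `sample_means` and the seed are assumptions.

```r
# Illustrative sketch of steps 1 to 3; names and seed are assumptions
set.seed(1)
lambda <- 0.2    # rate parameter
k <- 40          # sample size
n_sim <- 1000    # number of simulations

# Steps 1-3: draw a sample of size k, take its mean, repeat n_sim times
sample_means <- replicate(n_sim, mean(rexp(k, rate = lambda)))

mean(sample_means)        # centre of the simulated means, to compare with 1/lambda
sd(sample_means)          # spread of the simulated means
(1 / lambda) / sqrt(k)    # theoretical standard error, sigma / sqrt(k)
```

The last two lines compare the observed spread of the means with the standard error \(\sigma/\sqrt{k}\), which is the quantity the CLT section later uses for normalization.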

Plot of random exponential means

For this project, we will plot a histogram of all the calculated means and overlay a vertical line at \(x=5.0\). We will see that the peak of the histogram is close to this line.
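The original plotting code is not shown, so the following is only one way such a figure might be drawn with base R graphics. The number of breaks, the titles and the line width are assumptions.

```r
# Illustrative sketch; plot settings, names and seed are assumptions
set.seed(1)
lambda <- 0.2
sample_means <- replicate(1000, mean(rexp(40, rate = lambda)))

# Histogram of the 1000 sample means
h <- hist(sample_means, breaks = 30,
          main = "Means of 40 exponentials (1000 simulations)",
          xlab = "Sample mean")

# Thick vertical line at the theoretical mean 1/lambda = 5
abline(v = 1 / lambda, lwd = 3)
```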

As we can see, the thick vertical line passes through \(5.0\) and the peak of the histogram is very close to it.

Comparison of sample and theoretical variance

We know that the theoretical variance, \(\sigma ^2\), of the exponential distribution is \(1/\lambda ^2\), or \(1/(0.2^2)=25.0\).

Therefore, to establish this comparison, we will simulate the following:

  1. Take a random sample of \(40\) exponential values, with rate, \(\lambda \) set to \(0.2\).
  2. Calculate the variance, \(Var(X)\), of this sample.
  3. Divide this variance by the theoretical variance, \(1/\lambda ^2\) or \(1/(0.2^2)=25.0\).
  4. Repeat steps 1 to 3 \(1000\) times.
  5. Plot a histogram of the ratios calculated in step 3 above.

We will see that the ratio is close to \(1\), i.e. the sample variance is close to the theoretical variance.
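A minimal sketch of steps 1 to 4 follows. The names `theoretical_var` and `var_ratios` and the seed are assumptions for illustration, not the original code.

```r
# Illustrative sketch of steps 1 to 4; names and seed are assumptions
set.seed(1)
lambda <- 0.2
k <- 40
n_sim <- 1000
theoretical_var <- 1 / lambda^2   # sigma^2 = 25

# Ratio of sample variance to theoretical variance, repeated n_sim times
var_ratios <- replicate(n_sim, var(rexp(k, rate = lambda)) / theoretical_var)

mean(var_ratios)      # average ratio, to compare with 1
summary(var_ratios)   # individual ratios scatter around 1
```

Because each sample has only \(40\) values, individual ratios can be well above or below \(1\); it is their overall distribution that centres near \(1\).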

Plot of random exponential variance

For this project, we will plot the ratio of the sample variance to the theoretical variance. The sample variance is calculated as var(rexp(k, λ)), where \(k\) is the sample size, \(40\) in this case, and \(\lambda =0.2\). The theoretical variance is \(\sigma ^2 = 1/\lambda ^2\).

To understand the plot better, let us first do a dry run with only one simulation but a large sample size.

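# Set the rate of the exponential distribution, λ, to 0.2
lambda <- 0.2
# The population standard deviation of the exponential distribution is 1/λ
sigma <- 1 / lambda
# Set the number of exponentials (one large sample)
k <- 40000
# Draw the sample and compute its variance
expSampleVector <- rexp(k, lambda)
expSampleVariance <- var(expSampleVector)
# sigma is the population standard deviation, so sigma^2 is the variance
expTheortVariance <- sigma^2
# Ratio of sample variance to theoretical variance
expSampleVariance / expTheortVariance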
## [1] 1.006607

The following plot essentially repeats the code shown above with a smaller sample size (\(40\)) and a large number of simulations (\(1000\)).
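The plotting code itself is not shown in the source; a sketch of how such a histogram might be produced is given below. The plot settings, names and seed are assumptions.

```r
# Illustrative sketch; plot settings, names and seed are assumptions
set.seed(1)
lambda <- 0.2
theoretical_var <- 1 / lambda^2

# 1000 ratios of sample variance (40 values each) to theoretical variance
var_ratios <- replicate(1000, var(rexp(40, rate = lambda)) / theoretical_var)

h <- hist(var_ratios, breaks = 30,
          main = "Sample variance / theoretical variance",
          xlab = "Ratio")

# Reference line where sample and theoretical variance are equal
abline(v = 1, lwd = 3)
```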

As we can see, the peak of the histogram is around \(1.0\), i.e. the sample variance is close to the theoretical variance.

Distribution of the exponentials

In the real world, exponential distributions come up when we look at a series of events and measure the times between events, which are called interarrival times. If the events are equally likely to occur at any time, the distribution of interarrival times tends to look like an exponential distribution.

The Cumulative Distribution Function (CDF), or distribution function, is given by \[y = 1 - \exp(-\lambda x)\]

(Source: http://greenteapress.com/thinkstats/html/thinkstats005.html)
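As a small worked check, the CDF formula can be compared with R's built-in `pexp()` at a few points. The evaluation points in `x` are arbitrary choices made for this illustration.

```r
# Illustrative check; the evaluation points are arbitrary assumptions
lambda <- 0.2
x <- c(1, 5, 10, 20)

cdf_formula <- 1 - exp(-lambda * x)     # y = 1 - exp(-lambda * x)
cdf_builtin <- pexp(x, rate = lambda)   # R's exponential CDF

# TRUE when the formula and pexp() agree at every point
all.equal(cdf_formula, cdf_builtin)

# At x = 1/lambda = 5 the CDF equals 1 - exp(-1), about 0.632
cdf_formula[2]
```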

Exponential distribution

Since this project uses \(\lambda = 0.2\), a sample size of \(40\) and number of simulations as \(1000\), let us plot the CDF for \(40000\) values.

This plot appears as a straight line here because of the large number of values. For a more realistic example, consider the linked plot of inter-arrival times of births.
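The code behind the CDF plot is not shown, so the scale and style of the original figure are unknown. One possible sketch uses the empirical CDF from `ecdf()` and overlays the theoretical curve from `pexp()`; all names and plot settings are assumptions.

```r
# Illustrative sketch; names, seed and plot settings are assumptions
set.seed(1)
lambda <- 0.2
values <- rexp(40 * 1000, rate = lambda)   # 40000 exponential values

emp_cdf <- ecdf(values)                    # empirical CDF as a step function
plot(emp_cdf, main = "CDF of 40000 exponential values",
     xlab = "x", ylab = "CDF")

# Theoretical CDF, 1 - exp(-lambda * x), for comparison
curve(pexp(x, rate = lambda), add = TRUE, col = "blue", lty = 2)
```

With this many values the empirical steps are too small to see, so the empirical and theoretical curves are hard to tell apart.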

Distribution of averages of exponential distribution

The Central Limit Theorem says that

The distribution of averages of IID variables, properly normalized, becomes that of a standard normal as the sample size increases.

So, if we were to take the means of variables drawn randomly from an exponential distribution and repeat this over and over again, the distribution of those means should resemble a normal distribution. Specifically, if we

  1. Took the mean, \(\bar X\) and subtracted the population mean, \(\mu \), in this case, \(1/\lambda \)
  2. Divided by standard deviation, \(\sigma \), in this case also, \(1/\lambda \)
  3. Multiplied by the square root of the sample size, \(\sqrt{k}\)

then, our distribution will converge to that of a standard normal distribution.

Formally, \[\frac{\bar X_n - \mu}{\sigma / \sqrt{n}}= \frac{\sqrt n (\bar X_n - \mu)}{\sigma} = \frac{\mbox{Estimate} - \mbox{Mean of estimate}}{\mbox{Std. Err. of estimate}}\] has a distribution like that of a standard normal for large \(n\).
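The normalization above can be sketched as follows, with a histogram of the normalized means and the standard normal density drawn in blue. This is a minimal illustration under assumed names (`mu`, `sigma`, `z`), seed and plot settings, not the original code.

```r
# Illustrative sketch; names, seed and plot settings are assumptions
set.seed(1)
lambda <- 0.2
k <- 40              # sample size
n_sim <- 1000        # number of simulations
mu <- 1 / lambda     # population mean
sigma <- 1 / lambda  # population standard deviation

# sqrt(k) * (sample mean - mu) / sigma for each simulated sample
z <- replicate(n_sim, sqrt(k) * (mean(rexp(k, rate = lambda)) - mu) / sigma)

# Density-scaled histogram with the standard normal curve on top
hist(z, breaks = 30, freq = FALSE,
     main = "Normalized means of 40 exponentials", xlab = "z")
curve(dnorm(x), add = TRUE, col = "blue", lwd = 2)

c(mean = mean(z), sd = sd(z))   # to compare with 0 and 1
```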

As seen in the plot, the blue line displays the standard normal curve and the histogram shows how the values converge around the peak of the standard normal curve.

To show the convergence better, here is a plot with a larger sample size and more simulations: \(k=500\) exponentials per sample and \(i=8000\) simulations.
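A sketch of the larger run is given below. The helper `normalized_means()` is hypothetical and introduced only for this illustration; it draws all values at once into a matrix with one row per simulation, which is one of several reasonable ways to organize the computation.

```r
# Illustrative sketch; normalized_means() is a hypothetical helper
set.seed(1)
lambda <- 0.2

normalized_means <- function(k, n_sim, lambda) {
  # One row per simulation, k exponential values per row
  draws <- matrix(rexp(k * n_sim, rate = lambda), nrow = n_sim)
  # sqrt(k) * (sample mean - mu) / sigma, with mu = sigma = 1/lambda
  sqrt(k) * (rowMeans(draws) - 1 / lambda) / (1 / lambda)
}

z_large <- normalized_means(k = 500, n_sim = 8000, lambda = lambda)

hist(z_large, breaks = 50, freq = FALSE,
     main = "Normalized means of 500 exponentials (8000 simulations)",
     xlab = "z")
curve(dnorm(x), add = TRUE, col = "blue", lwd = 2)
```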

Better!