Overview

In this project the exponential distribution in R is compared to the Central Limit Theorem. The exponential distribution is simulated in R using rexp(n, lambda) where lambda is the rate parameter. The mean of the exponential distribution is 1/lambda and the standard deviation is also 1/lambda. A lambda = 0.2 value is set for all of the simulations. The distribution of averages of 40 exponentials is investigated and a thousand simulations are conducted.

The Central Limit Theorem is investigated using the following guidelines.

Simulations

Uniform Distributions

In order to begin the simulation, a theoretical comparison must be made between 1000 random uniforms and a distribution of 1000 averages of 40 random uniforms.

hist(runif(1000))

As can be seen from the plot above, the values are roughly equally distributed between the default 0.0 and 1.0 values of the function runif

Now, a distribution of averages is conducted below.

mns = NULL
for (i in 1 : 1000) mns = c(mns, mean(runif(40)))
hist(mns)

Project Initialization

We first start by initializing the known values and constants. During the testing phase of this investigation, a seed was set (using the set.seed function) for RStudio’s random number generator based on the known values. This was to make the testing of the project easier. However, to make the simulations reflect true randomness (or at least pseudo-randomness), the seed function was removed (or at least, commented out).

## Loading Libraries
library(ggplot2)

## Known values and constants
lambda <- 0.2
n <- 40
sims <- 1:1000

## Creating blank vectors
means <- vector("numeric")
means_sum <- vector("numeric")
means_cumulative <- vector("numeric")

## Set a seed using known values
##set.seed(n*lambda*max(sims))

Creating Vector of Random Means

Using the rexp function, a vector of 1000 exponentially distributed means is created. Using this vector, another vector named means_sum is filled in using a cumulative sum of the previous means vector. Furthermore, means_cumulative holds the cumulative means of the sums.

## Creating vector of means
for (i in sims)
  means[i] <- mean(rexp(n, lambda))

## Creating vector of sum of means
for (i in sims)
{
  if (i == 1)
    means_sum[i] <- means[i]
  else
    means_sum[i] <- means_sum[i-1] + means[i]
}

## Creating vector of cumulative means
for (i in sims)
  means_cumulative[i] <- means_sum[i] / i

Sample Mean vs Theoretical Mean

The theoretical mean calculated is shown below. The sample mean is calculated using the vector data and is also shown below.

print(sprintf("Theoretical Mean: %f", 1/lambda))
## [1] "Theoretical Mean: 5.000000"
print(sprintf("Sample Mean: %f", means_cumulative[max(sims)]))
## [1] "Sample Mean: 4.945749"

Using the vectors created above, we create a data frame to hold all the data and then create a plot for the sample means.

data <- data.frame(x = sims, y = means_cumulative)

means_plot <- ggplot(data, aes(x=x,y=y))
means_plot <- means_plot + geom_line(size = 2)
means_plot <- means_plot + geom_abline(intercept = 1/lambda, slope = 0, color="red", size = 1)
means_plot <- means_plot + scale_y_continuous(limits=c(min(means_cumulative)-0.01,max(means_cumulative)+0.01))
means_plot <- means_plot + theme(plot.title = element_text(size = 12, face="bold", vjust=2, hjust=0.5))
means_plot <- means_plot + labs(title = "Plot of Sample Mean vs Theoretical Mean")
means_plot <- means_plot + labs(x = "Number of Simulations", y = "Sample Mean (cumulative)")
print(means_plot)

Sample Variance vs Theoretical Variance

Next, we make a comparison between the variances. Note that the variance for this particular simulation is \(\sigma^2 = (1/\lambda)^2\)

print(sprintf("Sample Variance: %f", var(means)*n))
## [1] "Sample Variance: 25.188049"
print(sprintf("Theoretical Variance: %f", (1/lambda)^2))
## [1] "Theoretical Variance: 25.000000"

Comparison of Distribution of Means to Normal Distributions

Lastly, we compare the distribution of the randomly-generated means and see how close it is to the standard normal distribution. We will use the function dnorm.

distribution_plot <- ggplot(data.frame(x=means), aes(x=x))
distribution_plot <- distribution_plot + geom_histogram(position="identity", fill="blue", color="black", alpha=0.2, binwidth=0.5,aes(y=..density..))
distribution_plot <- distribution_plot + stat_function(fun = dnorm, color = "red", args = list(mean=1/lambda))
distribution_plot <- distribution_plot + scale_x_continuous(limits=c(1,9)) + scale_y_continuous(limits=c())
distribution_plot <- distribution_plot + theme(plot.title=element_text(size=12,face="bold",vjust=2,hjust=0.5))
distribution_plot <- distribution_plot + labs(title="Sample vs Theoretical Means Distribution")
distribution_plot <- distribution_plot + labs(x="Sample Mean", y="Frequency")
print(distribution_plot)