Examining Features of an Exponential Distribution

Overview

This report examines several features of the distribution of sample means drawn from an exponential distribution. Each sample has size 40, and the sampling is simulated 1,000 times. The report also shows that the distribution of these sample means resembles a normal distribution.

Simulation

Our primary simulation is coded in R as follows, with the results stored in the variable meanexp:

set.seed(100)
meanexp=NULL
for (i in 1:1000) meanexp=c(meanexp,mean(rexp(40,.2)))

Where:

1:1000 represents 1,000 simulations
40 represents sample size n
.2 represents lambda
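
The loop above can also be written without growing a vector one element at a time. The following is a minimal sketch of an equivalent simulation using replicate(); the names meanexp_alt, n_sims, n and lambda are assumptions introduced here for illustration and are not part of the original analysis. Because the calls to rexp() happen in the same order, it should draw the same values as the loop under the same seed.

```r
set.seed(100)        # same seed as the loop version
n_sims <- 1000       # number of simulations
n      <- 40         # sample size per simulation
lambda <- 0.2        # rate parameter of the exponential distribution

# replicate() evaluates the expression n_sims times and collects the results
meanexp_alt <- replicate(n_sims, mean(rexp(n, lambda)))
length(meanexp_alt)
```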

A basic histogram of the vector of our simulation meanexp produces:

hist(meanexp,xlab="Sample Mean",main="Histogram of meanexp")

Mean Analysis

We see that the histogram appears to be centered between 4 and 6. Now we’ll take a closer look at how the sample mean and theoretical mean of the meanexp distribution compare.

To calculate our sample mean we’ll use the mean() function in R on the meanexp vector we already have. The output will be the sample, or empirical mean of the simulation.

mean(meanexp)

## [1] 4.999702

To find the theoretical or expected mean, let’s use the formula provided in the prompt, E(X) = 1/lambda.

E <- 1/.2
E

## [1] 5

Without needing to display the two means on a graph, we can tell that they are almost equal. This is the Law of Large Numbers at work rather than the Central Limit Theorem: the average of many independent sample means converges to the expected value. If we were to increase the number of simulations to 10,000 or 100,000, the empirical mean would settle even closer to E(X) = 5. This reinforces the fact that the sample mean estimates the population mean.
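
To see the empirical mean settle toward the theoretical value as the number of simulations grows, here is a rough sketch. The variable names and the checkpoints in k are hypothetical choices made for illustration only.

```r
set.seed(100)
lambda <- 0.2

# One long run of sample means, each computed from a sample of size 40
means <- replicate(10000, mean(rexp(40, lambda)))

# Average of the first k sample means, for increasing k
k <- c(10, 100, 1000, 10000)
running <- sapply(k, function(m) mean(means[1:m]))

data.frame(simulations   = k,
           mean_of_means = running,
           abs_error     = abs(running - 1 / lambda))
```

The abs_error column would typically shrink as k grows, although it need not decrease at every step for a given seed.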

Variance Analysis

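Let’s start by plotting a histogram of meanexp with a density curve overlaid, using ggplot2:

library(ggplot2)
# after_stat() and linewidth need ggplot2 >= 3.4.0; older versions use ..density.. and size
# Histogram on the density scale with a kernel density curve on top
g <- ggplot(data.frame(meanexp = meanexp), aes(x = meanexp))
g <- g + geom_histogram(aes(y = after_stat(density)), binwidth = 1, fill = "salmon", colour = "black")
g <- g + geom_density(linewidth = 2)
g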

To understand variance, we’ll need to calculate both the sample variance and the population variance. The prompt tells us that the population variance is equal to 1/lambda^2.

popvar <- 1/(0.2)^2
popvar

## [1] 25

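To estimate the population variance from the simulation, we use the fact that the standard deviation of a sample mean is sigma/sqrt(n), so sigma = sqrt(n) * SD(sample means), where n is our sample size, 40, and SD(sample means) is the standard deviation of meanexp. Squaring the result gives the variance estimate.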

samplevar <- (sqrt(40)*sd(meanexp))^2
samplevar

## [1] 25.72977
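
The scaling by sqrt(40) follows from the variance of a sample mean. As a short derivation, assuming the n draws in each sample are independent with common variance sigma^2:

```latex
\operatorname{Var}(\bar{X})
  = \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
  = \frac{1}{n^{2}} \cdot n\,\sigma^{2}
  = \frac{\sigma^{2}}{n}
\quad\Longrightarrow\quad
\sigma^{2} = n\,\operatorname{Var}(\bar{X})
           = \left(\sqrt{n}\,\operatorname{SD}(\bar{X})\right)^{2}
```

With sigma^2 = 1/lambda^2 = 25 and n = 40, the variance of the sample mean itself is 25/40 = 0.625.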

Here, our estimate is about 0.73 above the theoretical population variance of 25. Let’s redo the simulation with 100,000 trials instead of 1,000 and see what happens.

testexp=NULL
for (i in 1:100000) testexp=c(testexp,mean(rexp(40,.2)))
samplevar <- (sqrt(40)*sd(testexp))^2
samplevar

## [1] 24.83271

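As we would expect, the estimate from the larger simulation is closer to the theoretical variance of 25 (24.83 versus 25.73). Note that the sample size is still 40; only the number of simulations has changed. More simulations give a more precise estimate, so the result falls closer to 25, and the fact that it is smaller this time is incidental. What shrinks the variance of the sample mean itself is a larger sample size n, since that variance is 25/n.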
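
The effect of sample size on the spread of the sample mean can be checked directly. The sketch below is an illustration under assumed settings: the sample sizes in sizes are hypothetical values chosen here, not part of the original analysis, and each uses 1,000 simulations.

```r
set.seed(100)
lambda <- 0.2
sizes  <- c(10, 40, 160)   # hypothetical sample sizes chosen for illustration

# Empirical variance of 1,000 sample means for each sample size
emp <- sapply(sizes, function(n) var(replicate(1000, mean(rexp(n, lambda)))))

# Theoretical variance of the sample mean is (1/lambda^2) / n
data.frame(n = sizes, empirical = emp, theoretical = (1 / lambda^2) / sizes)
```

The empirical column should track the theoretical values of 2.5, 0.625 and 0.15625, shrinking as n grows.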

Comparing Distribution to Normal Distribution

Here we want to check whether the meanexp vector we created displays characteristics of a normal distribution. To do so, we can use the qqnorm() function in R to compare our distribution of exponential means against a randomly generated normal sample.

normal <- rnorm(1000,mean=5)

Let’s look at the Q-Q plots side by side for meanexp and normal. Each graph plots the sample quantiles of the data we’ve been working with on the y-axis against the theoretical quantiles of a standard normal distribution on the x-axis.

par(mfrow = c(1, 2),mar=c(4,4,4,4))
qqnorm(meanexp,col= 4,main="Exponential Distribution QQ Plot",ylim=c(0,8))
abline(h=5,v=0)
qqline(meanexp,lty=3)
qqnorm(normal, col = 2,main="Random Normal Distribution QQ Plot",ylim=c(0,8))
abline(h=5,v=0)
qqline(normal,lty=3)

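The dotted diagonal lines, drawn by qqline(), show where the points would fall if the data were exactly normal, and it’s clear that both the blue points (meanexp) and the red points (normal) follow those lines closely. This is what the Central Limit Theorem predicts for the means of samples of size 40.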
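
As a complementary check, the normal curve suggested by the Central Limit Theorem, with mean 1/lambda and standard deviation (1/lambda)/sqrt(n), can be drawn over a histogram of the sample means. This is a minimal sketch in base R; the simulation is regenerated so the block runs on its own, and the names sample_means, clt_mean and clt_sd are assumptions introduced for illustration.

```r
set.seed(100)
n <- 40
lambda <- 0.2
sample_means <- replicate(1000, mean(rexp(n, lambda)))

clt_mean <- 1 / lambda                # theoretical mean of the sample mean
clt_sd   <- (1 / lambda) / sqrt(n)    # theoretical SD of the sample mean

# Histogram on the density scale so the curve is comparable
hist(sample_means, freq = FALSE, breaks = 30,
     xlab = "Sample Mean", main = "Sample means with CLT normal curve")
curve(dnorm(x, mean = clt_mean, sd = clt_sd), add = TRUE, lwd = 2)
```

If the approximation holds, the bars should sit close to the curve, possibly with a slight right skew inherited from the exponential distribution.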