Overview
This report examines several features of the distribution of sample means drawn from an exponential distribution, using samples of size 40. The sampling is simulated 1,000 times. The report also shows that this distribution of sample means has features similar to those of a normal distribution.
Simulation
Our primary simulation is coded in R as follows, with the results stored in the variable meanexp:
set.seed(100)
meanexp=NULL
for (i in 1:1000) meanexp=c(meanexp,mean(rexp(40,.2)))
Where:
- 1:1000 represents 1,000 simulations
- 40 represents sample size n
- .2 represents lambda
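The loop above can also be written without growing the vector one element at a time. The following is a minimal sketch, not the report's own code: the names n_sims, n, lambda, meanexp_loop and meanexp_alt are introduced here for illustration only. It assumes that replicate() evaluates its expression once per simulation in order, so with the same seed it should consume the random draws in the same sequence as the loop.

```r
n_sims <- 1000   # number of simulations (hypothetical name)
n      <- 40     # sample size
lambda <- 0.2    # rate parameter of the exponential distribution

# Loop version, as in the report
set.seed(100)
meanexp_loop <- NULL
for (i in 1:n_sims) meanexp_loop <- c(meanexp_loop, mean(rexp(n, lambda)))

# replicate() version: one sample mean per simulation, no manual appending
set.seed(100)
meanexp_alt <- replicate(n_sims, mean(rexp(n, lambda)))

# Compare the two vectors; TRUE means they agree up to numerical tolerance
isTRUE(all.equal(meanexp_loop, meanexp_alt))
```

Either form yields a numeric vector of 1,000 sample means; the replicate() form mainly pays off when the number of simulations is large.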
A basic histogram of the vector of our simulation meanexp produces:
hist(meanexp,xlab="Sample Mean",main="Histogram of meanexp")
Mean Analysis
We see that the histogram appears to be centered between 4 and 6. Now we’ll take a closer look at how the sample mean and the theoretical mean of the meanexp distribution compare.
To calculate our sample mean we’ll use the mean() function in R on the meanexp vector we already have. The output is the sample, or empirical, mean of the simulation.
mean(meanexp)
## [1] 4.999702
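The theoretical mean of an exponential distribution is 1/lambda, which here is 1/0.2 = 5. Without needing to display the two means on a graph, we can tell that the empirical mean of 4.999702 is almost equal to it. This agreement is an instance of the Law of Large Numbers rather than the Central Limit Theorem, which concerns the shape of the distribution of sample means. If we were to increase the number of simulations to 10,000 or 100,000, the empirical mean would be expected to settle ever closer to 5. This reinforces the fact that the sample mean estimates the population mean.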
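The convergence claim can be checked directly by repeating the simulation with different numbers of simulations. This is a rough sketch under stated assumptions: the simulation counts in sim_counts and the helper names are chosen here for illustration, and the exact figures will depend on the seed and R version, so no output is shown.

```r
set.seed(100)
lambda <- 0.2
n      <- 40
theoretical_mean <- 1 / lambda   # mean of an exponential distribution

# Hypothetical set of simulation counts to compare
sim_counts <- c(100, 1000, 10000)

# For each count, simulate that many sample means and average them
empirical_means <- sapply(sim_counts, function(k) {
  mean(replicate(k, mean(rexp(n, lambda))))
})

# Absolute error of each empirical mean relative to the theoretical mean
convergence <- data.frame(
  simulations    = sim_counts,
  empirical_mean = empirical_means,
  abs_error      = abs(empirical_means - theoretical_mean)
)
convergence
```

The absolute error tends to shrink as the number of simulations grows, although any single run can deviate from that trend by chance.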
Variance Analysis
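Let’s start by plotting a density histogram of meanexp using ggplot2:
library(ggplot2)  # provides ggplot(), geom_histogram() and geom_density()
# Density-scaled histogram of the simulated means with a kernel density overlay
g <- ggplot(data.frame(meanexp = meanexp), aes(x = meanexp))
g <- g + geom_histogram(aes(y = after_stat(density)), binwidth = 1, fill = "salmon", colour = "black")
g <- g + geom_density(linewidth = 2)  # linewidth replaces size for lines in ggplot2 >= 3.4.0
g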
To understand variance, we’ll need to calculate both the sample variance and the population variance. The prompt tells us that the population variance is equal to 1/lambda^2.
popvar <- 1/(0.2)^2
popvar
## [1] 25
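The matching sample-based estimate for the 1,000 simulated means can be sketched as follows. This assumes the same scaling the report applies to the larger simulation: the variance of a mean of n observations is the population variance divided by n, so multiplying the variance of the simulated means by n = 40 estimates the population variance. The name samplevar_1000 is introduced here for illustration, and the resulting number depends on the seed and R version, so no output is shown.

```r
set.seed(100)
lambda <- 0.2
n      <- 40

# Regenerate the 1,000 sample means so the block runs on its own
meanexp <- replicate(1000, mean(rexp(n, lambda)))

popvar <- 1 / lambda^2            # theoretical population variance

# Var(sample mean) = popvar / n, so n * var(meanexp) estimates popvar.
# This equals (sqrt(n) * sd(meanexp))^2.
samplevar_1000 <- n * var(meanexp)

c(sample_estimate = samplevar_1000, population = popvar)
```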
Estimating the population variance from the 1,000 simulated means, by scaling their variance by the sample size of 40, gives a value about 0.7 above the expected 25. Let’s redo the simulation with 100,000 simulations instead of 1,000 and see what happens.
# replicate() avoids regrowing the vector on each of the 100,000 iterations
testexp <- replicate(100000, mean(rexp(40, .2)))
samplevar <- (sqrt(40)*sd(testexp))^2
samplevar
## [1] 24.83271
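Just as we would have predicted, the estimate from the larger simulation (24.83) is closer to the expected variance of 25. Note that the sample size n = 40 is unchanged here; only the number of simulations increased, which makes the estimate of the variance more precise. The distribution of sample means itself only becomes tighter around the expected mean when n increases, because the variance of the sample mean is 1/(lambda^2 * n).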
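To separate the effect of the sample size from the effect of the number of simulations, the sketch below varies n while holding the number of simulations fixed. The sample sizes in sizes and the count of 5,000 simulations are illustrative choices, not values from the report, and the simulated figures will vary with the seed.

```r
set.seed(100)
lambda <- 0.2
n_sims <- 5000               # fixed number of simulations (illustrative)
sizes  <- c(10, 40, 160)     # sample sizes to compare (illustrative)

# Variance of the simulated sample means for each sample size
var_of_means <- sapply(sizes, function(n) {
  var(replicate(n_sims, mean(rexp(n, lambda))))
})

# Theoretical variance of the sample mean: 1 / (lambda^2 * n)
spread <- data.frame(
  n           = sizes,
  simulated   = var_of_means,
  theoretical = 1 / (lambda^2 * sizes)
)
spread
```

The simulated variance of the means should fall roughly in proportion to 1/n, which is the tightening effect; adding more simulations only makes each row's estimate more stable.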
Comparing Distribution to Normal Distribution
Here we want to check whether, and how closely, the meanexp vector we created displays the characteristics of a normal distribution. To do so, we can use the qqnorm() function in R to compare a random sample from a normal distribution against our distribution of exponential means.
normal <- rnorm(1000,mean=5)
Let’s look at the Q-Q plots side by side for meanexp and normal. Each graph displays the quantiles of the randomly generated data we’ve been working with on the y-axis and the theoretical quantiles of a standard normal distribution on the x-axis.
par(mfrow = c(1, 2),mar=c(4,4,4,4))
qqnorm(meanexp,col= 4,main="Exponential Distribution QQ Plot",ylim=c(0,8))
abline(h=5,v=0)
qqline(meanexp,lty=3)
qqnorm(normal, col = 2,main="Random Normal Distribution QQ Plot",ylim=c(0,8))
abline(h=5,v=0)
qqline(normal,lty=3)
The dotted diagonal black lines depict where the points of a normally distributed sample should fall. The blue and red points both track those lines closely, which suggests that the distribution of the exponential sample means is approximately normal.
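A numeric companion to the Q-Q plots is to standardize the simulated means and compare a few of their quantiles with the corresponding standard normal quantiles. This is only a rough sketch: the probabilities in probs and the names z and qq_check are chosen here for illustration, and the values depend on the seed, so no output is shown.

```r
set.seed(100)

# Regenerate the 1,000 sample means so the block runs on its own
meanexp <- replicate(1000, mean(rexp(40, 0.2)))

# Standardize to mean 0 and standard deviation 1
z <- (meanexp - mean(meanexp)) / sd(meanexp)

# Compare selected sample quantiles with standard normal quantiles
probs <- c(0.05, 0.25, 0.50, 0.75, 0.95)
qq_check <- data.frame(
  prob   = probs,
  sample = unname(quantile(z, probs)),
  normal = qnorm(probs)
)
qq_check$difference <- qq_check$sample - qq_check$normal
qq_check
```

Small differences across the rows are consistent with approximate normality. Because the mean of 40 exponential draws is still slightly right-skewed, some residual difference in the tails would not be surprising.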