Exponential Distribution - Simulated vs Theoretical

Overview

This analysis simulates samples from an Exponential Distribution and compares the results to the Theoretical Population Distribution. The comparison demonstrates the Central Limit Theorem. Over a large number of simulations:

- the mean of the sample means approaches the population mean;
- the variance of the sample means approaches the population variance divided by the sample size;
- the distribution of the sample means is approximately Normal.

Theoretical Population Parameters

To compare the sample and Theoretical Population parameters, we first compute the Theoretical Population parameters from the given rate. For an Exponential Distribution with rate lambda, the population mean and the population standard deviation are both 1/lambda, so the population variance is (1/lambda)^2. For this analysis we use lambda = 0.2, which gives a mean and standard deviation of 5 and a variance of 25.

# Calculate Population Distribution Parameters
lambda <- .2
dist_mean <- 1/lambda
dist_sd <- 1/lambda
dist_var <- dist_sd^2
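As a sanity check on the formulas above, the sketch below integrates the Exponential density numerically. It uses only base R's `integrate()` and `dexp()`; the names `ex`, `ex2`, `num_mean` and `num_var` are illustrative. It should return values very close to 5 and 25.

```r
# Numerical check of the Exponential moments for lambda = 0.2 (base R only)
lambda <- 0.2
# E[X] = integral of x * f(x) over [0, Inf)
ex  <- integrate(function(x) x   * dexp(x, rate = lambda), lower = 0, upper = Inf)$value
# E[X^2] = integral of x^2 * f(x) over [0, Inf)
ex2 <- integrate(function(x) x^2 * dexp(x, rate = lambda), lower = 0, upper = Inf)$value
num_mean <- ex
num_var  <- ex2 - ex^2   # Var(X) = E[X^2] - (E[X])^2
c(mean = num_mean, var = num_var)
```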

Simulation to Generate Sample Dataset

To build a sample dataset for comparison with the Theoretical Population distribution, we draw 1,000 samples from an Exponential distribution with lambda = 0.2, each of size n = 40. Taking the mean of each sample produces a dataset of 1,000 sample means. From it we calculate the overall mean and variance of the sample means.

# Simulation - Generate 1,000 sample sets of size 40
sample_n <- 40
sim_num <- 1000
set.seed(10)
# Preallocate the result vector instead of growing it with c() on every pass
sample_means <- numeric(sim_num)
for (i in seq_len(sim_num)) {
    sample_means[i] <- mean(rexp(n = sample_n, rate = lambda))
}
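The loop can also be vectorized by drawing all 40,000 values at once, arranging them in a matrix with one simulated sample per row, and averaging each row with `rowMeans()`. This is a sketch; `draws` and `sample_means_vec` are illustrative names. Because `matrix()` fills column by column, the individual means need not match the loop's values. They do come from the same distribution.

```r
# Vectorized sketch: one matrix row per simulated sample of size sample_n
lambda   <- 0.2
sample_n <- 40
sim_num  <- 1000
set.seed(10)
draws <- matrix(rexp(sim_num * sample_n, rate = lambda),
                nrow = sim_num, ncol = sample_n)
sample_means_vec <- rowMeans(draws)   # one mean per row, length sim_num
summary(sample_means_vec)
```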

With the sample dataset compiled, we can now take a quick look at it to get an idea of its range and distribution.

# Histogram of the sample means
hist(sample_means, breaks=25, main="Histogram - Sample Means")

Sample Distribution Parameters

To complete the processing of the sample dataset, we now calculate the mean, standard deviation, and variance of the 1,000 sample means.

# Calculate Overall Sample Mean, SD, and Var
sample_mean <- mean(sample_means)
sample_sd <- sd(sample_means)
sample_var <- sample_sd^2
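Note that `sd()` and `var()` in R use the n - 1 denominator, so `sample_sd^2` is the same quantity as `var(sample_means)`. The n - 1 denominator is what makes the sample variance an unbiased estimator. The small vector `x` below is a hypothetical example chosen so the arithmetic is easy to check by hand: its mean is 5 and its squared deviations sum to 32.

```r
# sd() and var() use the n - 1 denominator; check on a small hypothetical vector
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
manual_var <- sum((x - mean(x))^2) / (length(x) - 1)   # 32 / 7
all.equal(var(x), manual_var)   # TRUE
all.equal(sd(x)^2, var(x))      # TRUE
```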

Sample Mean vs Theoretical Population Mean

Comparing the resulting sample mean with the Theoretical Population mean, we see that they are very close. This is expected. Each sample mean is an unbiased estimator of the population mean, so by the Law of Large Numbers the average of the simulated means converges to the Theoretical mean (1/lambda = 5) as the number of simulations grows.

diff_mean <- dist_mean - sample_mean
dist_mean

## [1] 5

sample_mean

## [1] 5.04506

diff_mean

## [1] -0.04505959
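To illustrate the convergence described above, the sketch below plots the running average of the simulated means against the number of simulations, with the Theoretical mean drawn as a horizontal line. It regenerates `sample_means` with the same seed; `replicate()` is used as a compact equivalent of the loop, and `running_mean` is an illustrative name.

```r
# Running mean of the simulated sample means (Law of Large Numbers illustration)
lambda   <- 0.2
sample_n <- 40
sim_num  <- 1000
set.seed(10)
sample_means <- replicate(sim_num, mean(rexp(sample_n, rate = lambda)))
running_mean <- cumsum(sample_means) / seq_len(sim_num)
plot(running_mean, type = "l", xlab = "Number of simulations",
     ylab = "Mean of sample means", main = "Running Mean vs Theoretical Mean")
abline(h = 1 / lambda, col = "darkgreen", lwd = 2)   # theoretical mean = 5
tail(running_mean, 1)   # last value equals mean(sample_means)
```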

To display these results graphically, the figure below marks the Theoretical mean (green) and the sample mean (red) on the histogram of the sample means.

# Histogram of sample means with the theoretical (green) and sample (red) means marked
hist(sample_means, breaks=25, main="Sample Mean Dist. vs Population Dist.")
abline(v=dist_mean, col="darkgreen", lwd=2)
abline(v=sample_mean, col="darkred", lwd=2)
text(4, 100, paste("Pop. Mean =", dist_mean), col="darkgreen")
text(6.5, 100, paste("Sample Mean =", round(sample_mean, 2)), col="darkred")

Test the difference of the means

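We can also test H_0: the expected sample mean equals the Theoretical mean (5). The standard error of the mean of the 1,000 simulated means is sample_sd / sqrt(1000). We therefore standardize the difference before computing a two-sided p-value from the Normal distribution.

# z-test of H_0: mean of sample means = theoretical mean
se_mean <- sample_sd / sqrt(sim_num)
z_stat <- (sample_mean - dist_mean) / se_mean
p_value <- 2 * pnorm(-abs(z_stat))
p_value

With the values above, sample_sd is about 0.80, so the standard error is about 0.025. That makes z about 1.8 and the two-sided p-value roughly 0.07. Since this is larger than alpha = 0.05, we do not reject the null hypothesis.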
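As a cross-check, base R's `t.test()` runs the one-sample test directly. With 1,000 observations the t and Normal reference distributions are nearly identical, so its p-value should be very close to a Normal-based (z) test on the standardized difference. The sketch regenerates `sample_means` with the same seed; `tt` is an illustrative name.

```r
# One-sample t-test of H_0: mu = 5 on the simulated means (stats::t.test)
lambda   <- 0.2
sample_n <- 40
sim_num  <- 1000
set.seed(10)
sample_means <- replicate(sim_num, mean(rexp(sample_n, rate = lambda)))
tt <- t.test(sample_means, mu = 1 / lambda)   # two-sided by default
tt$statistic   # (mean - 5) / (sd / sqrt(1000))
tt$p.value
```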

Sample Variance vs Theoretical Population Variance

To compare the sample variance with the Theoretical variance, we must account for the sample size. The variance of the mean of n independent draws is the population variance divided by n. We therefore divide the Theoretical variance by the sample size (40), giving 25 / 40 = 0.625.
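The adjustment follows from the independence of the draws. For n independent draws X_1, ..., X_n, each with variance sigma^2 = (1/lambda)^2:

```latex
\operatorname{Var}(\bar{X})
  = \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
  = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i)
  = \frac{\sigma^2}{n}
  = \frac{(1/0.2)^2}{40}
  = \frac{25}{40} = 0.625
```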

# Calculate Distribution Variance for Sample Size
dist_var2 <- dist_var / sample_n

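Comparing the sample and Theoretical variances, we see that these are also very close. This is expected for two reasons. First, the variance of the mean of n independent draws is sigma^2/n for any distribution with finite variance. Second, the sample variance of the 1,000 simulated means, computed by sd() with an n - 1 denominator, is an unbiased estimator of that quantity.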

# Distribution vs Sample Comparison
diff_var <- dist_var2 - sample_var
dist_var2

## [1] 0.625

sample_var

## [1] 0.6372544

diff_var

## [1] -0.01225439
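To illustrate the unbiasedness claim, the sketch below repeats the entire 1,000-mean experiment several times and averages the resulting variances. The average should land near 0.625. The repetition count `n_rep = 200` is a hypothetical choice made for speed, and `rep_vars` is an illustrative name.

```r
# Repeat the 1,000-mean experiment n_rep times and average the variances of the means
lambda   <- 0.2
sample_n <- 40
sim_num  <- 1000
n_rep    <- 200   # hypothetical number of repetitions, chosen for speed
set.seed(10)
rep_vars <- replicate(n_rep, {
  draws <- matrix(rexp(sim_num * sample_n, rate = lambda), nrow = sim_num)
  var(rowMeans(draws))   # variance of 1,000 sample means of size 40
})
mean(rep_vars)   # compare with (1 / lambda)^2 / sample_n = 0.625
```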

Sample Means are Normally Distributed

The Central Limit Theorem tells us that the distribution of the sample means should be approximately Normal. Its mean should equal the Theoretical mean (5). Its variance should equal the Theoretical variance divided by the sample size (0.625, i.e. a standard deviation of 5/sqrt(40), about 0.79). Formal statistical tests could be applied here; instead, we demonstrate this graphically. The figure below shows a probability histogram of the sample means with their density estimate (red) and the corresponding Normal density (blue). The two curves should agree closely, illustrating the Central Limit Theorem at work.

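# Probability histogram of the sample means with density estimate and CLT Normal curve
hist(sample_means, prob=TRUE, xlim=c(2,8), breaks=25, main="Sample Mean Dist. vs Normal Dist.")
lines(density(sample_means), col="red", lwd=2)
# CLT: mean = 1/lambda, sd = (1/lambda)/sqrt(n), about 0.79 for n = 40
curve(dnorm(x, mean=dist_mean, sd=dist_sd/sqrt(sample_n)), col="darkblue", lwd=2, add=TRUE)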
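For readers who want the formal checks mentioned above, the sketch below adds a Normal Q-Q plot and a Shapiro-Wilk test (`stats::shapiro.test`, which accepts 3 to 5,000 observations). It also computes the sample skewness, which theory puts at 2/sqrt(40), about 0.32, for the mean of 40 Exponential draws. With 1,000 points a normality test is sensitive enough to detect this mild skew, so a small p-value would not contradict the approximate normality the Central Limit Theorem describes. The names `sw`, `std_means` and `skew` are illustrative.

```r
lambda   <- 0.2
sample_n <- 40
sim_num  <- 1000
set.seed(10)
sample_means <- replicate(sim_num, mean(rexp(sample_n, rate = lambda)))

# Q-Q plot against the Normal: points near the line indicate approximate normality
qqnorm(sample_means, main = "Normal Q-Q Plot - Sample Means")
qqline(sample_means, col = "darkblue", lwd = 2)

# Shapiro-Wilk normality test
sw <- shapiro.test(sample_means)
sw$p.value

# Sample skewness (theory: 2 / sqrt(sample_n), about 0.32)
std_means <- (sample_means - mean(sample_means)) / sd(sample_means)
skew <- mean(std_means^3)
skew
```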