This project investigates the exponential distribution in R and compares it with the Central Limit Theorem. The rate parameter is lambda = 0.2 for all of the simulations, and the theoretical mean and variance are calculated from it. To illustrate the relationship, the theoretical values are compared to empirical values from simulated data:

1. The mean of the sample is compared to the theoretical mean.
2. The variance of the sample is compared to the theoretical variance.
3. A histogram of the sample means is used to show that their distribution is approximately normal.

Part I: Making Sample Data Set and Calculating Theoretical Mean and Variance

A set of 1300 random values is created, and lambda and the sample size n are set.

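# Fix the random seed so the results are reproducible
set.seed(13)
# Pool of 1300 uniform values; rexp() below uses only the length of the samples drawn from it
randomset <- runif(1300, min = -200, max = 200)
n <- 40         # size of each sample
lambda <- 0.2   # rate parameter of the exponential distribution
theory_mean <- 1/lambda
theory_variance <- 1/(lambda^2)
sprintf("The Theoretical mean and variance are:%s and %s", theory_mean, theory_variance)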
## [1] "The Theoretical mean and variance are:5 and 25"
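The values 5 and 25 follow from the standard moments of the exponential distribution. As a short worked sketch (the symbols X, n and the bar notation for the sample mean are introduced here for illustration and do not appear in the code above), the same formulas also give the variance that the Central Limit Theorem predicts for the mean of a sample of 40 values:

```latex
X \sim \mathrm{Exp}(\lambda), \qquad \lambda = 0.2, \qquad n = 40

\mathbb{E}[X] = \frac{1}{\lambda} = \frac{1}{0.2} = 5,
\qquad
\mathrm{Var}(X) = \frac{1}{\lambda^{2}} = \frac{1}{0.04} = 25

\mathbb{E}[\bar{X}_n] = \frac{1}{\lambda} = 5,
\qquad
\mathrm{Var}(\bar{X}_n) = \frac{1}{n\lambda^{2}} = \frac{25}{40} = 0.625,
\qquad
\mathrm{sd}(\bar{X}_n) = \sqrt{0.625} \approx 0.79
```

Under these assumptions, the sample means are expected to be centered at 5 with a spread of roughly 0.79.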

Part II: Random Sampling and Sample Mean

In this part, a random sample of 40 of the 1300 values is simulated 1000 times, creating a data set of 40,000 values organized in 1000 columns (one column per sample).

randomset_sample <- replicate(1000, c(sample(randomset, n, replace = FALSE)))
randomset_sample_exp <- replicate(1000, c(sample(rexp(randomset_sample, lambda), n, replace = FALSE)))
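Because rexp() only needs to know how many values to draw, a matrix with the same shape could also be built directly from the exponential distribution. The block below is a minimal sketch of that alternative, not the code used for the results in this report; the names n_sims and exp_samples are hypothetical, and the seed is reused only for reproducibility.

```r
# Hypothetical direct construction: each column is one sample of n exponential draws
set.seed(13)      # assumed seed, for reproducibility only
n <- 40           # size of each sample
lambda <- 0.2     # rate parameter
n_sims <- 1000    # hypothetical name for the number of samples

exp_samples <- matrix(rexp(n * n_sims, rate = lambda), nrow = n, ncol = n_sims)

dim(exp_samples)              # rows = sample size, columns = samples
mean(exp_samples)             # empirical mean of all 40,000 draws
var(as.vector(exp_samples))   # empirical variance of all 40,000 draws
```

The numbers produced this way will differ slightly from the ones reported here because the random draws are not the same.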

expdistribution_mean <- mean(rexp(randomset_sample, lambda))
expdistribution_variance <- var(rexp(randomset_sample, lambda))
sprintf("The Empirical mean and variance are:%s and %s", expdistribution_mean, expdistribution_variance)
## [1] "The Empirical mean and variance are:4.97749248690481 and 24.6349950441983"

The differences between the theoretical and empirical mean and variance are approximately 0.02 and 0.37, respectively.
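The differences can be recomputed from the printed values. This is a small sketch that copies the theoretical and empirical numbers from the output above; the vector names theory, empirical, abs_diff and rel_diff are hypothetical.

```r
# Values copied from the sprintf() output shown above
theory    <- c(mean = 5, variance = 25)
empirical <- c(mean = 4.97749248690481, variance = 24.6349950441983)

abs_diff <- abs(theory - empirical)   # absolute differences
rel_diff <- abs_diff / theory         # differences relative to the theoretical values

round(abs_diff, 2)
round(100 * rel_diff, 1)              # expressed in percent
```

Looking at the relative difference as well makes the two comparisons easier to read side by side, since the variance is on a larger scale than the mean.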

The following graph displays how the cumulative mean of the simulated values approaches the theoretical mean (red line) as the number of values included increases.

cumulativemeans <- as.array(cumsum(randomset_sample_exp)/(1:40000))
samples <- c(1:40000)
library(ggplot2)
cum_mean_plot <- qplot(y = cumulativemeans, x = samples, geom = "point", main = "Cumulative Mean vs. Number of Samples") + geom_hline(yintercept = 5, color = "red") 
cum_mean_plot
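The same convergence can be checked numerically without a plot. The sketch below assumes a fresh vector of 40,000 exponential draws rather than the matrix used above; draws, cumulative_means, checkpoints and convergence are hypothetical names.

```r
set.seed(13)                 # assumed seed, for reproducibility only
lambda <- 0.2
draws <- rexp(40000, rate = lambda)

# Running mean after 1, 2, ..., 40000 draws
cumulative_means <- cumsum(draws) / seq_along(draws)

# Distance from the theoretical mean at a few checkpoints
checkpoints <- c(10, 100, 1000, 10000, 40000)
convergence <- data.frame(
  n_values        = checkpoints,
  cumulative_mean = cumulative_means[checkpoints],
  abs_error       = abs(cumulative_means[checkpoints] - 1/lambda)
)
convergence
```

The error does not have to shrink at every checkpoint, but it should be small once most of the 40,000 values are included.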

Part III: Comparing distribution to a Normal Distribution

In this part of the investigation, the distribution of the 1000 sample means from the simulation is compared to a Normal Distribution using a histogram.

randomset_sample_exp <- replicate(1000, c(sample(rexp(randomset_sample, lambda), n, replace = FALSE)))
expdistribution_50means <- colMeans(randomset_sample_exp)
exp_mean_plot <- qplot(expdistribution_50means, geom = "histogram", main = "Histogram of 1000 Sample Means") + geom_vline(xintercept = 5, color = "red") + labs(x = "Means", y = "Count")
exp_mean_plot
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

Note that the histogram of the sample means is approximately normal and centered at about 5, which is the theoretical mean.
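The visual impression can be backed up with a few numbers. The following is a minimal sketch under the assumption that each sample consists of 40 independent exponential draws; sample_means, clt_mean, clt_sd and coverage are hypothetical names, and the simulation is separate from the one plotted above.

```r
set.seed(13)      # assumed seed, for reproducibility only
n <- 40
lambda <- 0.2

# Means of 1000 samples of n exponential draws each
sample_means <- replicate(1000, mean(rexp(n, rate = lambda)))

# Mean and standard deviation predicted by the Central Limit Theorem
clt_mean <- 1/lambda
clt_sd   <- (1/lambda) / sqrt(n)

c(mean = mean(sample_means), sd = sd(sample_means))   # empirical
c(mean = clt_mean, sd = clt_sd)                       # predicted

# Share of sample means within two predicted standard deviations of the
# predicted mean; roughly 0.95 is expected for a normal distribution
coverage <- mean(abs(sample_means - clt_mean) <= 2 * clt_sd)
coverage
```

A coverage near 0.95, together with an empirical mean and standard deviation close to the predicted ones, is consistent with the sample means being approximately normal.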