Sampling Distribution of Exponential Random Numbers in R

Overview

This report demonstrates that the means of samples drawn from an exponential distribution in R are approximately normally distributed, as the central limit theorem predicts. Across 1000 simulations of samples of size 40, the distribution of the sample means centers on the population mean, and its variance is small and close to the value the theorem predicts.

Simulations

The population is the exponential distribution, simulated in R with the rexp() function. The sample size is 40 and the rate parameter (lambda) is 0.2. From these, the theoretical mean and variance of the population and of the sample mean can be obtained.

Samples of 40 exponential values are generated in 1000 simulations and compared with what the central limit theorem predicts. The code below simulates the data for plots of the sampling distributions of the mean and variance across the 1000 simulations, which are compared with the distribution of a single simulation.

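n <- 40          # sample size
B <- 1000        # number of simulations
lambda <- 0.2    # rate parameter of the exponential distribution
# Each row of the B x n matrix is one simulated sample of n draws
mns <- matrix(rexp(n*B, lambda), B)
mn0 <- apply(mns, 1, mean)   # mean of each sample
var0 <- apply(mns, 1, var)   # variance of each sample
sd0 <- apply(mns, 1, sd)     # standard deviation of each sample (named sd0 so it does not mask sd())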
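
No seed is set in the code above, so each run produces slightly different numbers. A minimal sketch of a reproducible variant follows; the seed value and the names sims and sim_means are illustrative assumptions, not the ones behind the output reported in this document.

```r
# Sketch: reproducible variant of the simulation (seed and names are assumed).
set.seed(1)                      # arbitrary seed, chosen only for illustration
n <- 40                          # sample size
B <- 1000                        # number of simulations
lambda <- 0.2                    # rate parameter
sims <- matrix(rexp(n * B, rate = lambda), nrow = B)  # B rows, n draws per row
sim_means <- rowMeans(sims)      # equivalent to apply(sims, 1, mean)
length(sim_means)                # one mean per simulation
```

rowMeans() is used here as a faster equivalent of apply(..., 1, mean); either form gives the same values.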

Plots

Three plots are produced. The first shows the distribution of a single simulated sample of 40 values. The other two show the sampling distributions of the mean and of the variance, respectively, across the 1000 simulations.
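
The plotting code itself does not appear in this document. The following is a hedged sketch of how three such plots could be drawn with base R histograms; the seed, variable names, panel layout, and titles are all assumptions.

```r
# Sketch only: layout, titles, names, and seed are assumptions, not the original code.
set.seed(1)
n <- 40
B <- 1000
lambda <- 0.2
sims <- matrix(rexp(n * B, rate = lambda), nrow = B)  # one sample per row
sim_means <- rowMeans(sims)                           # mean of each sample
sim_vars <- apply(sims, 1, var)                       # variance of each sample

op <- par(mfrow = c(1, 3))                            # three panels side by side
hist(sims[1, ], main = "One sample (n = 40)", xlab = "Value")
hist(sim_means, main = "Means of 1000 samples", xlab = "Sample mean")
abline(v = 1 / lambda, lwd = 2)                       # theoretical mean, 1/lambda
hist(sim_vars, main = "Variances of 1000 samples", xlab = "Sample variance")
abline(v = (1 / lambda)^2, lwd = 2)                   # population variance, (1/lambda)^2
par(op)                                               # restore previous graphics settings
```

The vertical lines mark the theoretical mean and the population variance, so the centering of each sampling distribution can be judged by eye.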

Sample Mean vs Theoretical Mean

sample_mean <- mean(mn0)
sample_mean

## [1] 5.036528

popn_mean <- 1/lambda
popn_mean

## [1] 5

From the computation above, the mean of the sampling distribution (5.036528) is very close to the mean of the population distribution (5).

Variability of the sample distribution

sample_variance <- var(mn0)
sample_variance

## [1] 0.6934229

# theoretical variance of the mean of n draws: sigma^2 / n
popn_var <- (1/lambda)^2/n
popn_var

## [1] 0.625

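The variance of the sample means (0.6934229) is close to the theoretical variance of the sample mean (0.625). Note that popn_var holds this theoretical variance of the mean, not the variance of the population itself, which is (1/lambda)^2 = 25.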
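
The theoretical value follows from the variance of a mean of n independent draws. For an exponential distribution with rate lambda, the standard deviation equals the mean, 1/lambda, so the arithmetic can be written out as:

```latex
\[
\operatorname{E}(\bar{X}) = \frac{1}{\lambda} = \frac{1}{0.2} = 5,
\qquad
\operatorname{Var}(\bar{X}) = \frac{\sigma^2}{n}
  = \frac{(1/\lambda)^2}{n}
  = \frac{25}{40}
  = 0.625 .
\]
```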

Confidence interval

The confidence interval calculated below describes the uncertainty associated with a sample estimate of the population mean. I use a 95% confidence level.

sample_mean + c(-1,1) * qt(.975, n-1) * sqrt(sample_variance)/sqrt(n)

## [1] 4.770212 5.302845

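The confidence interval (4.770212, 5.302845) contains the theoretical population mean of 5. Note that sqrt(sample_variance) already estimates the standard error of the mean of a sample of 40, so the additional division by sqrt(n) makes this interval narrower than the usual t interval for a single sample.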
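
For comparison, the textbook t interval for a population mean is built from a single sample as the sample mean plus or minus qt(.975, n-1) times s/sqrt(n), where s is that sample's standard deviation. The sketch below applies it to each simulated sample and estimates how often the interval covers the true mean. The seed and the variable names are assumptions, and the coverage may differ from the nominal 95% because the exponential distribution is skewed.

```r
# Sketch: per-sample 95% t intervals and their empirical coverage (names and seed assumed).
set.seed(1)
n <- 40
B <- 1000
lambda <- 0.2
sims <- matrix(rexp(n * B, rate = lambda), nrow = B)  # one sample per row

xbar <- rowMeans(sims)                        # mean of each sample
s <- apply(sims, 1, sd)                       # standard deviation of each sample
half <- qt(0.975, df = n - 1) * s / sqrt(n)   # half-width of each interval
lower <- xbar - half
upper <- xbar + half

c(lower[1], upper[1])                         # interval from the first sample
coverage <- mean(lower <= 1 / lambda & 1 / lambda <= upper)
coverage                                      # share of intervals containing 1/lambda
```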