This report demonstrates that the means of samples drawn from an exponential distribution in R are approximately normally distributed when the simulation is repeated 1000 times. As the graphs will show, the distribution of the sample means follows the central limit theorem: it is centered close to the population mean, and its variance is small compared with the variance of the population.
The population is the exponential distribution, simulated in R with the rexp() function. The sample size is 40, and the rate parameter (lambda) is given as 0.2. From these values the theoretical mean and variance of the population and of the sample mean can be obtained.
Samples of 40 exponential values will be generated in 1000 simulations and the results compared with the predictions of the central limit theorem. The code generates plots of the sampling distributions of the mean and variance over the 1000 simulations and compares them with the distribution of a single simulated sample.
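For reference, the theoretical population values mentioned above follow from the standard moments of an exponential distribution with rate lambda. The symbols \mu and \sigma^2 are notation introduced here for the population mean and variance; they do not appear in the original code.

```latex
\[
f(x;\lambda) = \lambda e^{-\lambda x}, \quad x \ge 0,
\qquad
\mu = \frac{1}{\lambda} = \frac{1}{0.2} = 5,
\qquad
\sigma^2 = \frac{1}{\lambda^2} = \frac{1}{0.04} = 25 .
\]
```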
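n <- 40          # sample size
B <- 1000        # number of simulations
lambda <- 0.2    # rate parameter of the exponential distribution
# Each of the B rows holds one simulated sample of n exponential values
mns <- matrix(rexp(n*B, lambda), B)
mn0 <- apply(mns, 1, mean)   # mean of each sample
var0 <- apply(mns, 1, var)   # variance of each sample
sd0 <- apply(mns, 1, sd)     # standard deviation of each sample (sd0 avoids masking sd())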
Three plots are produced. The first shows the distribution of a single simulated sample. The other two show the sampling distributions of the mean and of the variance, respectively, across the 1000 simulations.
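The three plots described here could be drawn along the following lines. This is only a sketch using base R graphics: the object names (sims, row_means, row_vars), the panel titles, and the red reference lines are assumptions made for illustration and are not taken from the original report.

```r
# Illustrative sketch only: plotting choices and object names are assumptions.
n <- 40          # sample size
B <- 1000        # number of simulations
lambda <- 0.2    # rate parameter

# B rows, each row is one simulated sample of n exponential values
sims <- matrix(rexp(n * B, lambda), B)
row_means <- apply(sims, 1, mean)   # one mean per simulation
row_vars <- apply(sims, 1, var)     # one variance per simulation

par(mfrow = c(1, 3))                # three panels side by side

# Panel 1: a single simulated sample, which should look right-skewed
hist(sims[1, ], main = "One sample (n = 40)", xlab = "Value")

# Panel 2: sampling distribution of the mean, with the population mean marked
hist(row_means, main = "Means of 1000 samples", xlab = "Sample mean")
abline(v = 1 / lambda, col = "red", lwd = 2)

# Panel 3: sampling distribution of the variance, with the population variance marked
hist(row_vars, main = "Variances of 1000 samples", xlab = "Sample variance")
abline(v = 1 / lambda^2, col = "red", lwd = 2)
```

With these settings the second panel would be expected to look roughly bell-shaped around 5, while the first keeps the skew of the exponential distribution.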
sample_mean <- mean(mn0)
sample_mean
## [1] 5.036528
popn_mean <- 1/lambda
popn_mean
## [1] 5
From the computation above, it can be noted that the mean of the sampling distribution (5.036528) is very close to the mean of the population distribution (5).
sample_variance <- var(mn0)
sample_variance
## [1] 0.6934229
popn_var <- (1/lambda)^2/n
popn_var
## [1] 0.625
The variance of the sample means (0.6934229) is close to the theoretical variance of the sample mean, (1/lambda)^2/n = 0.625.
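The theoretical value computed as popn_var can be derived in one line, assuming the 40 draws in each sample are independent and identically distributed. Here \bar{X} denotes the mean of one sample and \sigma^2 the population variance; both symbols are notation introduced for this sketch.

```latex
\[
\mathrm{Var}(\bar{X})
= \mathrm{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
= \frac{1}{n^2}\sum_{i=1}^{n}\mathrm{Var}(X_i)
= \frac{\sigma^2}{n}
= \frac{(1/\lambda)^2}{n}
= \frac{25}{40}
= 0.625 .
\]
```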
The confidence interval calculated below describes the amount of uncertainty associated with the sample estimate of the population mean. I will use a 95% confidence level.
sample_mean + c(-1,1) * qt(.975, n-1) * sqrt(sample_variance)/sqrt(n)
## [1] 4.770212 5.302845
The 95% confidence interval (4.770212, 5.302845) contains the theoretical population mean of 5.