The exponential distribution can be simulated in R with rexp(n, lambda), where lambda is the rate parameter. The mean of the exponential distribution is 1/lambda and the standard deviation is also 1/lambda. Set lambda = 0.2 for all of the simulations. In this simulation, you will investigate the distribution of averages of 40 exponential(0.2)s. Note that you will need to do a thousand or so simulated averages of 40 exponentials.
Illustrate via simulation and associated explanatory text the properties of the distribution of the mean of 40 exponential(0.2)s. You should:
Show where the distribution is centered and compare it to the theoretical center of the distribution.
Show how variable it is and compare it to the theoretical variance of the distribution.
Show that the distribution is approximately normal.
Evaluate the coverage of the confidence interval for 1/lambda.
The data are simulated using replicate, storing the mean of each simulated sample in the variable mean_sim.
## Setting the seed of simulation
set.seed(1234)
## Setting the constants: lambda = 0.2,
## number of simulations = 1000, and sample size = 40
lambda <- 0.2
N <- 1000
sim_size <- 40
## performing the simulation using the replicate function
## mean_sim stores the sample mean for each simulation
mean_sim <- replicate(N, mean(rexp(sim_size, lambda)))
## calculating the mean of the sampling distribution
mean_mean_sim <- mean(mean_sim)
mean_mean_sim
## [1] 4.974239
## The theoretical mean of the distribution is `1/lambda`
theor_mean <- 1/lambda
theor_mean
## [1] 5
As observed, the mean of the sampling distribution comes out to be 4.974239, which is very close to the theoretical mean of the distribution, 5.
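One way to put "very close" on a scale is to express the gap in units of the Monte Carlo standard error of the averaged means. The sketch below repeats the simulation setup from this report so that it runs on its own; the names mc_se and z_gap are introduced here for illustration only. With the values printed in this report, the standard error is about 0.7554171/sqrt(1000) ≈ 0.0239, so the gap of roughly 0.026 is about one standard error.

```r
# Sketch: distance between simulated and theoretical center,
# measured in Monte Carlo standard errors (mc_se, z_gap are illustrative names)
set.seed(1234)                 # same seed as the report
lambda <- 0.2
N <- 1000
sim_size <- 40
mean_sim <- replicate(N, mean(rexp(sim_size, lambda)))

theor_mean <- 1 / lambda

# Standard error of the average of the N simulated means
mc_se <- sd(mean_sim) / sqrt(N)

# Gap between simulated and theoretical center, in standard-error units
z_gap <- (mean(mean_sim) - theor_mean) / mc_se
c(mc_se = mc_se, z_gap = z_gap)
```

A gap within about two standard errors is what would be expected from simulation noise alone.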
## Calculation for the sd and variance of the simulated sampling distribution
sd_mean_sim <- sd(mean_sim)
var_mean_sim <- var(mean_sim)
## standard deviation and variance of simulated sampling distribution
c(sd_mean_sim, var_mean_sim)
## [1] 0.7554171 0.5706551
## Calculation for the theoretical sd and variance of the sampling distribution
theor_sd <- (1/lambda)/sqrt(sim_size)
theor_var <- theor_sd^2
## Theoretical sd and variance
c(theor_sd, theor_var)
## [1] 0.7905694 0.6250000
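The simulated standard deviation (0.7554171) and variance (0.5706551) are reasonably close to the theoretical values (0.7905694 and 0.6250000), falling short of them by about 4% and 9% respectively.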
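The theoretical values used above follow from the independence of the 40 draws. As a short worked derivation, with $X_1, \dots, X_n$ denoting the draws, $\bar X$ their average, and $n = 40$, $\lambda = 0.2$ (the symbols are notation introduced here, not taken from the report):

```latex
\begin{align*}
E[\bar X] &= \frac{1}{n}\sum_{i=1}^{n} E[X_i] = \frac{1}{\lambda} = 5, \\
\operatorname{Var}(\bar X) &= \frac{1}{n^2}\sum_{i=1}^{n} \operatorname{Var}(X_i)
  = \frac{1}{n\lambda^2} = \frac{25}{40} = 0.625, \\
\operatorname{sd}(\bar X) &= \frac{1/\lambda}{\sqrt{n}} = \frac{5}{\sqrt{40}} \approx 0.7906.
\end{align*}
```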
Plotting the density of the sampling distribution, where each case represents the mean of a sample of 40 observations.
## plotting the density distribution
h <- hist(mean_sim, freq = FALSE, main = "Histogram of simulated means", ylim = c(0, 0.6), xlab = "Mean of each simulated sample")
## Overlaying the histogram with the theoretical normal density, i.e. the expected density
xfit <- seq(min(mean_sim), max(mean_sim), length = 60)
yfit <- dnorm(xfit, mean = theor_mean, sd = theor_sd)
lines(xfit, yfit, col="blue", lwd=2)
## density of the averages of the sample
lines(density(mean_sim), col = "red", lwd = 2)
## The vertical red line marks the mean of the simulated sample means
abline(v = mean(mean_sim), col = "red", lwd = 2)
The histogram closely approximates the normal distribution: it is unimodal and shows little skew.
A normal Q-Q plot is also drawn to check normality.
qqnorm(mean_sim); qqline(mean_sim, col = 2)
As observed, the points lie very close to the straight line, which suggests that the sampling distribution is approximately normal.
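As a numerical complement to the plots, the sample skewness of the simulated means can be compared with the theoretical value. The mean of n independent exponentials follows a gamma distribution with skewness 2/sqrt(n), about 0.316 for n = 40, so a small positive skew is expected even though the shape is close to normal. The sketch below repeats the simulation setup so that it runs on its own; skew_sim and skew_theor are names introduced here, and the skewness is computed with a simple moment estimator rather than a packaged function.

```r
# Sketch: sample skewness of the simulated means versus the gamma-based value
set.seed(1234)                 # same seed as the report
lambda <- 0.2
N <- 1000
sim_size <- 40
mean_sim <- replicate(N, mean(rexp(sim_size, lambda)))

# Moment estimator of skewness: third central moment divided by
# the second central moment raised to the power 3/2
centered <- mean_sim - mean(mean_sim)
skew_sim <- mean(centered^3) / mean(centered^2)^1.5

# Skewness of the mean of sim_size independent exponentials
skew_theor <- 2 / sqrt(sim_size)
c(simulated = skew_sim, theoretical = skew_theor)
```

A perfectly normal distribution would have skewness 0, so values of this size are consistent with "approximately normal" rather than exactly normal.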
Evaluating the conditions for applying the central limit theorem:
- Independence: the observations are independent, since they are randomly generated.
- Sample size and skew: the size of each sample is 40, which is greater than 30.
Calculating the confidence interval using simulated data
ci_sim <- mean_mean_sim + c(-1,1)*qnorm(0.975)*sd_mean_sim
ci_sim
## [1] 3.493648 6.454829
Calculating the confidence interval using the actual population parameters
ci_theor <- theor_mean + c(-1,1)*qnorm(0.975)*theor_sd
ci_theor
## [1] 3.450512 6.549488
Using the simulated data, the 95% interval for the sample mean is (3.493648, 6.454829), whereas using the theoretical parameters it is (3.450512, 6.549488).
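The intervals above describe where about 95% of the sample means fall. Coverage of a confidence interval for 1/lambda can also be estimated directly: build an interval from each simulated sample and count how often it contains the true mean. The sketch below is one possible way to do this, assuming the usual normal-approximation interval (sample mean plus or minus qnorm(0.975) times s/sqrt(n)); the helper covers_true_mean is a hypothetical name introduced here, not part of the original analysis.

```r
# Sketch: estimated coverage of the normal-approximation 95% interval for 1/lambda
set.seed(1234)                 # same seed as the report
lambda <- 0.2
N <- 1000
sim_size <- 40
theor_mean <- 1 / lambda

# For one simulated sample, build mean +/- z * s / sqrt(n) and
# report whether the interval contains the true mean 1/lambda
covers_true_mean <- function() {
  x <- rexp(sim_size, lambda)
  half_width <- qnorm(0.975) * sd(x) / sqrt(sim_size)
  abs(mean(x) - theor_mean) <= half_width
}

# Proportion of the N intervals that contain 1/lambda
coverage <- mean(replicate(N, covers_true_mean()))
coverage
```

Because the exponential distribution is skewed, the estimated coverage at this sample size may fall somewhat below the nominal 95%.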