The project consists of two parts:
1. A simulation exercise.
2. Basic inferential data analysis.

This document focuses on the first part.
In this project you will investigate the exponential distribution in R and compare it with the Central Limit Theorem. The exponential distribution can be simulated in R with rexp(n, lambda), where lambda is the rate parameter. The mean of the exponential distribution is 1/lambda and the standard deviation is also 1/lambda. Set lambda = 0.2 for all of the simulations. You will investigate the distribution of averages of 40 exponentials. Note that you will need to do a thousand simulations.
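The theoretical values used later in this report follow from these two facts and the standard result for the mean $\bar{X}_n$ of $n$ independent draws (here $\lambda = 0.2$, $n = 40$):

$$
\begin{aligned}
E[\bar{X}_n] &= \frac{1}{\lambda} = \frac{1}{0.2} = 5,\\
\operatorname{SD}(\bar{X}_n) &= \frac{1/\lambda}{\sqrt{n}} = \frac{5}{\sqrt{40}} \approx 0.7906,\\
\operatorname{Var}(\bar{X}_n) &= \frac{1}{\lambda^2 n} = \frac{25}{40} = 0.625.
\end{aligned}
$$

By the Central Limit Theorem, $\bar{X}_{40}$ should therefore be approximately $N(5,\ 0.625)$.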
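Illustrate via simulation and associated explanatory text the properties of the distribution of the mean of 40 exponentials. You should:
1. Show the sample mean and compare it to the theoretical mean of the distribution.
2. Show how variable the sample is (via variance) and compare it to the theoretical variance of the distribution.
3. Show that the distribution is approximately normal.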
In point 3, focus on the difference between the distribution of a large collection of random exponentials and the distribution of a large collection of averages of 40 exponentials.
Using simulated samples from an exponential distribution, this report investigates the Central Limit Theorem (CLT). The theoretical and the simulated characteristics of the distribution of the sample mean are compared, and they are almost identical. A histogram of the means (one mean per row of 40 values) shows that their distribution is approximately normal.
The initialisation does two things: it initialises R and defines the variables given in the assignment.
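The initialisation code itself is not shown, but the later code relies on lambda, n, set, theo_mean, theo_sd and theo_var. A minimal sketch of that step, assuming these definitions (reconstructed from the assignment; they reproduce the theoretical column of the results table), could look like this:

```r
# Assignment parameters (assumed to match the names used in the later code)
lambda <- 0.2   # rate parameter of the exponential distribution
n      <- 40    # number of exponentials averaged per simulation
set    <- 1000  # number of simulations

# Theoretical characteristics of the distribution of the mean of n exponentials
theo_mean <- 1 / lambda              # 1 / 0.2 = 5
theo_sd   <- (1 / lambda) / sqrt(n)  # standard error of the mean, about 0.79
theo_var  <- theo_sd^2               # 25 / 40 = 0.625
```

Note that theo_sd is the standard deviation of the averages, not the standard deviation 1/lambda of a single exponential.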
In this step the simulation set is generated. Before that, the seed is set so that the simulation is reproducible.
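```r
set.seed(1)  # fix the random number generator so the simulation can be repeated
# 1000 rows (simulations) of 40 exponentials each
simulation_set <- matrix(rexp(n * set, lambda), nrow = set)
```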
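Because the draws are independent, the column-by-column fill order of matrix() does not matter; what matters is that each of the 1000 rows holds 40 values. A quick sketch that checks this layout, assuming the assignment parameters:

```r
# Parameters as in the assignment (assumed names, matching the initialisation)
lambda <- 0.2
n      <- 40
set    <- 1000

set.seed(1)
# rexp() draws n * set values; matrix() fills them column by column into
# `set` rows, so each row holds the 40 exponentials of one simulation
simulation_set <- matrix(rexp(n * set, lambda), nrow = set)

dim(simulation_set)  # number of rows and columns
```

dim() should report 1000 rows and 40 columns.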
The mean of each row is calculated so that the distribution of these means can be checked against the CLT.
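```r
# Mean of each row: 1000 averages of 40 exponentials
row_mean <- apply(simulation_set, 1, mean)

# Simulated characteristics of the distribution of the means
sim_mean <- mean(row_mean)
sim_sd   <- sd(row_mean)
sim_var  <- sim_sd^2
```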
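apply(simulation_set, 1, mean) calls mean() once per row. Base R also offers rowMeans(), which computes the same row averages in one vectorised call, and var() returns the sample variance directly. A small sketch confirming that these alternatives agree, assuming the assignment parameters:

```r
set.seed(1)
simulation_set <- matrix(rexp(40 * 1000, 0.2), nrow = 1000)

row_mean  <- apply(simulation_set, 1, mean)  # approach used in this report
row_mean2 <- rowMeans(simulation_set)        # vectorised base R alternative

# Both give the same means up to floating-point rounding
all.equal(row_mean, row_mean2)

# sd()^2 and var() give the same sample variance
all.equal(sd(row_mean)^2, var(row_mean))
```

Either form works for 1000 rows; rowMeans() mainly matters for much larger simulations.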
```r
# Collect the theoretical and simulated values in a 3 x 2 table
print_result <- matrix(c(theo_mean, theo_sd, theo_var, sim_mean, sim_sd, sim_var),
                       nrow = 3, ncol = 2)
dimnames(print_result) <- list(c("mean", "standard deviation", "variance"),
                               c("theoretical", "simulation"))
print(print_result)
```

```
##                    theoretical simulation
## mean                 5.0000000  4.9900252
## standard deviation   0.7905694  0.7859435
## variance             0.6250000  0.6177072
```
The differences between the theoretical and the simulated characteristics are negligible. The simulated mean (4.990) is only about 0.01 below the theoretical mean of 5. The standard deviation and the variance differ by even less in absolute terms (about 0.005 and 0.007). This answers points 1 and 2.
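The differences quoted above can be recomputed from the printed table. Expressing them relative to the theoretical values adds a useful nuance: in relative terms the variance deviates most (about 1.2 %), the mean least (about 0.2 %). A small sketch with the printed values hard-coded (the names theo, sim, abs_diff and rel_diff are illustrative):

```r
# Theoretical and simulated values as printed in the results table
theo <- c(mean = 5.0000000, sd = 0.7905694, var = 0.6250000)
sim  <- c(mean = 4.9900252, sd = 0.7859435, var = 0.6177072)

abs_diff <- abs(theo - sim)        # absolute differences
rel_diff <- abs_diff / theo * 100  # differences in percent of the theoretical value

round(abs_diff, 4)
round(rel_diff, 2)
```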
In this step the distribution of the row means is plotted as a histogram, with the normal density predicted by the CLT drawn on top.
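```r
# Histogram of the 1000 simulated means, scaled as a density (prob = TRUE)
hist(row_mean, density = 100, breaks = 20, prob = TRUE, col = "blue",
     xlab = "average of 40", ylab = "density",
     main = "means of exponential distribution")
# Overlay the normal density with the theoretical mean and standard deviation
curve(dnorm(x, mean = theo_mean, sd = theo_sd),
      col = "black", lwd = 2, add = TRUE)
```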
The bars show the distribution of the 1000 simulated means. The black curve is the normal density based on the theoretical characteristics (mean and standard deviation). The histogram closely follows this curve, so the distribution of the means is approximately normal, as the Central Limit Theorem predicts. A simulation like this illustrates the theorem rather than proving it.
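Point 3 of the assignment asks to contrast a large collection of random exponentials with a large collection of averages of 40 exponentials, which the plot above does not show directly. A minimal sketch of that comparison, assuming the assignment parameters and adding a normal Q-Q plot as an extra check (raw_exp, spread and op are illustrative names):

```r
# Assumed parameters from the assignment
lambda <- 0.2
n      <- 40
set    <- 1000

set.seed(1)
simulation_set <- matrix(rexp(n * set, lambda), nrow = set)
row_mean <- rowMeans(simulation_set)  # 1000 averages of 40 exponentials
raw_exp  <- rexp(set, lambda)         # 1000 single exponentials, for comparison

# Spread of single exponentials (about 1/lambda = 5) versus spread of the
# averages (about 5 / sqrt(40), roughly 0.79)
spread <- c(raw = sd(raw_exp), averages = sd(row_mean))
print(round(spread, 3))

# Side-by-side histograms: right-skewed raw values versus bell-shaped averages
op <- par(mfrow = c(1, 2))
hist(raw_exp, breaks = 20, main = "1000 exponentials", xlab = "value")
hist(row_mean, breaks = 20, main = "1000 averages of 40", xlab = "average")
par(op)  # restore the previous plot layout

# Normal Q-Q plot: points close to the line indicate approximate normality
qqnorm(row_mean)
qqline(row_mean)
```

The raw exponentials keep the skewed shape and large spread of the exponential distribution, while the averages are centred near 5 with a much smaller, roughly symmetric spread.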