In this project you will investigate the exponential distribution in R and compare the distribution of its sample means with what the Central Limit Theorem predicts. The exponential distribution can be simulated in R with rexp(n, lambda), where lambda is the rate parameter. The mean of the exponential distribution is 1/lambda, and its standard deviation is also 1/lambda. Set lambda = 0.2 for all of the simulations. You will investigate the distribution of averages of 40 exponentials, which requires a thousand simulations.
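The theoretical values for the sample mean follow from standard properties of averages of independent draws. As a short derivation, writing $\bar{X}_n$ for the mean of $n = 40$ independent exponentials with rate $\lambda = 0.2$ (this notation is an assumption introduced here, not taken from the original text):

$$
\begin{aligned}
E[\bar{X}_n] &= \frac{1}{\lambda} = \frac{1}{0.2} = 5, \\
\operatorname{Var}(\bar{X}_n) &= \frac{1}{n\lambda^2} = \frac{25}{40} = 0.625, \\
\operatorname{SD}(\bar{X}_n) &= \frac{1}{\lambda\sqrt{n}} = \frac{5}{\sqrt{40}} \approx 0.7906.
\end{aligned}
$$

These are the quantities the code stores as mean_t, var_t, and sd_t.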
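# Simulation parameters
n <- 40             # exponentials per sample
lambda <- 0.2       # rate parameter
simulation <- 1000  # number of simulated samples
# Theoretical mean, standard deviation, and variance of the sample mean
mean_t <- 1/lambda
sd_t <- (1/lambda) / sqrt(n)
var_t <- sd_t^2
# One simulated sample of n exponentials per row
data <- matrix(rexp(n * simulation, lambda), nrow = simulation)
# Sample mean of each row, then the observed summary statistics
row_means <- rowMeans(data)
mean_act <- mean(row_means)
sd_act <- sd(row_means)
var_act <- var(row_means)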
The distribution of simulated means is centered at 5.0450596, while the theoretical distribution is centered at 1/lambda = 5.
Variable | Actual Value | Theoretical Value |
---|---|---|
Mean | 5.0450596 | 5 |
Standard Deviation | 0.80881 | 0.7905694 |
Variance | 0.6541736 | 0.625 |
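To avoid transcription slips when filling in a table like the one above, the comparison can be built directly from the simulation. The following is a minimal sketch, assuming the same n, lambda, and number of simulations as in the text; the seed value and the name comparison are illustrative choices, not part of the original analysis, so the simulated numbers will differ slightly from those reported above.

```r
# Sketch: build the actual-vs-theoretical comparison programmatically.
# The seed and the object name `comparison` are assumptions for illustration.
set.seed(1)
n <- 40
lambda <- 0.2
simulation <- 1000

# One sample of n exponentials per row, then the mean of each row
row_means <- rowMeans(matrix(rexp(n * simulation, lambda), nrow = simulation))

comparison <- data.frame(
  Variable    = c("Mean", "Standard Deviation", "Variance"),
  Actual      = c(mean(row_means), sd(row_means), var(row_means)),
  Theoretical = c(1 / lambda, (1 / lambda) / sqrt(n), (1 / lambda)^2 / n)
)
print(comparison)
```

The Theoretical column does not depend on the seed; only the Actual column varies from run to run.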
library(ggplot2)  # after_stat() and linewidth need ggplot2 >= 3.4.0
dfrow_means <- data.frame(row_means)
# Map x to the row_means column, not to the data frame itself
pp <- ggplot(dfrow_means, aes(x = row_means))
pp <- pp + geom_histogram(binwidth = lambda, fill = "pink", color = "black", aes(y = after_stat(density)))
pp <- pp + labs(title = "Density of Means of 40 Exponentials", x = "Mean of 40 Selections", y = "Density")
pp <- pp + geom_vline(xintercept = mean_act, linewidth = 1.0, color = "black") # actual mean line
pp <- pp + stat_function(fun = dnorm, args = list(mean = mean_act, sd = sd_act), color = "blue", linewidth = 1.0) # normal curve from the simulated mean and sd
pp <- pp + geom_vline(xintercept = mean_t, linewidth = 1.0, color = "yellow", linetype = "longdash") # theoretical mean line
pp <- pp + stat_function(fun = dnorm, args = list(mean = mean_t, sd = sd_t), color = "green", linewidth = 1.0) # theoretical normal curve
pp
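The plot illustrates the Central Limit Theorem: although individual exponential draws are strongly right-skewed, the means of 40 draws are approximately normally distributed. The normal curve built from the simulated mean and standard deviation (blue) nearly coincides with the theoretical normal curve (green), and the actual and theoretical mean lines almost overlap.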
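The visual impression can be backed by a simple numeric check. As a rough sketch, if the sample means are approximately normal, about 95% of them should fall within 1.96 theoretical standard deviations of the theoretical mean. The seed and the name coverage below are assumptions made for illustration.

```r
# Sketch: share of simulated means inside the theoretical 95% normal band.
# The seed and the name `coverage` are illustrative assumptions.
set.seed(2024)
n <- 40
lambda <- 0.2
simulation <- 1000

mean_t <- 1 / lambda               # theoretical mean of the sample mean
sd_t <- (1 / lambda) / sqrt(n)     # theoretical sd of the sample mean

row_means <- rowMeans(matrix(rexp(n * simulation, lambda), nrow = simulation))

# Fraction of means within mean_t +/- 1.96 * sd_t
coverage <- mean(abs(row_means - mean_t) <= 1.96 * sd_t)
print(coverage)
```

A value close to 0.95 is consistent with the normal approximation; it will not match exactly, because the mean of 40 exponentials is still slightly right-skewed.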