In this project you will investigate the exponential distribution in R and compare it with the Central Limit Theorem. The exponential distribution can be simulated in R with rexp(n, lambda), where lambda is the rate parameter. The mean of the exponential distribution is 1/lambda and the standard deviation is also 1/lambda. Set lambda = 0.2 for all of the simulations. You will investigate the distribution of averages of 40 exponentials. Note that you will need to do a thousand simulations.
Illustrate via simulation and associated explanatory text the properties of the distribution of the mean of 40 exponentials.
Exponential distribution: It describes the times between events that happen at a constant rate lambda, with expected value 1/lambda.
Central Limit Theorem: In probability theory, the central limit theorem (CLT) establishes that, in some situations, when independent random variables are added, their properly normalized sum tends toward a normal distribution (informally a “bell curve”) even if the original variables themselves are not normally distributed.
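The code in this report uses a vector named mns that holds the simulated averages, but the step that creates it is not shown. A minimal sketch of how it could be produced is given here, assuming 1000 averages of 40 draws each as described in the task. The seed value and the names lambda, n and nsim are illustrative assumptions, not taken from the original analysis.

```r
# Assumed setup: the original simulation step is not shown, so the seed and
# the helper names below are illustrative only.
set.seed(1234)   # hypothetical seed, used only to make the run reproducible
lambda <- 0.2    # rate parameter given in the task
n <- 40          # number of exponentials per average
nsim <- 1000     # number of simulated averages

# Draw nsim * n exponentials, arrange them as nsim rows of n values,
# and average each row to get one simulated mean per row.
mns <- rowMeans(matrix(rexp(nsim * n, rate = lambda), nrow = nsim, ncol = n))
```

Each element of mns is then the mean of 40 independent exponentials, which is the quantity examined in the rest of the report.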
#Theoretical mean
t_mean <- 1/0.2
#Sample mean
mns_mean <- mean(mns)
#Comparison of the sample mean with the theoretical mean
mean_diff <- data.frame(Mean.Title = c("Sample mean","Theoretical mean"),
Mean.values = c(mns_mean,t_mean))
mean_diff
## Mean.Title Mean.values
## 1 Sample mean 4.970042
## 2 Theoretical mean 5.000000
library(ggplot2)
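p <- ggplot(data = NULL, aes(x = mns)) +
  # Histogram of the simulated means on the density scale
  geom_histogram(aes(y = after_stat(density)), color = "black", fill = NA, binwidth = 0.25) +
  # Kernel density estimate of the simulated means (blue)
  geom_density(color = "blue", lwd = 1) +
  # Vertical lines at the sample mean and the theoretical mean
  geom_vline(data = mean_diff, aes(xintercept = Mean.values,
                                   linetype = Mean.Title, colour = Mean.Title)) +
  # Normal density predicted by the CLT: sd = (1/lambda)/sqrt(n) = 5/sqrt(40)
  stat_function(fun = dnorm, args = list(mean = t_mean, sd = t_mean / sqrt(40)), color = "red") +
  labs(title = "Sample Mean distribution", x = "Sample Means") +
  scale_x_continuous(breaks = 1:10)
print(p)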
The graph above shows the distribution of 1000 means, each computed from 40 observations simulated from an exponential distribution. The histogram and its density estimate (blue curve) are approximately bell-shaped and centered near the theoretical mean, in line with the normal density (red curve). A simulation cannot prove the Central Limit Theorem, but this result is consistent with it. Averaging more than 40 observations per mean would bring the distribution of the means still closer to a normal distribution.
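The visual impression can be backed by a simple numeric check. The sketch below is an illustrative addition, not part of the original analysis: it re-simulates the averages under the same assumptions (lambda = 0.2, 40 observations per mean, 1000 means; the seed and variable names are hypothetical) and computes the share of means that fall within 1.96 theoretical standard errors of the theoretical mean. For a normal distribution this share would be about 95%.

```r
# Illustrative check with assumed names and seed; it re-simulates the
# averages so that it can be run on its own.
set.seed(2024)
lambda <- 0.2
n <- 40
nsim <- 1000
sim_means <- replicate(nsim, mean(rexp(n, rate = lambda)))

mu <- 1 / lambda               # theoretical mean of the averages
se <- (1 / lambda) / sqrt(n)   # theoretical standard deviation of the averages

# Proportion of simulated means within 1.96 standard errors of mu
coverage <- mean(abs(sim_means - mu) <= 1.96 * se)
coverage
```

A value close to 0.95 supports the normal approximation. A value far from it would suggest that 40 observations per mean are not enough for the approximation to hold well.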
#Sample variance of the simulated means
sv_mns_mean <- var(mns)
#Theoretical variance of the mean: (1/lambda)^2 / n, with lambda = 0.2 and n = 40
sv_t_mean <- (1/0.2)^2 / 40
var_diff <- data.frame(Mean.Title = c("Sample variance","Theoretical variance"),
Mean.values = c(sv_mns_mean,sv_t_mean))
var_diff
## Mean.Title Mean.values
## 1 Sample variance 0.589984
## 2 Theoretical variance 0.625000
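The theoretical variance of 0.625 in the table follows from the independence of the 40 draws. A short derivation is given here for reference; the symbols X_i (a single exponential draw), sigma (its standard deviation) and the bar notation for the average are introduced only for this illustration.

```latex
\operatorname{Var}\!\left(\bar{X}_n\right)
  = \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
  = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i)
  = \frac{\sigma^2}{n}
  = \frac{1}{\lambda^2 n}
  = \frac{1}{0.2^2 \cdot 40}
  = 0.625
```

The corresponding standard deviation of the mean is the square root of 0.625, roughly 0.79.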
qqnorm(mns)
qqline(mns)