Course Project Part 1

3. Assessment

3.1 The Data

The exponential distribution can be simulated in R with rexp(n, lambda), where lambda is the rate parameter. The mean of the exponential distribution is 1/lambda and the standard deviation is also 1/lambda. Set lambda = 0.2 for all of the simulations. In this simulation, you will investigate the distribution of averages of 40 exponential(0.2)s. Note that you will need to do a thousand or so simulated averages of 40 exponentials.

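set.seed(12345)   # make the simulation reproducible
lambda <- 0.2     # rate parameter of the exponential distribution
s_size <- 1000    # number of simulated averages
n <- 40           # number of exponentials in each average
# Each column of this n x s_size matrix is one sample of 40 exponentials
simulated_sample <- replicate(s_size, rexp(n, lambda))
# Column-wise means give the 1000 simulated averages
means_of_exponentials <- apply(simulated_sample, 2, mean)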

3.2 Mean of Data

The sample mean of the simulated averages and the theoretical mean are calculated.

s_mean <- mean(means_of_exponentials)
t_mean <- 1/lambda

s_mean

## [1] 4.971972

t_mean

## [1] 5

The sample mean is 4.971972 and the theoretical mean is 5. They differ by about 0.028 (roughly 0.6%), so they are very close.
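One way to judge how close the two means are is to express the gap in standard errors. The sketch below is a supplementary check, not part of the original analysis. It regenerates the averages with the same seed and settings, and the names se_of_s_mean and z_gap are illustrative only. A gap of fewer than about two standard errors would be consistent with ordinary sampling noise.

```r
# Sketch: express the gap between sample and theoretical mean in standard errors.
set.seed(12345)
lambda <- 0.2
s_size <- 1000
n <- 40

# Same simulation as above: s_size averages of n exponentials each
means_of_exponentials <- replicate(s_size, mean(rexp(n, lambda)))

s_mean <- mean(means_of_exponentials)
t_mean <- 1 / lambda

# Theoretical sd of one average is 1 / (lambda * sqrt(n));
# the mean of s_size such averages has that sd divided by sqrt(s_size).
se_of_s_mean <- (1 / (lambda * sqrt(n))) / sqrt(s_size)

# Gap measured in standard errors (illustrative name)
z_gap <- (s_mean - t_mean) / se_of_s_mean
z_gap
```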

3.3 Variance of the Data

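# Sample variance and standard deviation of the simulated averages
s_var <- var(means_of_exponentials)
s_sd <- sd(means_of_exponentials)
# Theoretical variance and standard deviation of an average of n exponentials
t_var <- (1 / lambda)^2 / n
t_sd <- 1 / (lambda * sqrt(n))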
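The theoretical values follow from the variance of an average of n independent draws. The short derivation below is a sketch that assumes the 40 exponentials in each average are independent, with X denoting a single exponential(lambda) draw and \bar{X} the average of n of them.

```latex
\begin{align*}
\operatorname{Var}(\bar{X}) &= \frac{\operatorname{Var}(X)}{n}
  = \frac{1}{\lambda^{2} n}
  = \frac{25}{40} = 0.625 \\
\operatorname{SD}(\bar{X}) &= \sqrt{\operatorname{Var}(\bar{X})}
  = \frac{1}{\lambda \sqrt{n}}
  = \frac{5}{\sqrt{40}} \approx 0.7906
\end{align*}
```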

Now we compare the variance and standard deviation of the sample with their theoretical values.

s_var

## [1] 0.5954369

t_var

## [1] 0.625

s_sd

## [1] 0.7716456

t_sd

## [1] 0.7905694

The sample variance (0.595) and standard deviation (0.772) are close to the theoretical values (0.625 and 0.791). Now it's time for the plot.

3.4 The Plot

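library(ggplot2)
finaldata <- data.frame(means_of_exponentials)
pl <- ggplot(finaldata, aes(x = means_of_exponentials))
# Histogram on the density scale so the normal curves can be overlaid.
# after_stat() needs ggplot2 >= 3.3.0 and linewidth needs >= 3.4.0;
# older versions use ..density.. and size instead.
pl <- pl + geom_histogram(aes(y = after_stat(density)), fill = "grey66", colour = "grey")
pl <- pl + labs(title = "Distribution of means of 40 Samples", x = "Mean of 40 Samples", y = "Density")
# Vertical lines at the sample and theoretical means
pl <- pl + geom_vline(aes(xintercept = s_mean, colour = "sample"))
pl <- pl + geom_vline(aes(xintercept = t_mean, colour = "theoretical"))
# Normal curves: gold uses the sample mean and sd, red uses the theoretical ones
pl <- pl + stat_function(fun = dnorm, args = list(mean = s_mean, sd = s_sd), colour = "gold1", linewidth = 1.0)
pl <- pl + stat_function(fun = dnorm, args = list(mean = t_mean, sd = t_sd), colour = "red", linewidth = 1.0)
pl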

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

The density of the actual data is shown by the grey bars. The theoretical mean and the sample mean are so close that their vertical lines nearly overlap. The red line shows the normal curve defined by the theoretical mean and standard deviation. The gold line shows the normal curve defined by the sample mean and standard deviation. As you can see from the graph, the distribution of the means of 40 exponentials is close to the normal distribution with the theoretical mean and standard deviation implied by the given lambda.
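The visual impression of normality can also be checked numerically. The sketch below is a supplementary check, not part of the original analysis. It regenerates the averages with the same seed and settings and uses qqnorm() with plot.it = FALSE to pair each average with its theoretical normal quantile. The name qq_cor is illustrative. A correlation close to 1 suggests the points lie near a straight line, which is what an approximately normal sample would show.

```r
# Sketch: normal Q-Q check for the simulated averages.
set.seed(12345)
lambda <- 0.2
means_of_exponentials <- replicate(1000, mean(rexp(40, lambda)))

# qqnorm() returns the theoretical quantiles (x) and the data (y);
# plot.it = FALSE skips drawing so only the numbers are computed.
qq <- qqnorm(means_of_exponentials, plot.it = FALSE)

# Correlation between theoretical and observed quantiles (illustrative name)
qq_cor <- cor(qq$x, qq$y)
qq_cor

# To draw the Q-Q plot itself, uncomment the next two lines:
# qqnorm(means_of_exponentials)
# qqline(means_of_exponentials)
```

Because averages of 40 exponentials are still slightly right-skewed, the points in the upper tail may bend a little away from the line.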

3.5 Confidence Interval

s_confinterval <- round(mean(means_of_exponentials) + c(-1, 1) * 1.96 * sd(means_of_exponentials) / sqrt(n), 3)
t_confinterval <- t_mean + c(-1, 1) * 1.96 * sqrt(t_var) / sqrt(n)

s_confinterval

## [1] 4.733 5.211

t_confinterval

## [1] 4.755 5.245

Hence the interval computed from the sample values (4.733 to 5.211) and the interval computed from the theoretical values (4.755 to 5.245) are very close.
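A complementary way to look at confidence intervals is to estimate their coverage. The sketch below is not part of the original analysis. It assumes the usual interval mean +/- 1.96 * sd / sqrt(n) built separately from each of the 1000 samples of 40 exponentials, and counts how often that interval contains the true mean 1/lambda. The names half_width, covered and coverage are illustrative.

```r
# Sketch: empirical coverage of a 95% normal-approximation interval.
set.seed(12345)
lambda <- 0.2
s_size <- 1000
n <- 40

# Each column is one sample of n exponentials
simulated_sample <- replicate(s_size, rexp(n, lambda))

# Per-sample mean and standard deviation
sample_means <- colMeans(simulated_sample)
sample_sds <- apply(simulated_sample, 2, sd)

# Half-width of each interval, using the normal quantile 1.96
half_width <- 1.96 * sample_sds / sqrt(n)

# TRUE where the interval contains the true mean 1 / lambda
true_mean <- 1 / lambda
covered <- (sample_means - half_width <= true_mean) &
  (true_mean <= sample_means + half_width)

# Fraction of the s_size intervals that cover the true mean
coverage <- mean(covered)
coverage
```

Since the exponential distribution is skewed and 1.96 is a normal quantile, the estimated coverage may come out somewhat below the nominal 95%.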