Overview

This project analyses the mean and variance of samples drawn from the exponential distribution using simulation. We draw a sample of 40 exponential observations, repeat this a thousand times, and investigate the distribution of the resulting sample means and sample variances.

The results are consistent with the Central Limit Theorem, i.e. the sample mean and sample variance are approximately normally distributed and are centered at the theoretical mean and variance.
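For reference, the form of the Central Limit Theorem being checked can be written as follows. This is the standard textbook statement, not something derived in this report; here $X_1, \dots, X_n$ denote independent exponential draws with mean $\mu$ and standard deviation $\sigma$, and $\bar{X}_n$ is their sample mean.

$$
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \xrightarrow{d} N(0, 1) \quad \text{as } n \to \infty
$$

So for large $n$ the sample mean is approximately $N(\mu, \sigma^2/n)$. A similar asymptotic result holds for the sample variance when the fourth moment is finite, as it is for the exponential distribution.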

Preprocessing

First set the parameters for the exponential distribution.
* Number of exponentials per sample is 40
* Rate of the exponential, lambda, is 0.2
* Number of simulations is 1000

        n <- 40
        lambda <- 0.2
        nosim <- 1000

The mean of the exponential distribution is 1/lambda and the standard deviation is also 1/lambda.

So let us calculate the theoretical mean and variance.

        mu <- 1/lambda
        sigma <- 1/lambda
        var <- sigma^2

The theoretical mean of this exponential distribution is 5.
The theoretical variance of this exponential distribution is 25.
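As a rough cross-check of these two values, the mean and variance can also be obtained by numerically integrating the exponential density. This is only a sketch: the names `mean_num`, `second_moment` and `var_num` are illustrative and do not appear elsewhere in this report.

```r
lambda <- 0.2

# E[X]: integrate x * f(x) over the support of the exponential distribution
mean_num <- integrate(function(x) x * dexp(x, rate = lambda),
                      lower = 0, upper = Inf)$value

# E[X^2]: needed to get the variance as E[X^2] - (E[X])^2
second_moment <- integrate(function(x) x^2 * dexp(x, rate = lambda),
                           lower = 0, upper = Inf)$value

var_num <- second_moment - mean_num^2

# Both should be close to the closed-form values 1/lambda and 1/lambda^2
c(mean = mean_num, variance = var_num)
```

The numerical results should agree with 1/lambda and 1/lambda^2 up to the tolerance of `integrate`.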

Simulations

Now let us run the simulation.

First set the seed so that this report can be reproduced.

        set.seed(1)

The exponential distribution can be simulated in R with rexp(n, lambda) where lambda is the rate parameter.

Run the simulation 1000 times and store the mean of each simulated sample of 40 exponentials.

        mnsim <- replicate(nosim, mean(rexp(n, lambda)))
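To make the structure of the simulation more explicit, the same draws can be arranged as a matrix with one column per simulated sample. This is an alternative sketch, not the method used in this report; `draws` and `mnsim_matrix` are illustrative names. It assumes `rexp` fills a long vector in the same order as repeated shorter calls, so that with the same seed both approaches should yield the same means.

```r
n <- 40
lambda <- 0.2
nosim <- 1000

# Approach used in the report: one call to rexp() per simulated sample
set.seed(1)
mnsim <- replicate(nosim, mean(rexp(n, lambda)))

# Matrix approach: draw everything at once, one column per sample of size n
set.seed(1)
draws <- matrix(rexp(n * nosim, lambda), nrow = n)
mnsim_matrix <- colMeans(draws)

# Compare the two vectors of sample means (up to floating-point rounding)
all.equal(mnsim, mnsim_matrix)
```

The matrix form keeps the raw draws available, which is convenient if other statistics of the same samples are needed later.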

Sample Mean vs Theoretical Mean

Now plot the means of the simulated data. First plot the histogram of the sample means. Then overlay the normal density whose mean and standard deviation are those of the sample means (blue curve).

Then also add the theoretical mean as a black line for comparison.

        ## after_stat() and linewidth need ggplot2 >= 3.4.0
        library(ggplot2)

        g <- ggplot(data.frame(mnsim), aes(x = mnsim)) +
                ## histogram of the sample means on a density scale
                geom_histogram(aes(y = after_stat(density)),
                        binwidth = .3,
                        colour = "white",
                        fill = "red") +
                ## add the theoretical mean
                geom_vline(xintercept = mu,
                        colour = "black",
                        linewidth = 2) +
                ## add the normal distribution
                stat_function(fun = dnorm,
                              colour = "blue",
                              linewidth = 2,
                              args = list(mean = mean(mnsim), sd = sd(mnsim))) +
                labs(title = "Sample Mean vs Theoretical Mean",
                     subtitle = "Means of 40 exponentials, repeated 1000 times",
                     x = "Sample mean") +
                ## annotate() draws each label once instead of once per data row
                annotate("text", x = 7, y = 0.45, colour = "black",
                         label = "Black line is theoretical mean") +
                annotate("text", x = 7, y = 0.4, colour = "blue",
                         label = "Blue curve is normal fit to sample means") +
                theme(legend.position = "none")

        print(g)

As shown in the above plot, the distribution of the sample means is approximately normal and centered at the theoretical mean.

We can quickly verify this with the actual values:
* Theoretical mean = 5
* Mean of sample means = 4.99
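The comparison above can be computed directly. The sketch below also compares the spread of the sample means with the standard error sigma/sqrt(n) that the Central Limit Theorem predicts; this extra check and the names `se_theory` and `se_observed` are additions for illustration, not part of the original analysis.

```r
n <- 40
lambda <- 0.2
nosim <- 1000
mu <- 1 / lambda
sigma <- 1 / lambda

set.seed(1)
mnsim <- replicate(nosim, mean(rexp(n, lambda)))

# Centre: mean of the simulated sample means vs the theoretical mean
mean_of_means <- mean(mnsim)

# Spread: observed sd of the sample means vs the CLT standard error
se_theory <- sigma / sqrt(n)
se_observed <- sd(mnsim)

round(c(theoretical_mean = mu, mean_of_means = mean_of_means), 3)
round(c(theoretical_se = se_theory, observed_sd = se_observed), 3)
```

If the simulation behaves as expected, both pairs of numbers should be close to each other.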

Sample Variance vs Theoretical Variance

Now let us repeat the same process for the sample variance.

First we simulate, and then plot a histogram with the theoretical variance and a normal density overlaid.

        varsim <- replicate(nosim, var(rexp(n, lambda)))

        ## after_stat() and linewidth need ggplot2 >= 3.4.0
        g <- ggplot(data.frame(varsim), aes(x = varsim)) +
                ## plot the histogram
                geom_histogram(aes(y = after_stat(density)),
                        binwidth = 5,
                        colour = "white",
                        fill = "red") +
                coord_cartesian(ylim = c(0, 0.05)) +
                ## add the theoretical variance
                geom_vline(xintercept = var,
                        colour = "black",
                        linewidth = 2) +
                ## add the normal distribution
                stat_function(fun = dnorm,
                              colour = "blue",
                              linewidth = 2,
                              args = list(mean = mean(varsim),
                                          sd = sd(varsim))) +
                labs(title = "Sample Variance vs Theoretical Variance",
                     subtitle = "Variances of 40 exponentials, repeated 1000 times",
                     x = "Sample variance") +
                ## both labels must sit inside ylim = c(0, 0.05) to be visible
                annotate("text", x = 60, y = 0.045, colour = "black",
                         label = "Black line is theoretical variance") +
                annotate("text", x = 60, y = 0.040, colour = "blue",
                         label = "Blue curve is normal fit to sample variances") +
                theme(legend.position = "none")

        print(g)

Once again, the distribution of the sample variances is centered at the theoretical variance.

We can quickly verify this with the actual values:
* Theoretical variance = 25
* Mean of sample variances = 25.573
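To judge whether the gap between these two values is within simulation noise, one option is to compare it with the Monte Carlo standard error of the mean of the sample variances. This check is an addition for illustration; `mc_se` and `z` are hypothetical names. The sketch regenerates `mnsim` first so that `varsim` uses the same part of the random stream as in this report, assuming nothing else consumes random numbers in between.

```r
n <- 40
lambda <- 0.2
nosim <- 1000
sigma2 <- (1 / lambda)^2   # theoretical variance

set.seed(1)
mnsim <- replicate(nosim, mean(rexp(n, lambda)))   # drawn first, as in the report
varsim <- replicate(nosim, var(rexp(n, lambda)))

mean_of_vars <- mean(varsim)

# Monte Carlo standard error of the mean of the 1000 sample variances
mc_se <- sd(varsim) / sqrt(nosim)

# Distance from the theoretical variance in units of that standard error
z <- (mean_of_vars - sigma2) / mc_se

round(c(mean_of_vars = mean_of_vars, mc_se = mc_se, z = z), 3)
```

A value of `z` within roughly plus or minus 2 suggests the difference is compatible with random variation in the simulation.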

Distribution

Finally, we can check more formally that the sample mean and sample variance are approximately normally distributed by plotting quantile-quantile plots, with the theoretical normal quantiles on the X axis and the quantiles of the simulated values on the Y axis.

First we plot the mean qqplot

        qqnorm(mnsim,  main = "Mean Q-Q Plot",
                        xlab = "Theoretical Quantiles", 
                        ylab = "Sample Quantiles")
        qqline(mnsim, col = "blue")
        abline(h=mu, col = "red")
        legend("topleft", lty = 1, 
               col = c("blue", "red"), 
               legend = c("qqline", "theoretical mean"))

And then the Variance QQ plot

        qqnorm(varsim,  main = "Variance Q-Q Plot",
                        xlab = "Theoretical Quantiles", 
                        ylab = "Sample Quantiles")
        qqline(varsim, col = "blue")
        abline(h=var, col = "red")
        legend("topleft", lty = 1, 
               col = c("blue", "red"), 
               legend = c("qqline", "theoretical variance"))

As can be seen in both plots, the values lie close to the Q-Q line, indicating that the distribution is approximately normal.
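To illustrate what a normal Q-Q plot actually compares, the coordinates can be rebuilt by hand: the sorted simulated values are paired with standard normal quantiles at the probability points given by `ppoints`. The correlation between the two sets of quantiles is used here as a rough numerical summary of straightness; that summary is an illustrative addition, not part of the original analysis.

```r
n <- 40
lambda <- 0.2
nosim <- 1000

set.seed(1)
mnsim <- replicate(nosim, mean(rexp(n, lambda)))

# Theoretical quantiles: standard normal quantiles at evenly spaced probabilities
theoretical <- qnorm(ppoints(length(mnsim)))

# Sample quantiles: the simulated means in increasing order
observed <- sort(mnsim)

# qqnorm() returns the same coordinates (in the original data order)
qq <- qqnorm(mnsim, plot.it = FALSE)
all.equal(sort(qq$x), theoretical)
all.equal(sort(qq$y), observed)

# Correlation close to 1 means the points lie close to a straight line
qq_cor <- cor(theoretical, observed)
qq_cor
```

The same steps can be applied to `varsim` to summarise the variance Q-Q plot.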

Conclusion

The sample mean and sample variance are centered at the theoretical mean and theoretical variance. Their distributions are approximately normal. These observations are in line with the predictions of the Central Limit Theorem.