Overview

In this report, we explore the distribution of the mean of \(40\) exponential random variables. We simulate \(1000\) observations of this mean and compare its sample mean, variance and standard deviation with the corresponding theoretical values. Finally, we show that the distribution of the mean of these \(40\) exponential random variables is approximately normal.

Simulation

We want to investigate the distribution of averages of 40 exponential random variables. To generate exponential random values in R we use rexp(n,rate).
We repeat the experiment \(1000\) times:

set.seed(1)
sdata=rexp(40000,rate = .2)
mdata=matrix(sdata,nrow = 1000,ncol = 40)

As the code above shows, we first generate \(40000\) exponential random values with \(\lambda=0.2\) and then arrange them in a matrix with \(40\) columns and \(1000\) rows. We can consider each column as an exponential variable and each row as an observation.
Now we calculate the mean of these \(40\) variables in each observation:

mean.data=apply(mdata,1,mean)
length(mean.data)
## [1] 1000
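
As a quick sanity check on the layout described above, the following sketch rebuilds the matrix and confirms its dimensions, the column-by-column filling order of matrix(), and that rowMeans() gives the same row averages as apply(mdata,1,mean). The name row.means is introduced here for illustration only and is not used elsewhere in the report.

```r
# Rebuild the simulated data with the same seed and rate as in the report
set.seed(1)
sdata <- rexp(40000, rate = 0.2)
mdata <- matrix(sdata, nrow = 1000, ncol = 40)

# 1000 observations (rows) of 40 variables (columns)
dim(mdata)

# matrix() fills column by column, so the first column holds the first 1000 draws
identical(mdata[, 1], sdata[1:1000])

# rowMeans() is an equivalent, faster way to average each row
row.means <- rowMeans(mdata)
all.equal(row.means, apply(mdata, 1, mean))
```

Because the draws are independent and identically distributed, the filling order does not affect the distribution of the row means; it only determines which draws end up in which row.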

By the definition of the exponential distribution (appendix one), the mean of each variable is \(1/\lambda\):

Mean.Each.Variable=1/0.2
Mean.Each.Variable
## [1] 5

Hence, by theorem 1 in appendix one, the mean of \(\bar{X}\), the average of the \(40\) variables, is \(5\). Now it is time to look at our observations. According to our simulation, the mean of \(\bar{X}\) over 1000 observations is:

Mean.Simulation=mean(mean.data)
Mean.Simulation
## [1] 4.990025

As you can see, it is very close to the theoretical mean \(5\). In figure 1 in appendix two, the theoretical mean is shown by a red line and the observed mean by a blue line. We drew the observed mean at \(Mean.Simulation-.02\) so that the two lines can be told apart on the diagram.
By theorem 1, the theoretical variance of \(\bar{X}\) is:

# Theorem 1: V(Xbar) = sigma^2/n with sigma = 1/0.2 and n = 40 variables per average
Theoretical.Variance=1/(((.2)**2)*40)
Theoretical.Variance
## [1] 0.625

On the other hand, the observed variance of \(\bar{X}\) is:

Observed.Variance=var(mean.data)
Observed.Variance
## [1] 0.6177072
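
For comparison, the value predicted by Theorem 1 can be worked out by hand, taking \(\sigma=1/\lambda=5\) for a single exponential variable and \(n=40\) variables per average (not the \(1000\) repetitions of the experiment):
\[V(\bar{X})=\frac{\sigma^ 2}{n}=\frac{1/\lambda^ 2}{n}=\frac{25}{40}=0.625,\qquad \sigma_{\bar{X}}=\frac{5}{\sqrt{40}}\approx 0.79.\]
The observed variance \(0.6177\) is within about \(1.2\%\) of this value.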

Now let us treat the sample variance of the 40 exponential values in each observation as a random variable, which we denote by \(S^ 2\). We compute the variance of each observation:

Variance.Simulation=apply(mdata,1,var)
length(Variance.Simulation)
## [1] 1000

The mean of the observed variances is then:

mean(Variance.Simulation)
## [1] 25.05783

which is close to the theoretical variance \(\sigma^ 2=1/\lambda^ 2=25\) of a single exponential variable. The observed variances are shown in the histogram in figure 2, appendix two. As the histogram shows, the mean of the observed variances is very close to the theoretical variance. We drew the theoretical variance at \(25-.2\) so that the two lines can be told apart.
According to the CLT (appendix one), if the number of random variables is large enough, the distribution of \(\bar{X}\) is approximately normal with \(\mu_{\bar{X}}=\mu\) and \(\sigma_{\bar{X}}^ 2=\sigma ^2/n\), which are \(5\) and \(25/40=0.625\) in our case. We can see both the theoretical and the observed distribution of \(\bar{X}\) in figure 3, appendix two.

The blue curve represents the theoretical normal density and the red curve the observed density. The theoretical curve must use \(\mu=5\) and \(\sigma=5/\sqrt{40}\approx 0.79\), the standard deviation of \(\bar{X}\), rather than \(\sigma=5\), the standard deviation of a single exponential variable; with \(\sigma=5\) the blue curve is far too flat and the two curves do not look close. Since the observed mean \(4.99\) and variance \(0.618\) are close to \(5\) and \(0.625\), the two curves should nearly coincide when the correct \(\sigma\) is used.
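
One way to check the normal approximation numerically is sketched below, assuming the data are regenerated with the same seed and rate as in the simulation above. It compares the sample standard deviation of the 1000 means with the theoretical value \(5/\sqrt{40}\) and draws a normal Q-Q plot, where points lying close to the reference line suggest approximate normality. The names n, lambda, theo.sd and obs.sd are illustrative and do not appear elsewhere in the report.

```r
# Regenerate the 1000 sample means of 40 exponential values each
set.seed(1)
mdata <- matrix(rexp(40000, rate = 0.2), nrow = 1000, ncol = 40)
mean.data <- rowMeans(mdata)

n <- 40
lambda <- 0.2

# Theoretical sd of the mean: sigma / sqrt(n), with sigma = 1 / lambda
theo.sd <- (1 / lambda) / sqrt(n)
obs.sd <- sd(mean.data)
c(theoretical = theo.sd, observed = obs.sd)

# Normal Q-Q plot of the sample means; qqline() draws the reference
# line through the first and third quartiles
qqnorm(mean.data, main = "Normal Q-Q plot of sample means")
qqline(mean.data, col = "red")
```

A Q-Q plot is used here instead of overlaying densities because departures in the tails, where the exponential skewness shows up most, are easier to see on it.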
Now we can compare the observed distribution of average of 40 variables with the distribution of exponential random variables.

Exp.Var=rexp(1000,rate = .2)
plot(density(mean.data,from = 0,to=8),col="red",main = "Density")
lines(density(Exp.Var,from = 0,to=8),col="blue")
legend("topright", pch = "-", col = c("blue", "red"), legend = c("Exponential distribution", "Mean distribution"))

As you can see, the distribution of the mean of 40 exponential variables is approximately normal, while the distribution of a single exponential random variable is far from normal.

Appendix One

First we need to have the definition of an exponential distribution:

Definition: \(X\) is said to have an exponential distribution with parameter \(\lambda\) if the pdf of \(X\) is \[f(x,\lambda)=\lambda e^{-\lambda x}\] when \(x\geq 0\) and zero otherwise. Moreover, both the mean and standard deviation of the exponential distribution equal \(1/\lambda\).

The following theorem is the main theorem that we use in this report:

Theorem 1: Let \(X_{1},X_{2},...,X_{n}\) be a random sample from a distribution with mean value \(\mu\) and standard deviation \(\sigma\). Then
1. \(\mu_{\bar{X}}=\mu.\)
2. \(\sigma_{\bar{X}}=\sigma/\sqrt{n}.\)
3. \(V(\bar{X})=\sigma^ 2/n.\)
Here \(\mu_{\bar{X}}\), \(\sigma_{\bar{X}}\) and \(V(\bar{X})\) are the mean, standard deviation and variance of \(\bar{X}\) (the mean of \(X_{1},X_{2},...,X_{n}\)).
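
Parts 1 and 3 follow from linearity of expectation and, for the variance, from the independence of the \(X_{i}\) in a random sample. A short derivation, written out here for reference:
\[E(\bar{X})=E\left(\frac{1}{n}\sum_{i=1}^{n}X_{i}\right)=\frac{1}{n}\sum_{i=1}^{n}E(X_{i})=\frac{1}{n}\cdot n\mu=\mu,\]
\[V(\bar{X})=V\left(\frac{1}{n}\sum_{i=1}^{n}X_{i}\right)=\frac{1}{n^ 2}\sum_{i=1}^{n}V(X_{i})=\frac{n\sigma^ 2}{n^ 2}=\frac{\sigma^ 2}{n}.\]
Part 2 is the square root of part 3.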

And finally we have the Central Limit Theorem:

Theorem (CLT): Let \(X_{1},X_{2},...,X_{n}\) be a random sample from a distribution with mean \(\mu\) and variance \(\sigma^ 2\). Then if \(n\) is sufficiently large, \(\bar{X}\) has approximately a normal distribution with \(\mu_{\bar{X}}=\mu\) and \(\sigma_{\bar{X}}^ 2=\sigma ^2/n\).
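
Equivalently, in standardized form, with the standard deviation of \(\bar{X}\) taken from Theorem 1:
\[Z=\frac{\bar{X}-\mu}{\sigma/\sqrt{n}}\ \text{ is approximately } N(0,1).\]
For the simulation in this report, \(\mu=5\), \(\sigma=5\) and \(n=40\), so \(Z=(\bar{X}-5)/(5/\sqrt{40})\).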

Appendix Two

hist(mean.data,main = "Mean of Simulation data (Figure 1)", xlab = "Mean")
legend("topright", pch = "-", col = c("blue", "red"), legend = c("observed mean", "theoretical mean"))
abline(v=Mean.Each.Variable,col="red")
abline(v=Mean.Simulation-.02,col="blue")

hist(Variance.Simulation,main="Observed Variance (Figure 2)",xlab = "Variance")
legend("topright", pch = "-", col = c("blue", "red"), legend = c("Theoretical variance", "Mean of Observed variance"))
abline(v=mean(Variance.Simulation),col="red")
abline(v=25-.2,col="blue")

hist(mean.data,main = "Mean of Simulation data (Figure 3)", xlab = "Mean",probability = TRUE)
legend("topright", pch = "-", col = c("blue", "red"), legend = c("Theoretical", "Observed"))
lines(density(mean.data,from = 0,to=8),col="red")
# Theoretical density of the mean: sd is 5/sqrt(40), not 5 (Theorem 1);
# curve() supplies its own x values, so no separate x vector is needed
curve(dnorm(x, mean = 5, sd = 5/sqrt(40)),col="blue",add=TRUE)