This project analyses the distribution of the sample mean and sample variance of the exponential distribution using simulation. We draw 40 observations from an exponential distribution, repeat this 1000 times, and investigate the resulting 1000 sample means and 1000 sample variances.
The results are consistent with the Central Limit Theorem: the sample means and sample variances are approximately normally distributed and centered at the theoretical mean and variance.
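For reference, the Central Limit Theorem prediction tested here can be written out explicitly. This is a minimal statement assuming i.i.d. draws $X_1, \dots, X_n$ from an exponential distribution with rate $\lambda$. The symbols $X_i$ and $\bar{X}_n$ are introduced here for illustration:

```latex
\mu = \frac{1}{\lambda}, \qquad \sigma^2 = \frac{1}{\lambda^2}, \qquad
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i \;\approx\; \mathcal{N}\!\left(\frac{1}{\lambda},\ \frac{1}{\lambda^2 n}\right)
% With lambda = 0.2 and n = 40: mean 5, variance 25/40 = 0.625, standard error sqrt(0.625) ~ 0.79
```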
First set the parameters for the exponential distribution:
* Number of exponentials (observations per simulation), n = 40
* Rate of the exponential, lambda = 0.2
* Number of simulations, nosim = 1000
n <- 40
lambda <- 0.2
nosim <- 1000
The mean of an exponential distribution is 1/lambda, and its standard deviation is also 1/lambda.
So let us calculate the theoretical mean and variance:
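mu <- 1/lambda      # theoretical mean
sigma <- 1/lambda   # theoretical standard deviation
# theoretical variance; the name masks base var() as a value, but R skips
# non-function objects when resolving a call, so var(...) still calls stats::var()
var <- sigma^2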
The theoretical mean of this exponential distribution is 5.
The theoretical variance of this exponential distribution is 25.
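As an optional cross-check of the closed-form values, the moments can be recovered by numerically integrating against the exponential density with base R's `integrate()` and `dexp()`. This is a sketch; the names `theo_mean` and `theo_var` are hypothetical and not part of the report:

```r
# Numerically check the closed-form moments of Exp(rate = 0.2)
# by integrating against the density dexp(); expect values close to 5 and 25.
lambda <- 0.2
theo_mean <- integrate(function(x) x * dexp(x, rate = lambda), 0, Inf)$value
theo_var  <- integrate(function(x) (x - 1 / lambda)^2 * dexp(x, rate = lambda),
                       0, Inf)$value
c(mean = theo_mean, variance = theo_var)
```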
Now let us run the simulation.
First set the seed so that the results in this report can be reproduced:
set.seed(1)
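To illustrate what setting the seed buys us, re-seeding the generator with the same value reproduces exactly the same pseudo-random draws. The variable names `first_draws` and `second_draws` are hypothetical. The snippet re-seeds at the end so that it does not disturb the simulation that follows:

```r
# Re-seeding the generator reproduces exactly the same pseudo-random draws
set.seed(1)
first_draws <- rexp(5, rate = 0.2)
set.seed(1)
second_draws <- rexp(5, rate = 0.2)
identical(first_draws, second_draws)  # TRUE
set.seed(1)  # restore the seed so the report's simulation is unaffected
```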
The exponential distribution can be simulated in R with rexp(n, lambda) where lambda is the rate parameter.
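A single simulation is one call to `rexp()` with 40 observations, and its mean is one sample mean. The name `one_sample` is hypothetical, and the trailing `set.seed(1)` keeps this illustration from shifting the random numbers used later in the report:

```r
# One simulation = 40 independent draws from Exp(rate = 0.2)
n <- 40
lambda <- 0.2
one_sample <- rexp(n, rate = lambda)
length(one_sample)  # 40 observations
mean(one_sample)    # one sample mean; it varies between simulations
set.seed(1)         # restore the seed so the report's simulation is unaffected
```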
Run the simulation 1000 times, storing the mean of each simulated sample of 40 observations:
mnsim <- replicate(nosim, mean(rexp(n, lambda)))
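`replicate(nosim, expr)` simply evaluates `expr` `nosim` times and collects the results into a vector. A sketch of the equivalent explicit loop with `vapply()` shows this; with the same seed, both give the same 1000 sample means. The names `via_replicate` and `via_vapply` are hypothetical. Because the snippet re-seeds before each version, the random-number state afterwards is the same as after the line above:

```r
# replicate(nosim, expr) evaluates expr nosim times and collects the results;
# an explicit vapply() loop started from the same seed gives the same vector.
n <- 40; lambda <- 0.2; nosim <- 1000
set.seed(1)
via_replicate <- replicate(nosim, mean(rexp(n, lambda)))
set.seed(1)
via_vapply <- vapply(seq_len(nosim), function(i) mean(rexp(n, lambda)), numeric(1))
all.equal(via_replicate, via_vapply)  # TRUE
```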
Now plot the sample means. First draw a histogram of the sample means, then overlay a normal density whose mean and standard deviation are those of the sample means (blue curve).
Then add the theoretical mean as a black vertical line for comparison.
library(ggplot2)  # after_stat() needs ggplot2 >= 3.3.0, linewidth needs >= 3.4.0
g <- ggplot(data.frame(mnsim), aes(x = mnsim)) +
  ## histogram on the density scale so a density curve can be overlaid
  geom_histogram(binwidth = 0.3, colour = "white", fill = "red",
                 aes(y = after_stat(density))) +
  ## add the theoretical mean as a black vertical line
  geom_vline(xintercept = mu, colour = "black", linewidth = 2) +
  ## add a normal density with the mean and sd of the sample means
  ## (stat_function takes its parameters through `args`, not `arg`)
  stat_function(fun = dnorm, colour = "blue", linewidth = 2,
                args = list(mean = mean(mnsim), sd = sd(mnsim))) +
  labs(title = "Sample mean vs theoretical mean",
       subtitle = "Means of 40 exponential draws, repeated 1000 times",
       x = "Sample mean") +
  ## fixed-colour labels drawn once with annotate() rather than mapped through aes()
  annotate("text", x = 7, y = 0.45, colour = "black",
           label = "Black line is theoretical mean") +
  annotate("text", x = 7, y = 0.40, colour = "blue",
           label = "Blue curve is sample mean") +
  theme(legend.position = "none")
print(g)
As shown in the plot above, the distribution of sample means is approximately normal and centered at the theoretical mean.
The actual values confirm this:
* Theoretical mean = 5
* Mean of sample means = 4.99
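The CLT also predicts the spread of the sample means: their standard deviation should be close to the standard error (1/lambda)/sqrt(n), about 0.79. The following sketch re-creates `mnsim` with the same seed so that it runs on its own. The name `se_theory` is hypothetical:

```r
# Compare the simulated sample means with the CLT prediction:
# centre 1/lambda and standard error (1/lambda) / sqrt(n).
n <- 40; lambda <- 0.2; nosim <- 1000
set.seed(1)
mnsim <- replicate(nosim, mean(rexp(n, lambda)))
mu <- 1 / lambda
se_theory <- (1 / lambda) / sqrt(n)  # about 0.79
c(theoretical_mean = mu, mean_of_sample_means = mean(mnsim),
  theoretical_se = se_theory, sd_of_sample_means = sd(mnsim))
```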
Now let us repeat the process for the sample variance.
First we simulate, then plot a histogram of the sample variances with the theoretical variance and a fitted normal density overlaid:
varsim <- replicate(nosim, var(rexp(n, lambda)))
g <- ggplot(data.frame(varsim), aes(x = varsim)) +
  ## plot the histogram on the density scale
  geom_histogram(binwidth = 5, colour = "white", fill = "red",
                 aes(y = after_stat(density))) +
  coord_cartesian(ylim = c(0, 0.05)) +
  ## add the theoretical variance as a black vertical line
  geom_vline(xintercept = var, colour = "black", linewidth = 2) +
  ## add a normal density with the mean and sd of the sample variances
  stat_function(fun = dnorm, colour = "blue", linewidth = 2,
                args = list(mean = mean(varsim), sd = sd(varsim))) +
  labs(title = "Sample variance vs theoretical variance",
       subtitle = "Variances of 40 exponential draws, repeated 1000 times",
       x = "Sample variance") +
  ## keep both labels inside the y range of 0 to 0.05
  annotate("text", x = 60, y = 0.045, colour = "black",
           label = "Black line is theoretical variance") +
  annotate("text", x = 60, y = 0.040, colour = "blue",
           label = "Blue curve is sample variance") +
  theme(legend.position = "none")
print(g)
Once again, the distribution of sample variances is approximately normal and centered at the theoretical variance.
The actual values confirm this:
* Theoretical variance = 25
* Mean of sample variances = 25.573
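The sample variance is far more spread out than the sample mean, which is why the histogram uses a much wider bin. For i.i.d. data, the standard result is Var(S^2) = (mu4 - sigma^4 (n - 3)/(n - 1))/n, and the exponential distribution has fourth central moment mu4 = 9/lambda^4. This gives a theoretical standard deviation of about 11.2 for n = 40. The sketch below re-creates both simulations in the report's order so that it runs on its own. The names `sigma2`, `mu4` and `sd_s2_theory` are hypothetical:

```r
# Compare the simulated sample variances with theory. For i.i.d. data,
# Var(S^2) = (mu4 - sigma^4 * (n - 3) / (n - 1)) / n, and for an exponential
# distribution the fourth central moment is mu4 = 9 / lambda^4.
n <- 40; lambda <- 0.2; nosim <- 1000
set.seed(1)
mnsim  <- replicate(nosim, mean(rexp(n, lambda)))  # same draw order as the report
varsim <- replicate(nosim, var(rexp(n, lambda)))
sigma2 <- 1 / lambda^2
mu4 <- 9 / lambda^4
sd_s2_theory <- sqrt((mu4 - sigma2^2 * (n - 3) / (n - 1)) / n)  # about 11.2
c(theoretical_variance = sigma2, mean_of_sample_variances = mean(varsim),
  theoretical_sd = sd_s2_theory, sd_of_sample_variances = sd(varsim))
```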
Finally, we can check more formally that the sample means and sample variances are normally distributed with a normal quantile-quantile (Q-Q) plot. This plots the theoretical normal quantiles on the X axis against the quantiles of the simulated values on the Y axis.
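The following sketch writes out by hand what `qqnorm()` and `qqline()` compute. The X axis holds standard normal quantiles at the plotting positions `ppoints(nosim)`, and the Y axis holds the sorted simulated values. The reference line passes through the first- and third-quartile pairs, using `quantile()`'s default type 7, which is also `qqline()`'s default. It re-creates `mnsim` so that it runs on its own. The names `theo_q`, `samp_q`, `y_q` and `x_q` are hypothetical:

```r
# What qqnorm() and qqline() compute, written out by hand.
n <- 40; lambda <- 0.2; nosim <- 1000
set.seed(1)
mnsim <- replicate(nosim, mean(rexp(n, lambda)))

# x axis: standard normal quantiles at the plotting positions ppoints(nosim)
theo_q <- qnorm(ppoints(nosim))
# y axis: the simulated sample means, sorted
samp_q <- sort(mnsim)
plot(theo_q, samp_q, xlab = "Theoretical Quantiles", ylab = "Sample Quantiles",
     main = "Mean Q-Q Plot (by hand)")

# qqline(): straight line through the first and third quartile pairs
probs <- c(0.25, 0.75)
y_q <- quantile(mnsim, probs, names = FALSE)
x_q <- qnorm(probs)
slope <- diff(y_q) / diff(x_q)
intercept <- y_q[1] - slope * x_q[1]
abline(intercept, slope, col = "blue")
```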
First, the Q-Q plot of the sample means:
qqnorm(mnsim, main = "Mean Q-Q Plot",
       xlab = "Theoretical Quantiles",
       ylab = "Sample Quantiles")     # no trailing comma: an empty argument is an error
qqline(mnsim, col = "blue")           # reference line through the quartiles
abline(h = mu, col = "red")           # theoretical mean
legend("topleft", lty = 1,
       col = c("blue", "red"),
       legend = c("qqline", "theoretical mean"))
Then the Q-Q plot of the sample variances:
qqnorm(varsim, main = "Variance Q-Q Plot",
       xlab = "Theoretical Quantiles",
       ylab = "Sample Quantiles")
qqline(varsim, col = "blue")          # reference line through the quartiles
abline(h = var, col = "red")          # theoretical variance
legend("topleft", lty = 1,
       col = c("blue", "red"),
       legend = c("qqline", "theoretical variance"))
As both plots show, the points lie close to the Q-Q line, indicating that the distributions are approximately normal.
The sample means and sample variances are centered at the theoretical mean and theoretical variance, and their distributions are approximately normal. These observations are in line with the predictions of the Central Limit Theorem.