In this project we investigate the exponential distribution in R and compare the behaviour of its sample averages with what the Central Limit Theorem predicts. The exponential distribution can be simulated in R with rexp(n, lambda), where lambda is the rate parameter. The mean of the exponential distribution is 1/lambda, and its standard deviation is also 1/lambda. We set lambda = 0.2 for all of the simulations and investigate the distribution of averages of 40 exponentials, using one thousand simulations.
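The claim that both the mean and the standard deviation equal 1/lambda can be checked empirically. The following is a minimal sketch; the sample size of 100,000 and the seed are arbitrary choices made for this illustration and are not part of the original analysis.

```r
# Sanity check: the mean and sd of exponential draws should both be near 1 / lambda.
set.seed(1)                      # arbitrary seed, used only for reproducibility
lambda <- 0.2
draws <- rexp(100000, rate = lambda)  # 100,000 draws is an arbitrary, large sample

# Compare the empirical mean and sd with the theoretical value 1 / lambda
check <- c(mean = mean(draws), sd = sd(draws), theoretical = 1 / lambda)
print(round(check, 3))
```

With a sample this large, both empirical values should land close to 5.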
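We illustrate, via simulation and associated explanatory text, the properties of the distribution of the mean of 40 exponentials: we compare the sample mean and variance with their theoretical values and show that the distribution is approximately normal.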
We run 1000 simulations to create a data set for comparison with theory. Each simulation draws 40 observations from the exponential distribution with lambda = 0.2, which is equivalent to calling rexp(40, 0.2). We generate all of the samples at once and calculate the average of each sample.
library(ggplot2)
library(knitr)
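no_simulation <- 1000  # number of simulations
lambda <- 0.2          # rate parameter
n <- 40                # sample size
# Draw all values at once; each row of the matrix is one simulated sample of size n
simulated_data <- matrix(rexp(n = no_simulation * n, rate = lambda), nrow = no_simulation, ncol = n)
# Average of each sample (one mean per row)
sample_mean <- rowMeans(simulated_data)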
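The matrix approach above draws all 40,000 values in one call. An equivalent way to express the same simulation, one sample at a time, is sketched below with replicate(). The name sample_mean_alt and the seed are introduced only for this illustration.

```r
# Alternative sketch: simulate one sample of size n at a time and keep its mean.
set.seed(2)            # arbitrary seed, used only for reproducibility
lambda <- 0.2          # rate parameter
n <- 40                # sample size
no_simulation <- 1000  # number of simulations

# replicate() evaluates the expression no_simulation times and collects the results
sample_mean_alt <- replicate(no_simulation, mean(rexp(n, rate = lambda)))

length(sample_mean_alt)   # one mean per simulation
summary(sample_mean_alt)  # the centre should be near 1 / lambda
```

Both versions produce 1000 independent sample means; the matrix version is usually faster, while the replicate() version reads closer to the verbal description of the experiment.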
The theoretical mean of the average of samples is 1/lambda = 5. The following shows that the average of the sample means and the theoretical mean are very close.
Sample mean: the sample mean (or empirical mean) and the sample covariance are statistics computed from a collection of data (the sample) on one or more random variables.
actual_mean <- mean(sample_mean)   # mean of the 1000 sample means
theoretical_mean <- 1 / lambda     # expected value of the exponential distribution
result_1 <- data.frame("Mean" = c(actual_mean, theoretical_mean),
                       row.names = c("Mean from the samples", "Theoretical mean"))
result_1
The simulated mean of 4.983227 is close to the theoretical value of 5. The histogram below shows the distribution of the 1000 sample means, with the theoretical mean marked in red and the simulated mean in blue.
sampleMean_data <- as.data.frame(sample_mean)
ggplot(sampleMean_data, aes(sample_mean)) +
  geom_histogram(alpha = .5, position = "identity", col = "black", fill = "white") +
  geom_vline(xintercept = theoretical_mean, colour = "red") +   # theoretical mean
  geom_vline(xintercept = actual_mean, colour = "blue") +       # simulated mean
  ggtitle("Histogram of the sample means") +
  xlab("Sample mean") + ylab("Count")  # bar heights are counts here, not densities
The theoretical variance of the average of samples is (1/lambda)^2/n = 25/40 = 0.625. The following shows that the variance of the sample means and the theoretical variance are very close in value.
actual_variance <- var(sample_mean)         # variance of the 1000 sample means
theoretical_variance <- (1 / lambda)^2 / n  # sigma^2 / n
result_2 <- data.frame("Variance" = c(actual_variance, theoretical_variance),
                       row.names = c("Variance from the sample", "Theoretical variance"))
result_2
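The theoretical values used above follow from the standard properties of expectation and variance for independent, identically distributed draws. As a short worked derivation, writing $X_1, \dots, X_n$ for the exponentials and $\bar{X}$ for their average (notation introduced here for illustration):

```latex
\[
\begin{aligned}
E[\bar{X}] &= \frac{1}{n}\sum_{i=1}^{n} E[X_i] = \frac{1}{\lambda} = \frac{1}{0.2} = 5, \\
\operatorname{Var}(\bar{X}) &= \frac{1}{n^{2}}\sum_{i=1}^{n} \operatorname{Var}(X_i)
  = \frac{(1/\lambda)^{2}}{n} = \frac{25}{40} = 0.625, \\
\operatorname{sd}(\bar{X}) &= \frac{1/\lambda}{\sqrt{n}} = \sqrt{0.625} \approx 0.791 .
\end{aligned}
\]
```

The variance step relies on the draws being independent, which holds for values generated by rexp.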
According to the central limit theorem (CLT), the averages of samples are approximately normally distributed when the sample size is large, with mean 1/lambda and variance (1/lambda)^2/n.
The following plot shows that the distribution of the sample means closely matches the normal distribution: the red curve is the estimated density of the sample means, and the blue curve is the normal density with the theoretical mean and variance. We also create a normal probability (Q-Q) plot of the sample means below to confirm that their distribution matches the theoretical normal distribution.
ggplot(sampleMean_data, aes(sample_mean)) +
  # after_stat(density) replaces the deprecated ..density.. notation (needs ggplot2 >= 3.3.0)
  geom_histogram(aes(y = after_stat(density)), alpha = .5, position = "identity", fill = "white", col = "black") +
  geom_density(colour = "red", size = 1) +  # estimated density of the sample means
  stat_function(fun = dnorm, colour = "blue", args = list(mean = theoretical_mean, sd = sqrt(theoretical_variance))) +  # theoretical normal density
  ggtitle("Histogram of Sample Means with Fitted Normal Curve") +
  xlab("Sample mean") +
  ylab("Density")
qqnorm(sample_mean, main = "Normal Probability Plot")  # sample quantiles vs. normal quantiles
qqline(sample_mean, col = "blue")                      # reference line through the quartiles
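To make clear what the normal probability plot displays, the sketch below rebuilds it by hand: the sorted sample means are plotted against standard normal quantiles at the probabilities given by ppoints(). The data are re-simulated so that the block runs on its own; the seed and the names y, theoretical_q and sample_q are assumptions made for this illustration.

```r
# Manual construction of a normal Q-Q plot for simulated sample means.
set.seed(3)  # arbitrary seed, used only for reproducibility
y <- rowMeans(matrix(rexp(1000 * 40, rate = 0.2), nrow = 1000, ncol = 40))

# Theoretical standard normal quantiles at the plotting positions used by qqnorm()
theoretical_q <- qnorm(ppoints(length(y)))
# Sample quantiles are simply the ordered observations
sample_q <- sort(y)

plot(theoretical_q, sample_q,
     xlab = "Theoretical Quantiles", ylab = "Sample Quantiles",
     main = "Manual Normal Q-Q Plot")
# Reference line with intercept mean(y) and slope sd(y); note that this differs
# from qqline(), which draws its line through the first and third quartiles.
abline(a = mean(y), b = sd(y), col = "blue")
```

If the sample means are close to normal, the points fall near a straight line, with small departures in the tails reflecting the slight right skew that remains at a sample size of 40.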
Both the histogram and the normal probability plot show that the distribution of the averages is approximately normal.
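As a rough numerical complement to the plots, one can standardize the sample means with the theoretical mean and standard error and count how many fall within the central 95% range of a standard normal. This is only a sketch under the same settings as above; the data are re-simulated so the block runs on its own, and the seed and the names means_check, z and coverage are introduced for this illustration.

```r
# Numerical check of approximate normality via standardized sample means.
set.seed(4)  # arbitrary seed, used only for reproducibility
lambda <- 0.2
n <- 40
no_simulation <- 1000

means_check <- rowMeans(matrix(rexp(no_simulation * n, rate = lambda),
                               nrow = no_simulation, ncol = n))

# Standardize with the theoretical mean 1/lambda and standard error (1/lambda)/sqrt(n)
z <- (means_check - 1 / lambda) / ((1 / lambda) / sqrt(n))

# Share of standardized means inside the central 95% interval of N(0, 1)
coverage <- mean(abs(z) <= qnorm(0.975))
coverage
```

If the normal approximation is reasonable, the share should be close to 0.95, though it will vary somewhat from run to run.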