In this project, we explore the characteristics of the exponential distribution using the R programming language and examine how well the Central Limit Theorem describes averages drawn from it. The exponential distribution can be simulated in R using the function rexp(n, lambda), where lambda is the rate parameter. We set lambda to 0.2 for all simulations.
Our focus will be on examining the distribution of averages calculated from 40 exponential values. To obtain reliable results, we will perform a thousand simulations. By analyzing these averages, we can gain insights into how the Central Limit Theorem applies to the exponential distribution.
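As a reference point for the simulations, the quantities the Central Limit Theorem predicts can be written out explicitly. The notation here (X_i for a single exponential draw, \bar{X}_n for the average of n draws) is introduced for illustration and does not appear in the original text; the numbers follow from lambda = 0.2 and n = 40.

```latex
\[
\mathbb{E}[X_i] = \frac{1}{\lambda} = 5, \qquad
\operatorname{Var}(X_i) = \frac{1}{\lambda^2} = 25
\]
\[
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad
\mathbb{E}[\bar{X}_n] = \frac{1}{\lambda} = 5, \qquad
\operatorname{Var}(\bar{X}_n) = \frac{1}{\lambda^2 n} = \frac{25}{40} = 0.625
\]
\[
\bar{X}_n \;\overset{\text{approx.}}{\sim}\; \mathcal{N}\!\left(\frac{1}{\lambda},\; \frac{1}{\lambda^2 n}\right)
\]
```

The corresponding standard deviation of the average is 5 / sqrt(40), roughly 0.79.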
Import libraries.
library(ggplot2)
Set seed.
set.seed(1000)
Set the parameters.
lambda <- 0.2
n <- 40
sim <- 1000
Simulate 1000 samples of 40 exponentials each using rexp(n, lambda).
# Generate all exponential values at once; matrix() fills column by column,
# so each of the sim columns holds one sample of n values
simExponentials <- matrix(rexp(n * sim, lambda), nrow = n)
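The layout of the simulation matrix can be checked with a short sketch. The name simCheck is hypothetical and used only here; the sketch assumes the same seed and parameters as above.

```r
# Sketch: confirm the layout of the simulation matrix.
# simCheck is an illustrative name, not part of the original analysis.
set.seed(1000)
lambda <- 0.2
n <- 40
sim <- 1000
simCheck <- matrix(rexp(n * sim, lambda), nrow = n)

# matrix() fills column by column, so each column is one sample of n draws
dim(simCheck)

# The mean of the first column matches the first entry of colMeans()
all.equal(mean(simCheck[, 1]), colMeans(simCheck)[1])
```

Each column therefore corresponds to one simulated sample of 40 values, which is why column means are the right summary in the next step.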
Calculate the mean of each simulated sample of 40 values.
simMean <- colMeans(simExponentials)
Calculate the sample mean, that is, the average of the 1000 simulated means.
sampleMean <- mean(simMean)
sampleMean
## [1] 4.986963
Calculate the theoretical mean.
theoreticalMean <- 1/lambda
theoreticalMean
## [1] 5
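To judge whether a gap of this size is meaningful, one rough check is to compare it with the standard error of an average of 1000 sample means. The sketch below hard-codes the value printed above; the names observedMean, seOfAverage and zScore are illustrative, and the check assumes the simulated means are independent with variance (1/lambda)^2 / n.

```r
# Sketch: how many standard errors separate the observed and theoretical means?
lambda <- 0.2
n <- 40
sim <- 1000
observedMean <- 4.986963                 # value printed above

expectedMean <- 1 / lambda               # 5
varOfOneMean <- (1 / lambda)^2 / n       # 0.625, variance of a single sample mean

# Standard error of the average of `sim` independent sample means
seOfAverage <- sqrt(varOfOneMean / sim)  # 0.025

# Standardized distance between observed and theoretical mean
zScore <- (observedMean - expectedMean) / seOfAverage
zScore                                   # about -0.52
```

A standardized distance of roughly half a standard error suggests the difference is well within ordinary simulation noise.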
Plot the histogram of the simulated exponential sample means.
The histogram below shows the distribution of the 1000 simulated sample means; the height of each bar is the number of sample means that fall in that interval. The average of the sample means, 4.987, is marked by a violet vertical line, and the theoretical mean of 5 is marked by a black vertical line. The two lines nearly coincide, so the simulated center is very close to the expected value.
# Plot the histogram of the simulated exponential sample means
hist(simMean, main = "Simulated Exponential Sample Means", col = "mistyrose", breaks = 100)
# Add vertical lines for the sample mean and theoretical mean
abline(v = sampleMean, col = "violet")
abline(v = theoreticalMean, col = "black")
Calculate the sample variance.
sampleSd <- sd(simMean)
sampleVar <- sampleSd^2
sampleVar
## [1] 0.654343
Calculate the theoretical variance of the sample mean, (1/lambda)^2 / n.
theoreticalSd <- (1/lambda)/sqrt(n)
theoreticalVar <- theoreticalSd^2
theoreticalVar
## [1] 0.625
The bar plot compares the sample variance with the theoretical variance. The sample variance (0.654343) is shown as the mistyrose bar labeled “Sample Variance”, and the theoretical variance (0.625) as the lavender bar labeled “Theoretical Variance.”
The sample variance is slightly larger than the theoretical variance, by about 4.7%. The simulated sample means are therefore only marginally more spread out than theory predicts, a gap small enough to be plausible as simulation noise from 1000 runs.
# Create a data frame for plotting
variance_df <- data.frame(Type = c("Sample Variance", "Theoretical Variance"),
                          Variance = c(sampleVar, theoreticalVar))
# Plot the variability comparison
ggplot(variance_df, aes(x = Type, y = Variance, fill = Type)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Variability: Sample vs Theoretical Variance",
       x = "Variance Type", y = "Variance") +
  scale_fill_manual(values = c("mistyrose", "lavender")) +
  theme_minimal()
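The variance comparison above can also be put in numbers. The sketch below uses the two values reported in this document; the names are illustrative, and the formula for the spread of a sample variance is the normal-theory approximation, which is an assumption here because the sample means are only approximately normal.

```r
# Sketch: size of the gap between sample and theoretical variance
sampleVarReported <- 0.654343   # value printed above
theoreticalVarValue <- 0.625    # (1/lambda)^2 / n with lambda = 0.2, n = 40
sim <- 1000

# Relative excess of the sample variance over the theoretical variance
relativeExcess <- sampleVarReported / theoreticalVarValue - 1
relativeExcess                  # about 0.047, i.e. roughly 4.7%

# Approximate sd of a sample variance computed from `sim` normal values
# (normal-theory approximation, assumed for illustration)
approxSdOfVar <- theoreticalVarValue * sqrt(2 / (sim - 1))

# Gap expressed in units of that approximate sd
gapInSd <- (sampleVarReported - theoreticalVarValue) / approxSdOfVar
gapInSd                         # about 1.05
```

Under that approximation the observed gap is about one standard deviation, which is unremarkable for a single run of 1000 simulations.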
The plot generated by the code below suggests that the distribution of the simMean values is approximately normal.
The histogram shows the count of simMean values within each interval. Titled “Normal Distribution” and colored in “mistyrose,” it has a roughly symmetric bell shape, which is characteristic of a normal distribution.
A normal curve is overlaid on the histogram. The curve uses the theoretical mean (theoreticalMean) and the theoretical standard deviation (theoreticalSd) of the sample mean, and is scaled to the histogram's count axis. Drawn with a long-dashed line (lty = 5), it follows the histogram closely, which further supports the resemblance to a normal distribution.
# Plot the histogram of the simMean values and keep the bin information
h <- hist(simMean, main = "Normal Distribution", col = "mistyrose", breaks = 100)
# Generate x and y values for the normal distribution curve
xfit <- seq(min(simMean), max(simMean), length.out = 100)
yfit <- dnorm(xfit, mean = theoreticalMean, sd = theoreticalSd)
# Scale the density to the count axis: expected count = density * N * bin width
binWidth <- diff(h$breaks)[1]
# Overlay the normal distribution curve on the histogram
lines(xfit, yfit * length(simMean) * binWidth, lty = 5)
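A normal Q-Q plot is another common way to check the same claim, and is offered here as an optional sketch rather than as part of the original analysis. The name qqMeans is hypothetical; the block re-creates the sample means with the same seed and parameters so that it runs on its own.

```r
# Sketch: normal Q-Q plot of the simulated sample means
set.seed(1000)
lambda <- 0.2
n <- 40
sim <- 1000

# Re-create the 1000 sample means (one per column of the simulation matrix)
qqMeans <- colMeans(matrix(rexp(n * sim, lambda), nrow = n))

# Points close to the reference line indicate approximate normality
qqnorm(qqMeans, main = "Normal Q-Q Plot of Sample Means")
qqline(qqMeans, lty = 5)
```

Averages of 40 exponential values are still mildly right-skewed, so some departure from the line in the tails would not contradict the approximate normality seen in the histogram.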