Part 1: Simulation Exercise Instructions

In this project, we will explore the characteristics of the exponential distribution using the R programming language and examine how it relates to the Central Limit Theorem (CLT). The exponential distribution can be simulated in R with the function rexp(n, lambda), where lambda is the rate parameter. Both the mean and the standard deviation of this distribution equal 1/lambda. We will set lambda to 0.2 for all simulations.

Our focus is the distribution of averages of 40 exponential values. To obtain reliable results, we will run 1,000 simulations. Analyzing these averages shows how the Central Limit Theorem applies to the exponential distribution. The exponential distribution itself is strongly right-skewed, but the averages should be approximately normally distributed.
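For reference, standard properties of the exponential distribution give the values the simulation should approach. With rate λ, an Exp(λ) variable has mean 1/λ and variance 1/λ². For the mean of n independent draws, written here as \bar{X}_n, the CLT gives the following approximate distribution, using λ = 0.2 and n = 40:

```latex
\begin{aligned}
E[X] &= \frac{1}{\lambda} = \frac{1}{0.2} = 5,
\qquad \operatorname{Var}(X) = \frac{1}{\lambda^2} = 25,\\[4pt]
E[\bar{X}_n] &= \frac{1}{\lambda} = 5,
\qquad \operatorname{Var}(\bar{X}_n) = \frac{1}{n\lambda^2} = \frac{25}{40} = 0.625,
\qquad \operatorname{SD}(\bar{X}_n) = \frac{1/\lambda}{\sqrt{n}} = \frac{5}{\sqrt{40}} \approx 0.791,\\[4pt]
\bar{X}_{40} &\;\overset{\text{approx.}}{\sim}\; \mathcal{N}\!\left(5,\ 0.625\right).
\end{aligned}
```

These are the theoretical values that the code computes as theoreticalMean and theoreticalVar.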

Preprocessing

Import libraries.

library(ggplot2)

Set the random seed so that the results are reproducible.

set.seed(1000)

Set the parameters.

lambda <- 0.2
n <- 40
sim <- 1000

Simulate 1,000 samples of 40 exponentials each using rexp(n, lambda).

# Generate all n * sim exponential values at once and arrange them in an
# n x sim matrix: each column holds one simulated sample of 40 values
# (nrow = n already produces this shape, so no reshaping is needed)
simExponentials <- matrix(rexp(n * sim, lambda), nrow = n)

Calculate the mean of each simulated sample (one mean per column).

simMean <- colMeans(simExponentials)
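The simulation and averaging steps above can also be written with base R's replicate(). The sketch below shows that alternative. It assumes the same parameters (lambda = 0.2, n = 40, sim = 1000). The name simMeanAlt is hypothetical and is used only for illustration. This is not the approach used in this report, but it is statistically equivalent: each replicate draws 40 exponentials and keeps their mean.

```r
# Alternative sketch (not the report's method): one rexp() call per replicate.
# Assumed parameters, matching the ones set above.
lambda <- 0.2
n <- 40
sim <- 1000

set.seed(1000)
# replicate() evaluates the expression `sim` times and collects the results,
# giving a numeric vector of 1000 means, each computed from 40 exponential draws
simMeanAlt <- replicate(sim, mean(rexp(n, lambda)))

length(simMeanAlt)
summary(simMeanAlt)
```

The matrix approach generates all values in a single call and is usually faster. The replicate() version reads more directly as "repeat the experiment 1,000 times".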

1. Show the sample mean and compare it to the theoretical mean of the distribution.

Calculate the sample mean, that is, the average of the 1,000 simulated means.

sampleMean <- mean(simMean)
sampleMean

## [1] 4.986963

Calculate the theoretical mean, 1/lambda.

theoreticalMean <- 1/lambda
theoreticalMean

## [1] 5
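"Very close" can be quantified by measuring the gap between the two means in standard errors. The grand mean averages 1,000 sample means, and each of those has variance 1/(n·lambda²), so the grand mean's standard error is sqrt(1/(n·lambda²·sim)) = 0.025. The sketch below is an added check that is not part of the original analysis. It regenerates the simulation with the same seed and parameters. The variable names absDiff, relDiff, seGrandMean and zScore are hypothetical.

```r
# Added check (not part of the original analysis): distance between the
# simulated mean and 1/lambda, expressed in standard errors of the grand mean.
lambda <- 0.2
n <- 40
sim <- 1000

set.seed(1000)
simMean <- colMeans(matrix(rexp(n * sim, lambda), nrow = n))

sampleMean <- mean(simMean)
theoreticalMean <- 1 / lambda

# Absolute and relative deviation from the theoretical mean
absDiff <- sampleMean - theoreticalMean
relDiff <- absDiff / theoreticalMean

# Standard error of the mean of `sim` sample means: sqrt(1 / (n * lambda^2 * sim)) = 0.025
seGrandMean <- sqrt(1 / (n * lambda^2 * sim))
zScore <- absDiff / seGrandMean

round(c(absDiff = absDiff, relDiff = relDiff, z = zScore), 4)
```

With the values reported above (4.986963 versus 5), the difference is about -0.013. That is roughly half a standard error, which is well within what random variation alone would produce.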

Plot the histogram of the simulated exponential sample means.

The histogram shows the distribution of the simulated sample means. The height of each bar is the number of means that fall in that interval. The mean of the sample means, 4.9870, is marked by a violet vertical line, and the theoretical mean of 5 is marked by a black line. The two lines nearly coincide, which shows that the sample mean is very close to the theoretical mean.

# Plot the histogram of the simulated exponential sample means
hist(simMean, main = "Simulated Exponential Sample Means", col = "mistyrose", breaks = 100)

# Add vertical lines for the sample mean and theoretical mean
abline(v = sampleMean, col = "violet")
abline(v = theoreticalMean, col = "black")

2. Show how variable the sample is (via variance) and compare it to the theoretical variance of the distribution.

Calculate the sample variance.

# var() returns the sample variance directly (equivalent to sd(simMean)^2)
sampleVar <- var(simMean)
sampleVar

## [1] 0.654343

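Calculate the theoretical variance of the sample mean. For averages of n values, this is the variance of the exponential distribution, (1/lambda)^2, divided by n.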

theoreticalSd <- (1/lambda)/sqrt(n)
theoreticalVar <- theoreticalSd^2
theoreticalVar

## [1] 0.625

The bar plot compares the two variances. The sample variance (0.654343) is shown in mistyrose and the theoretical variance (0.625) in lavender.

The sample variance is slightly larger than the theoretical variance, by about 0.029 (roughly 4.7%). In other words, the simulated sample means vary slightly more than theory predicts. With only 1,000 simulations, however, a gap of this size is within the range expected from simulation noise.

# Create a data frame for plotting
variance_df <- data.frame(Type = c("Sample Variance", "Theoretical Variance"),
                          Variance = c(sampleVar, theoreticalVar))

# Plot the variability comparison; geom_col() uses the Variance values as bar heights
ggplot(variance_df, aes(x = Type, y = Variance, fill = Type)) +
  geom_col() +
  labs(title = "Variability: Sample vs Theoretical Variance",
       x = "Variance Type", y = "Variance") +
  scale_fill_manual(values = c("mistyrose", "lavender")) +
  theme_minimal()
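One way to judge whether the gap between 0.654 and 0.625 matters is to compare it with the standard error of a variance estimated from 1,000 values. The sketch below assumes the sample means are roughly normal. Under that assumption, the standard error is about sigma² · sqrt(2 / (sim - 1)). The check is an addition to the report, and the names seVar and zVar are hypothetical.

```r
# Added check (assumption: sample means are approximately normal, so the
# normal-theory standard error of a sample variance applies).
lambda <- 0.2
n <- 40
sim <- 1000

set.seed(1000)
simMean <- colMeans(matrix(rexp(n * sim, lambda), nrow = n))

sampleVar <- var(simMean)                 # same as sd(simMean)^2
theoreticalVar <- (1 / lambda)^2 / n      # 25 / 40 = 0.625

# Approximate standard error of the variance estimate: sigma^2 * sqrt(2 / (sim - 1))
seVar <- theoreticalVar * sqrt(2 / (sim - 1))

# Number of standard errors separating the two variances
zVar <- (sampleVar - theoreticalVar) / seVar

round(c(sampleVar = sampleVar, theoreticalVar = theoreticalVar,
        seVar = seVar, zVar = zVar), 4)
```

With the reported values, (0.654343 - 0.625) / 0.028 ≈ 1.05, so the sample variance sits about one standard error above the theoretical value. The means of 40 exponentials have slightly heavier tails than a normal distribution (excess kurtosis 6/40 = 0.15), so this formula somewhat understates the true standard error. Treat it as a rough gauge rather than a formal test.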

3. Show that the distribution is approximately normal.

The plot generated by the code below suggests that the distribution of the simMean values is approximately normal.

The histogram, titled "Normal Distribution" and colored mistyrose, shows how many simMean values fall within each interval. Its shape closely resembles a symmetric bell curve, which is characteristic of a normal distribution.

A normal curve, computed from the theoretical mean (theoreticalMean) and the theoretical standard deviation (theoreticalSd), is overlaid on the histogram. It is scaled to the histogram's count axis and drawn as a dashed line (lty = 5, a long dash). The curve follows the histogram closely, which further supports the approximate normality of the sample means.

# Plot the histogram of the simMean values, keeping the returned bin information
h <- hist(simMean, main = "Normal Distribution", col = "mistyrose", breaks = 100)

# Generate x and y values for the normal distribution curve
xfit <- seq(min(simMean), max(simMean), length.out = 100)
yfit <- dnorm(xfit, mean = theoreticalMean, sd = theoreticalSd)

# Convert the density to counts (number of means x bin width) and overlay it
binWidth <- diff(h$breaks)[1]
lines(xfit, yfit * length(simMean) * binWidth, lty = 5)
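The histogram comparison above can be complemented by a few standard checks. The sketch below adds three: a normal Q-Q plot, a comparison of skewness before and after averaging, and the coverage of the theoretical 95% interval. These checks are not part of the original analysis. Base R has no skewness function, so the skewness() helper here is a hypothetical, simple sample-skewness estimator defined for illustration. The exponential distribution has skewness 2, and averaging n independent draws shrinks it to 2/sqrt(n) ≈ 0.316.

```r
# Added normality checks (illustrative sketch, not the report's original method).
lambda <- 0.2
n <- 40
sim <- 1000

set.seed(1000)
simExponentials <- matrix(rexp(n * sim, lambda), nrow = n)
simMean <- colMeans(simExponentials)

theoreticalMean <- 1 / lambda
theoreticalSd <- (1 / lambda) / sqrt(n)

# Q-Q plot: points close to the reference line indicate approximate normality
qqnorm(simMean, main = "Normal Q-Q Plot of Simulated Sample Means")
qqline(simMean, col = "violet")

# Hypothetical helper: simple sample skewness (base R has no built-in version)
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

# Raw exponential draws are strongly right-skewed; their means much less so
rawSkew <- skewness(as.vector(simExponentials))
meanSkew <- skewness(simMean)

# Under normality, about 95% of the means fall within 1.96 theoretical SDs of 1/lambda
coverage <- mean(abs(simMean - theoreticalMean) <= qnorm(0.975) * theoreticalSd)

round(c(rawSkew = rawSkew, meanSkew = meanSkew,
        expectedMeanSkew = 2 / sqrt(n), coverage = coverage), 3)
```

A small remaining positive skew in the means is expected at n = 40. Larger n would shrink it further.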

Part 1: Simulation Exercise

Anjeanette Sy

2023-07-10
