Introduction

In this project I will investigate the exponential distribution in R and compare it with the Central Limit Theorem. I simulate the exponential distribution with rexp(n, lambda) where lambda is the rate parameter.
The theoretical mean and standard deviation of an exponential distribution are equal: both are 1/lambda.
For this project I set lambda = 0.2 for all of the simulations. I investigate the distribution of averages of 40 exponentials over a thousand simulations.
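As a quick sanity check of the 1/lambda claim, the sketch below draws one large exponential sample and compares its mean and standard deviation with 1/lambda. The names check_lambda, check_draws, check_mean and check_sd, the seed, and the sample size of 100000 are illustrative assumptions and are not used in the rest of the analysis.

```r
# Illustrative check only: names, seed and sample size are assumptions
check_lambda <- 0.2
set.seed(1)  # arbitrary seed so the sketch is reproducible
check_draws <- rexp(100000, rate = check_lambda)

# Both sample statistics should be close to 1 / check_lambda = 5
check_mean <- mean(check_draws)
check_sd <- sd(check_draws)
print(c(theoretical = 1 / check_lambda,
        sample_mean = check_mean,
        sample_sd = check_sd))
```

The exact values printed depend on the seed, so only their closeness to 5 matters here.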
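Here we aim to:

  1. Show the sample mean and compare it to the theoretical mean of the distribution.
  2. Show how variable the sample is (via variance) and compare it to the theoretical variance of the distribution.
  3. Show that the distribution is approximately normal.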

Variable Definitions

Here we define the variables that will remain constant throughout this project.

lambda <- 0.2

Theoritical_sd = Theoritical_mean <- 1 / lambda

sim_num <- 1000 # Number of simulations

sample_size <- 40 # Number of exponentials in each sample (one sample per simulation)

set.seed(10)

# Generate 40000 values with an exponential distribution
exp_dist <- rexp(sim_num * sample_size, lambda) 
  1. Show the sample mean and compare it to the theoretical mean of the distribution.
    To show this we start by calculating the mean of each simulated sample.
    The simulated means should concentrate around the theoretical mean, which is 1/lambda = 5.
exp_distmeans <- apply(matrix(exp_dist, sim_num), 1, mean)

simulated_mean <- mean(exp_distmeans)

hist(exp_distmeans, main = "Distribution For Simulated means", xlab = "Sample Means", col = "pink")

abline(v = Theoritical_mean, col = "red")

abline(v = simulated_mean, col = "green")

In the plot, the vertical red line marks the theoretical mean of the exponential distribution, while the green line marks the mean of the simulated sample means. From this we can conclude that the simulated mean closely estimates the theoretical mean.

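  2. Show how variable the sample is (via variance) and compare it to the theoretical variance of the distribution.
    Here we begin by calculating the theoretical standard error of the sample mean with the formula s/sqrt(n), where s is the standard deviation of the distribution and n is the sample size. Squaring the standard error gives the theoretical variance of the sample mean.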
Theoritical_se <- Theoritical_sd / sqrt(sample_size) # Theoretical standard error of the sample mean

Theoritical_var <- Theoritical_se^2 # Theoretical variance of the sample mean

simulated_sd <- sd(exp_distmeans) # Standard deviation of the simulated sample means

simulated_var <- simulated_sd^2

The standard deviation of the simulated sample means is 0.80881, while the theoretical standard error is 0.7905694.
The variance of the simulated sample means is 0.6541736, while the theoretical variance is 0.625. From this we can conclude that the simulated variance is a close estimate of the theoretical variance.
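The comparison above can also be laid out side by side. The sketch below is self-contained and re-creates the simulation under the same settings (lambda = 0.2, samples of 40, 1000 simulations). The names ending in _chk and the data frame layout are illustrative assumptions, chosen so that the variables defined earlier are not overwritten. Exact values can depend on the seed and R version, so no output is shown.

```r
# Self-contained sketch; *_chk names are hypothetical
lambda_chk <- 0.2
n_chk <- 40       # exponentials per sample
sims_chk <- 1000  # number of simulated samples

set.seed(10)
# One row per simulation, one column per exponential draw
draws_chk <- matrix(rexp(sims_chk * n_chk, lambda_chk), nrow = sims_chk)
means_chk <- rowMeans(draws_chk)

# Theoretical values next to their simulated counterparts
comparison_chk <- data.frame(
  quantity    = c("mean", "standard error", "variance"),
  theoretical = c(1 / lambda_chk,
                  (1 / lambda_chk) / sqrt(n_chk),
                  (1 / lambda_chk)^2 / n_chk),
  simulated   = c(mean(means_chk), sd(means_chk), var(means_chk))
)
print(comparison_chk)
```

Using rowMeans here is only a convenience; it computes the same per-sample means as applying mean over the rows of the matrix.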

  3. Show that the distribution is approximately normal.
    Here we rely on the central limit theorem (CLT), which states that, for a sufficiently large sample size drawn from a population with finite variance, the distribution of the sample mean is approximately normal. That distribution is centered on the population mean, and its variance is approximately the population variance divided by the sample size.

Q-Q plot

qqnorm(exp_distmeans)
qqline(exp_distmeans)

The Q-Q plot suggests that the simulated sample means are approximately normally distributed.

hist(exp_distmeans, main = "Distributions Normality", probability = TRUE,
     breaks = 20, xlab = "")

# Density for the sample means
lines(density(exp_distmeans), col = "red") 

# Theoretical density of the averages of samples
xfit <- seq(min(exp_distmeans), max(exp_distmeans), length.out = 200)

yfit <- dnorm(xfit, mean = Theoritical_mean, sd = Theoritical_sd/sqrt(sample_size))

lines(xfit, yfit, col = "green", lty = 2)

In this plot the green dashed line is the theoretical normal density, while the red line is the density estimated from the simulated sample means, which appears to follow the normal density closely.
These plots lead us to conclude that the distribution of the simulated sample means is approximately normal.
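As a rough numeric complement to the plots, one can check how many simulated sample means fall within 1.96 theoretical standard errors of the theoretical mean; under a normal distribution about 95% would. The sketch below is self-contained and assumes the same settings as the analysis. The names ending in _cov are hypothetical, and this check is an added illustration rather than part of the original analysis.

```r
# Self-contained sketch; *_cov names are hypothetical
lambda_cov <- 0.2
n_cov <- 40       # exponentials per sample
sims_cov <- 1000  # number of simulated samples

set.seed(10)
means_cov <- rowMeans(matrix(rexp(sims_cov * n_cov, lambda_cov), nrow = sims_cov))

mu_cov <- 1 / lambda_cov                  # theoretical mean
se_cov <- (1 / lambda_cov) / sqrt(n_cov)  # theoretical standard error

# Share of sample means within 1.96 standard errors of the theoretical mean;
# a normal distribution would put about 95% of them in this interval
coverage_cov <- mean(abs(means_cov - mu_cov) <= 1.96 * se_cov)
print(coverage_cov)
```

A share close to 0.95 is consistent with approximate normality, although it is a weaker check than the Q-Q plot because it looks at a single interval only.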