Instructions

The project consists of two parts:

1. A simulation exercise.
2. Basic inferential data analysis.

I will write a report that answers the questions below. Given the nature of the series, I will use knitr to create the reports and convert them to PDF.

Each pdf report should be no more than 3 pages with 3 pages of supporting appendix material if needed (code, figures, etc).

Part 1

In this project I will investigate the exponential distribution in R and compare its behaviour with what the Central Limit Theorem predicts. The exponential distribution can be simulated in R with rexp(n, lambda), where lambda is the rate parameter. The mean of the exponential distribution is 1/lambda and the standard deviation is also 1/lambda. I will set lambda = 0.2 for all of the simulations and investigate the distribution of averages of 40 exponentials, using a thousand simulations.

I will illustrate via simulation and associated explanatory text the properties of the distribution of the mean of 40 exponentials and answer 3 questions:

1. Show the sample mean and compare it to the theoretical mean of the distribution.

2. Show how variable the sample is (via variance) and compare it to the theoretical variance of the distribution.

3. Show that the distribution is approximately normal.

In point 3, I will focus on the difference between the distribution of a large collection of random exponentials and the distribution of a large collection of averages of 40 exponentials.

Simulations

It is good practice to set a seed for R's random number generator before simulating random observations, so that the experiment can be reproduced by peers.

set.seed(4567)
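
As a minimal sketch of what the seed buys us, the snippet below draws three exponentials twice, resetting the seed before each draw. The names first_draw and second_draw are introduced here for illustration only.

```r
# Draw the same three exponentials twice, resetting the seed in between
set.seed(4567)
first_draw <- rexp(3, rate = 0.2)

set.seed(4567)
second_draw <- rexp(3, rate = 0.2)

# Same seed and same calls give the same numbers
identical(first_draw, second_draw)
## [1] TRUE
```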

Processing the experiment

# setting up parameters from the instructions
lambda <- 0.2
n <- 40
sim <- 1000

# simulating the experiment
sim_ex <- replicate(sim, rexp(n, lambda))

# calculating the mean of each simulated sample of 40 exponentials (one sample per column)
mean_exp <- apply(sim_ex, 2, mean)
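
The layout of sim_ex matters for the apply() call above: replicate() stores each simulated sample as a column. The sketch below re-creates the simulation so it runs on its own, checks the dimensions, and shows that colMeans() is an equivalent (not the original) way to get the same means.

```r
# Re-create the simulation so this snippet runs on its own
set.seed(4567)
lambda <- 0.2
n <- 40
sim <- 1000
sim_ex <- replicate(sim, rexp(n, lambda))

# One column per simulation: n rows, sim columns
dim(sim_ex)
## [1]   40 1000

# Column-wise means, computed two ways
mean_exp <- apply(sim_ex, 2, mean)
all.equal(mean_exp, colMeans(sim_ex))
## [1] TRUE
```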

Results

Question 1: sample mean vs. theoretical mean

The mean of the exponential distribution is equal to 1/lambda. With lambda equal to 0.2, the theoretical mean is 5, so the mean of the simulated sample means is expected to be close to 5.

samp_mean <- mean(mean_exp)
samp_mean
## [1] 5.02484
theo_mean <- 1/lambda
theo_mean
## [1] 5

The histogram shows simulated exponential means. The blue vertical line indicates the sample mean and the red vertical line indicates the theoretical mean.

Figure 1

hist(mean_exp, main= "Simulated Exponential Sample Means", col="azure", breaks = 100, xlab="Experiment Means", ylab="Frequency")
abline(v=samp_mean, col="blue4")
abline(v=theo_mean, col="darkred")

The sample mean is equal to 5.02484 and the theoretical mean is 5.
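
To judge how close the two means are, one option is to express the gap in units of the Monte Carlo standard error of the simulated mean. This is a sketch under the setup above, not part of the original analysis; mc_se and z_gap are names introduced here, and colMeans() stands in for the apply() call.

```r
# Re-create the simulation so this snippet runs on its own
set.seed(4567)
lambda <- 0.2
n <- 40
sim <- 1000
mean_exp <- colMeans(replicate(sim, rexp(n, lambda)))

samp_mean <- mean(mean_exp)
theo_mean <- 1 / lambda

# Standard error of samp_mean across the sim simulated means
mc_se <- sd(mean_exp) / sqrt(sim)

# Gap between sample and theoretical mean, in standard-error units
z_gap <- (samp_mean - theo_mean) / mc_se
z_gap
```

With the figures reported in this report (a gap of 0.02484 and a variance of the means of 0.6373425), this ratio works out to just under 1, which is well within ordinary simulation noise.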

Question 2: sample variance vs. theoretical variance

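The standard deviation of the mean of n exponentials is equal to (1/lambda)/sqrt(n), that is, the standard deviation of a single exponential, 1/lambda, divided by sqrt(n). Squaring it gives the theoretical variance of the sample mean, (1/lambda)^2/n. We will compare the sample variance of the simulated means with this theoretical variance.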

samp_sd <- sd(mean_exp)
samp_var <- samp_sd^2
samp_var
## [1] 0.6373425
theo_sd <- (1/lambda)/sqrt(n)
theo_var <- theo_sd^2
theo_var
## [1] 0.625

The sample variance is equal to 0.6373425, which is close to the theoretical variance of 0.625.
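
The arithmetic behind the theoretical value can be sketched as follows: the variance of a single exponential is (1/lambda)^2 = 25, and dividing by n = 40 gives 0.625. The names pop_var and rel_diff are introduced here for illustration, and colMeans() stands in for the apply() call.

```r
# Re-create the simulation so this snippet runs on its own
set.seed(4567)
lambda <- 0.2
n <- 40
sim <- 1000
mean_exp <- colMeans(replicate(sim, rexp(n, lambda)))

# Variance of one exponential, then of the mean of n exponentials
pop_var <- (1 / lambda)^2
theo_var <- pop_var / n
theo_var
## [1] 0.625

# var() gives the same result as squaring sd()
samp_var <- var(mean_exp)

# Relative difference between sample and theoretical variance
rel_diff <- (samp_var - theo_var) / theo_var
rel_diff
```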

Question 3: Distribution

We will check whether the distribution of the means of 40 exponentials is approximately normal. By the Central Limit Theorem, the means from the experiment are expected to follow an approximately normal distribution.

Figure 2

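# freq = FALSE puts the histogram on the density scale, so the normal curve needs no ad hoc rescaling
hist(mean_exp, main = "Normal Distribution Shape", col = "azure", breaks = 100, freq = FALSE, xlab = "Experiment Means", ylab = "Density")
x_axis <- seq(min(mean_exp), max(mean_exp), length.out = 100)
# theoretical normal density with mean 1/lambda and sd (1/lambda)/sqrt(n)
y_axis <- dnorm(x_axis, mean = 1/lambda, sd = (1/lambda)/sqrt(n))
lines(x_axis, y_axis, lty = 5, col = "magenta")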

The distribution of the sample means above closely follows a normal distribution, as the Central Limit Theorem predicts. The approximation improves as the number of exponentials averaged in each sample (here n = 40) increases.
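
The contrast between a large collection of raw exponentials and a large collection of averages of 40 exponentials can also be checked numerically. The sketch below is an illustration under the same parameters, not part of the original analysis: skewness() and qq_cor() are small hypothetical helpers defined here, not base R functions. Skewness is 0 for a normal distribution, and qq_cor() returns the correlation between the sorted sample and normal quantiles, which is close to 1 when a normal Q-Q plot is close to a straight line.

```r
set.seed(4567)
lambda <- 0.2
n <- 40
sim <- 1000

# 1000 raw exponentials versus 1000 means of 40 exponentials
raw_exp <- rexp(sim, lambda)
mean_exp <- colMeans(replicate(sim, rexp(n, lambda)))

# Hypothetical helper: sample skewness (third standardized moment)
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

# Hypothetical helper: correlation of sorted data with normal quantiles
qq_cor <- function(x) cor(sort(x), qnorm(ppoints(length(x))))

skew_raw <- skewness(raw_exp)
skew_mean <- skewness(mean_exp)
qq_raw <- qq_cor(raw_exp)
qq_mean <- qq_cor(mean_exp)

c(skew_raw = skew_raw, skew_mean = skew_mean)
c(qq_raw = qq_raw, qq_mean = qq_mean)
```

In theory the skewness of an exponential is 2, while the skewness of a mean of 40 exponentials is 2/sqrt(40), about 0.32, so the averages should be far more symmetric than the raw draws. Calling qqnorm(mean_exp) followed by qqline(mean_exp) draws the corresponding Q-Q plot.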