Statistical Inference - Assignment

Craig Lewis

========================================================

Overview

In this assignment, we look at the exponential distribution using R and compare its behaviour with what the Central Limit Theorem (CLT) predicts. We are interested in showing the following:

1. Show the sample mean and compare it to the theoretical mean of the distribution.
2. Show how variable the sample is (via variance) and compare it to the theoretical variance of the distribution.
3. Show that the distribution of sample means is approximately normal by looking at the difference between the distribution of a large collection of random exponentials and the distribution of a large collection of averages of 40 exponentials.

Exponential Distribution

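The density function of the exponential distribution with rate $\lambda$ is:

$$f(x; \lambda) = \lambda e^{-\lambda x}, \quad x \ge 0$$

The theoretical mean is:

$$\mu = \frac{1}{\lambda}$$

The variance is:

$$\sigma^2 = \frac{1}{\lambda^2}$$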

[N.B. We’ll write λ as “lambda” from here on out]

lambda<-0.2
nsim<-1000
n<-40
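
As a quick sanity check, the sketch below numerically integrates the exponential density lambda * exp(-lambda * x) and compares the results with the theoretical mean 1/lambda and variance 1/lambda^2. The helper `f` and the `check_*` names are introduced here purely for illustration; they are not part of the original analysis.

```r
lambda <- 0.2

# Density of the exponential distribution for x >= 0 (illustrative helper)
f <- function(x) lambda * exp(-lambda * x)

# The density should integrate to 1 over [0, Inf)
check_total <- integrate(f, 0, Inf)$value

# Mean: integral of x * f(x); should be close to 1/lambda
check_mean <- integrate(function(x) x * f(x), 0, Inf)$value

# Variance: integral of (x - mean)^2 * f(x); should be close to 1/lambda^2
check_var <- integrate(function(x) (x - check_mean)^2 * f(x), 0, Inf)$value

c(total = check_total, mean = check_mean, variance = check_var)
```

The three values should agree with 1, 1/lambda and 1/lambda^2 up to the numerical tolerance of `integrate`.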

Simulations

For this assignment we start by generating 40 exponential random variables with lambda equal to 0.2. We will then plot a histogram of the simulated values and overlay the plot with the theoretical mean and the sample mean. For the next part of the assignment we will repeat this experiment 1000 times and save the mean and variance of each run. We will then plot a histogram of the means to show that they follow an approximately normal distribution.

Sample mean vs Theoretical mean

First we’ll generate the 40 exponential variables using the value 0.2 for lambda. We will then calculate the sample mean and the theoretical mean, 1/lambda.

samples<-rexp(n,lambda)
sample_mean<-mean(samples)
theoretical_mean<-1/lambda

The sample mean is 4.8891871 and the theoretical mean is 5 for lambda=0.2. For this simulation the two differ by only about 0.11, and we can see this on the histogram below:

hist(samples,breaks=30,main="Exponential Distribution",xlab="Samples")
abline(v=theoretical_mean,col="blue",lwd=5)
abline(v=sample_mean,col="red",lwd=5)
legend("topright", c("Theoretical Mean", "Sample Mean"), col=c("blue", "red"), lwd=5)

Sample Variance vs Theoretical Variance

Now calculate the sample variance and compute the theoretical variance.

sample_var<-var(samples)
theoretical_var<-1/lambda^2

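The sample variance is 14.9425916 and the theoretical variance is 25 for lambda=0.2. The gap is much larger than it was for the mean: with only 40 observations, the sample variance is a noisy estimate.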

Note: I don’t know how to graph this (nor am I sure that a graph is useful)
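
One possible way to graph this, offered as a sketch rather than as part of the original analysis, is to repeat the 40-draw experiment many times and plot a histogram of the resulting sample variances with the theoretical variance marked. The seed and the name `var_runs` are assumptions made for this illustration.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
lambda <- 0.2
n <- 40
nsim <- 1000

# Sample variance of each of nsim runs of n exponentials
var_runs <- replicate(nsim, var(rexp(n, lambda)))

hist(var_runs, breaks = 30, main = "Sample Variances of 40 Exponentials",
     xlab = "Sample Variance")
abline(v = 1 / lambda^2, col = "blue", lwd = 3)    # theoretical variance
abline(v = mean(var_runs), col = "red", lwd = 3)   # average of the sample variances
legend("topright", c("Theoretical Variance", "Mean of Sample Variances"),
       col = c("blue", "red"), lwd = 3)
```

Individual runs spread widely around the theoretical variance, so a single sample variance well below 25 is not unusual, while the average over many runs should sit close to it.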

Normal Distribution of Means

We run an experiment where we generate 40 random exponentials with lambda=0.2 and calculate the mean. We repeat this experiment 1000 times and plot the results in a histogram:

mns <- numeric(nsim)   # mean of each run
vrs <- numeric(nsim)   # variance of each run
for (i in 1:nsim) {
    sim <- rexp(n, lambda)
    mns[i] <- mean(sim)
    vrs[i] <- var(sim)
}
hist(mns,xlab="Means of Runs",main="Distribution of Sample Means")

sample_mean<-mean(mns)
sample_var<-var(mns)

The sample mean (of means) is 4.9959623, close to the theoretical mean of 5. The sample variance (of means) is 0.6574257, close to the value that theory gives for an average of 40 draws, (1/lambda^2)/n = 25/40 = 0.625. We know via the Central Limit Theorem that the distribution of averages of independent trials (which these are) approaches a normal distribution, here with mean 1/lambda and variance 1/(n*lambda^2), as the sample size increases. If we examine the graph above, we note that the distribution “appears” to be normal in that it is bell shaped and roughly symmetric. Further analysis, via techniques not yet discussed in the course, could validate the assertion. So we’ll go with “it looks like one”.
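
As one hedged illustration of such a check, the sketch below overlays the normal density suggested by the CLT (mean 1/lambda, standard deviation (1/lambda)/sqrt(n)) on the histogram of means and draws a normal Q-Q plot. The seed and the names `means`, `clt_mean` and `clt_sd` are assumptions introduced for this example, and a visual check like this is suggestive rather than a formal test.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
lambda <- 0.2
n <- 40
nsim <- 1000

# Mean of each of nsim runs of n exponentials
means <- replicate(nsim, mean(rexp(n, lambda)))

clt_mean <- 1 / lambda               # mean of the approximating normal
clt_sd   <- (1 / lambda) / sqrt(n)   # standard deviation of the approximating normal

# Histogram on the density scale with the normal curve overlaid
hist(means, freq = FALSE, breaks = 30,
     main = "Sample Means vs Normal Density", xlab = "Means of Runs")
curve(dnorm(x, mean = clt_mean, sd = clt_sd), add = TRUE, col = "blue", lwd = 2)

# Normal Q-Q plot: points lying close to the line suggest approximate normality
qqnorm(means)
qqline(means, col = "red")
```

With averages of only 40 exponentials, a slight right skew may still be visible in the upper tail of the Q-Q plot.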

For personal interest, we reran the experiment 100,000 times, which results in an even more “normal” looking distribution.

mns <- numeric(100000)   # preallocate: growing a vector with c() is slow at this size
for (i in 1:100000) {
    sim <- rexp(n, lambda)
    mns[i] <- mean(sim)
}
hist(mns,xlab="Means of Runs",main="Distribution of Sample Means (100,000 runs)",breaks=50)
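
To make the contrast from the overview concrete, the following sketch plots a large collection of raw exponentials next to the averages of 40 exponentials. It is a vectorised variant of the loop (a matrix plus `rowMeans`), not the code used for the figures in this report; the seed and the names `draws` and `run_means` are assumptions.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
lambda <- 0.2
n <- 40
nsim <- 1000

# nsim rows, each row holding one run of n exponentials
draws <- matrix(rexp(nsim * n, lambda), nrow = nsim, ncol = n)
run_means <- rowMeans(draws)   # one average per run

old_par <- par(mfrow = c(1, 2))   # two plots side by side
hist(as.vector(draws), breaks = 50, main = "Random Exponentials", xlab = "Value")
hist(run_means, breaks = 30, main = "Averages of 40 Exponentials", xlab = "Mean")
par(old_par)                      # restore the previous plot layout
```

The left panel should show the strongly right-skewed exponential shape, while the right panel should look roughly bell shaped and centred near 1/lambda.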