Statistical Inference: Assignment 1 Simulation

Spiro K

The R code used to generate the answers to Questions 1 to 3 of the simulation exercise is outlined below.

# This code generates the simulation and calculates summary statistics (mean, variance
# and standard deviation) for the sample/empirical observations and the population

#set the seed
set.seed(1)

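#simulation variables
nosim <- 1000   # number of simulations
lambda <- 0.2   # rate of the exponential distribution
n <- 40         # number of exponentials averaged in each simulation

#generate the simulation data (one row per simulation, n columns)
simdata <- matrix(rexp(nosim * n, rate=lambda), nosim)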

# use apply to calculate the mean of each simulated sample (one mean per row)
simdata_m <- apply(simdata, 1, mean)

## empirical (sample) mean, variance and standard deviation are calculated
emp_mean <- mean(simdata_m)
emp_var <- var(simdata_m)
emp_sd <- sd(simdata_m)   # computed from the simulated means, not from the formula



## population (theoretical) mean, variance and standard deviation are calculated

pop_mean <- 1/lambda #not as per assignment instructions
pop_sd <- round((1/lambda/sqrt(n)),3)   # standard deviation of the mean of n exponentials
pop_var <- (1/lambda)^2/n               # variance of the mean of n exponentials

Question 1 Show the sample mean and compare it to the theoretical mean of the distribution.

theoretical_mean <- 1/lambda
emp <- round(mean(simdata_m), 3)
print(paste("Mean/center of the sample = ", emp))

## [1] "Mean/center of the sample =  4.99"

print(paste("Theoretical mean/center of the distribution = ", theoretical_mean))

## [1] "Theoretical mean/center of the distribution =  5"

The mean of the simulated sample means (4.99) closely approximates the theoretical mean (5), and we would expect it to get closer as we increase the number of simulations.
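The claim about increasing the number of simulations can be checked with a short sketch. The simulation counts in `sim_sizes` are illustrative assumptions, not values from the assignment, and the exact errors depend on the seed.

```r
# Sketch: mean of the simulated sample means for several simulation counts.
# sim_sizes holds assumed, illustrative values (not from the assignment).
set.seed(1)
lambda <- 0.2
n <- 40
theoretical_mean <- 1 / lambda

sim_sizes <- c(100, 1000, 10000)
mean_of_means <- sapply(sim_sizes, function(k) {
  sims <- matrix(rexp(k * n, rate = lambda), nrow = k)  # one row per simulation
  mean(rowMeans(sims))                                  # average of the k sample means
})

# Collect the results together with the absolute distance from 1/lambda
convergence <- data.frame(
  nosim = sim_sizes,
  mean_of_means = mean_of_means,
  abs_error = abs(mean_of_means - theoretical_mean)
)
print(convergence)
```

The error will not necessarily shrink at every step for a single seed, but its typical size falls roughly in proportion to 1/sqrt(nosim).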

Question 2 Show how variable the sample is (via variance) and compare it to the theoretical variance of the distribution.

popvar <- (1/lambda)^2/n
empvar <- round(var(simdata_m), 3)
print(paste("Theoretical variance of the distribution = ", popvar))

## [1] "Theoretical variance of the distribution =  0.625"

print(paste("Empirical variance of the distribution = ", empvar))

## [1] "Empirical variance of the distribution =  0.618"

The variance of the simulated sample means (0.618) closely approximates the theoretical variance of the mean of 40 exponentials, (1/lambda)^2/n = 0.625, and we would expect it to get closer as we increase the number of simulations.
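It may help to spell out where the value 0.625 comes from, since it is the variance of the mean of 40 exponentials rather than the variance of a single exponential draw. The following is a small worked calculation; the names `sigma`, `var_single`, `var_mean` and `se_mean` are introduced here for illustration only.

```r
# Worked arithmetic for the theoretical variance of the sample mean.
lambda <- 0.2
n <- 40

sigma <- 1 / lambda           # sd of a single exponential draw: 5
var_single <- sigma^2         # variance of a single draw: 25
var_mean <- sigma^2 / n       # variance of the mean of n draws: 0.625
se_mean <- sigma / sqrt(n)    # sd of the mean of n draws: about 0.791

print(c(var_single = var_single, var_mean = var_mean, se_mean = se_mean))
```

Averaging 40 observations divides the variance by 40, which is why the sample means are far less spread out than the individual exponential draws.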

Question 3 Show that the distribution is approximately normal.

hist(simdata_m, breaks=50, freq=FALSE, col='lightblue')
abline(v=1/lambda, col="red")
curve(dnorm(x, mean=pop_mean, sd=pop_sd),
      col="darkgreen", lwd=2, add=TRUE, yaxt="n")

In the above histogram we can see the sample means taking on the bell shape of a normal distribution, with the green curve showing the normal density with mean 1/lambda and standard deviation 1/lambda/sqrt(n). Note, however, that the histogram still has a slightly longer right tail and is not perfectly symmetrical, whereas symmetry is a property of a normal distribution.
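One possible way to look at the remaining asymmetry more closely is a normal Q-Q plot together with a moment-based skewness estimate. This is a supplementary sketch rather than part of the assignment; `sample_skew` and `theoretical_skew` are names introduced here, and the comparison value 2/sqrt(n) relies on the fact that a sum of n independent exponentials follows a Gamma distribution with shape n.

```r
# Sketch: checking symmetry of the simulated sample means.
set.seed(1)
lambda <- 0.2
n <- 40
nosim <- 1000
simdata_m <- rowMeans(matrix(rexp(nosim * n, rate = lambda), nosim))

# Moment-based sample skewness; a symmetric distribution has skewness 0
z <- (simdata_m - mean(simdata_m)) / sd(simdata_m)
sample_skew <- mean(z^3)

# Skewness of a Gamma distribution with shape n (unchanged by rescaling to a mean)
theoretical_skew <- 2 / sqrt(n)   # about 0.316

print(c(sample_skew = sample_skew, theoretical_skew = theoretical_skew))

# Points bending away from the line in the upper tail indicate right skew
qqnorm(simdata_m)
qqline(simdata_m, col = "red")
```

Because the theoretical skewness shrinks as 2/sqrt(n), a larger n would be expected to give a more symmetrical histogram.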