Inferential statistics course project part 1

Overview

This document was compiled for the Inferential Statistics course project (part 1) on Coursera. It mainly addresses the simulation of random exponential values to illustrate the central limit theorem.

1. Simulation exercise

In this simulation exercise, I draw 40 random exponential values (rate lambda = 0.2) and use figures to illustrate their distribution, in particular the mean and variance. I also compute the theoretical mean and the theoretical variance of the sample mean using the exponential distribution formulae and compare them with the sample data. I then simulate 1000 samples of 40 exponential values each and take the average of each sample to illustrate the central limit theorem, under which the distribution of averages tends to a normal distribution.
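The theoretical values used in the code below follow from standard properties of the exponential distribution with rate λ. As a short worked derivation, assuming λ = 0.2 and samples of 40 values as in this exercise:

```latex
\begin{align*}
E[X] &= \frac{1}{\lambda} = \frac{1}{0.2} = 5, \\
\operatorname{Var}(X) &= \frac{1}{\lambda^{2}} = \frac{1}{0.04} = 25, \\
\operatorname{Var}(\bar{X}_{40}) &= \frac{\operatorname{Var}(X)}{40} = \frac{25}{40} = 0.625 .
\end{align*}
```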

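library(knitr)                 # provides kable() for the tables
lambda <- 0.2                  # rate of the exponential distribution
n <- 1000                      # number of simulated averages
b <- 40                        # number of exponential values per sample
mean0 <- 1/lambda              # theoretical mean
var0 <- ((1/lambda)^2)/b       # theoretical variance of the mean of b values

set.seed(100)
randomExpvalues <- rexp(b,lambda)          # one sample of b exponential values
mean1 <- round(mean(randomExpvalues),3)    # sample mean
var1 <- round(var(randomExpvalues)/b,3)    # estimated variance of the sample mean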

Type <- c("Sample", "Theoretical")
Mean <- c(mean1, mean0)
Variance <- c(var1, var0)
Meancomparison <- data.frame(Type, Mean)
VarComparison <- data.frame(Type,Variance)

Sample mean vs theoretical mean

kable(Meancomparison, align = "l",caption = "Sample vs theoretical means")

Sample vs theoretical means
Type	Mean
Sample	4.137
Theoretical	5.000

 
hist(randomExpvalues, main = "Histogram of random exponential values", xlab = "random exponential values", col = "grey")
abline(v = mean0, col = "green")
abline(v = mean1, col = "red")

The sample mean (red line, 4.137) is lower than the theoretical mean of 5 (green line).
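To put the gap between 4.137 and 5 in context, the sketch below expresses it in units of the theoretical standard error of a mean of 40 exponential values. This is an illustrative addition rather than part of the original analysis; the names se_mean and z are hypothetical, and the sample mean is hard-coded from the table above.

```r
# Illustrative sketch: how far is the sample mean from the theoretical mean,
# measured in standard errors of a mean of b exponential values?
lambda <- 0.2
b <- 40
mean0 <- 1 / lambda                 # theoretical mean (5)
mean1 <- 4.137                      # sample mean reported in the table above

se_mean <- (1 / lambda) / sqrt(b)   # theoretical standard error of the mean
z <- (mean1 - mean0) / se_mean      # distance in standard errors

round(c(standard_error = se_mean, z = z), 3)
```

The sample mean sits roughly 1.1 standard errors below the theoretical mean, which is within the range of ordinary sampling variation for a single sample of 40 values.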

Sample variance vs theoretical variance

kable(VarComparison, align = "l",caption = "Sample vs theoretical stats")

Sample vs theoretical stats
Type	Variance
Sample	0.306
Theoretical	0.625

hist(randomExpvalues, prob = TRUE, main = "Histogram of random exponential values", xlab = "random exponential values")
lines(density(randomExpvalues), col = "blue", lwd = 2)

The estimated variance of the sample mean (0.306, i.e. the sample variance divided by 40) is lower than the theoretical value of 0.625 (i.e. 25/40).
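A single sample of 40 values gives only one estimate of this variance. As a rough sketch of how much that estimate can move from sample to sample, the code below repeats the calculation on many fresh samples. The seed, the number of repetitions (reps), and the name var_of_mean_est are arbitrary choices made for illustration and are not taken from the original analysis.

```r
# Illustrative sketch: sampling variability of the estimate var(x) / b
lambda <- 0.2
b <- 40
var0 <- (1 / lambda)^2 / b    # theoretical variance of the mean (0.625)

set.seed(2017)                # arbitrary seed, chosen only for reproducibility
reps <- 2000                  # arbitrary number of repeated samples

# For each repetition, draw b exponential values and estimate the
# variance of their mean as (sample variance) / b
var_of_mean_est <- replicate(reps, var(rexp(b, lambda)) / b)

# Spread of the estimates, their average, and the share at or below 0.306
quantile(var_of_mean_est, c(0.05, 0.5, 0.95))
mean(var_of_mean_est)
mean(var_of_mean_est <= 0.306)
```

If the estimates spread widely around 0.625, a value such as 0.306 from one sample would not by itself indicate a problem with the simulation.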

Distribution approximation to normal

set.seed(100)
mtrx <- matrix(rexp(n*b,lambda),n,b)          # n rows, each a sample of b exponential values
randomExpaverages <- apply(mtrx,1,mean)       # average of each row (n averages)

par(mfrow = c(1,2))

hist(randomExpaverages, main = "Averages of Exp values", xlab = "Averages", col = "grey")
hist(randomExpvalues, main = "Random exponential values", xlab = "random exponential values", col = "grey")

When I compare the two graphs, I see that the distribution of the averages is approximately normal (bell-shaped) and centred around the theoretical mean of 5, whereas the individual exponential values are strongly right-skewed.
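The visual comparison can be complemented with a few numerical and graphical checks. The sketch below is one possible approach, not part of the original analysis: it recomputes the 1000 averages with the same seed, compares their mean and variance with the theoretical values, overlays the normal density suggested by the central limit theorem, and draws a normal Q-Q plot. The name averages is introduced here for illustration.

```r
# Illustrative sketch: checking how close the averages are to a normal distribution
lambda <- 0.2
n <- 1000    # number of averages
b <- 40      # values per average

set.seed(100)
averages <- rowMeans(matrix(rexp(n * b, lambda), n, b))

# Centre and spread of the averages against the theoretical values
c(sample_mean = mean(averages), theoretical_mean = 1 / lambda)
c(sample_variance = var(averages), theoretical_variance = (1 / lambda)^2 / b)

par(mfrow = c(1, 2))

# Histogram on the density scale with the limiting normal density overlaid
hist(averages, prob = TRUE, main = "Averages of Exp values",
     xlab = "Averages", col = "grey")
curve(dnorm(x, mean = 1 / lambda, sd = (1 / lambda) / sqrt(b)),
      add = TRUE, col = "blue", lwd = 2)

# Normal Q-Q plot: points close to the line suggest approximate normality
qqnorm(averages)
qqline(averages, col = "red")
```

With only 40 values per average, some right skew may still be visible in the upper tail of the Q-Q plot; the approximation generally improves as the number of values per average grows.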

Juliet Nantege

6/14/2017
