Statistical Inference Course Project

INTRODUCTION

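In this project I investigate the exponential distribution in R and compare the distribution of its sample means with what the Central Limit Theorem predicts. The simulation draws 1000 samples of 40 exponential values each, with rate lambda = 0.2.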

n <-  40
lambda <- 0.2
simul <- 1000
set.seed(1)

simul_data <- matrix(rexp(n * simul, rate = lambda), simul)

simul_means <- rowMeans(simul_data) # mean of each row, i.e. of each simulated sample of 40

hist(simul_means, breaks = 40, col = "yellow", 
     main = "Distribution of 1000 Simulated Means", 
     xlab = "Mean of 40 Samples")

Mean of Means

simul_mean <- mean(simul_means)
simul_mean

## [1] 4.990025

Theoretical Mean

theor_mean <- 1 / lambda # Theoretical Mean
theor_mean

## [1] 5

Student’s t Test

A one-sample t test of the simulated means against the theoretical mean gives a p-value of 0.6882554, so we fail to reject the null hypothesis that the true mean equals 5 (p > 0.05).

t.test(simul_means, mu = theor_mean)$p.value

## [1] 0.6882554
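The same test object also carries a 95% confidence interval, which gives an equivalent reading of the result: the null hypothesis is not rejected at the 5% level exactly when the interval contains the hypothesised mean. The following is a minimal sketch that regenerates the data with the parameters used in this report; the names `tt`, `ci` and `covers` are introduced here for illustration only.

```r
# Assumed setup, mirroring the parameters used earlier in this report
n <- 40
lambda <- 0.2
simul <- 1000
set.seed(1)
simul_means <- rowMeans(matrix(rexp(n * simul, rate = lambda), simul))

theor_mean <- 1 / lambda

# One-sample t test against the theoretical mean
tt <- t.test(simul_means, mu = theor_mean)

# 95% confidence interval for the mean of the simulated means
ci <- tt$conf.int
ci

# TRUE when the interval contains the theoretical mean,
# which matches a p-value above 0.05
covers <- ci[1] <= theor_mean && theor_mean <= ci[2]
covers
```

Reporting the interval alongside the p-value shows how precisely the simulation pins down the mean, not just whether the test rejects.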

Testing the simulated means against their own mean gives a p-value of 1. This is expected by construction: the t statistic is zero whenever the hypothesised mean equals the sample mean, so the result is only a sanity check and says nothing about whether the sample mean is the true mean of the distribution.

t.test(simul_means, mu = simul_mean)$p.value

## [1] 1
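A p-value of 1 here comes from how the t statistic is computed, not from the data. The statistic is (sample mean - mu) / (s / sqrt(N)), so setting mu to the sample mean makes the numerator zero. The sketch below shows this on unrelated data; the vector `x` and its parameters are arbitrary choices made for illustration.

```r
# Arbitrary illustrative data, unrelated to the exponential simulation
set.seed(42)
x <- rnorm(50, mean = 3, sd = 2)

# Test the data against its own sample mean
res <- t.test(x, mu = mean(x))

res$statistic  # t statistic: the numerator mean(x) - mu is zero
res$p.value    # two-sided p-value for a t statistic of zero
```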

Variance Comparisons

simul_var <- var(simul_means)
simul_var

## [1] 0.6177072

theor_var  <- (1 / lambda)^2 / n # Theoretical Variance
theor_var

## [1] 0.625

The variance of the simulated means and the theoretical variance of the sample mean are similar (0.6177 versus 0.625).
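The theoretical value comes from the variance of a mean of n independent draws, which is the population variance (1 / lambda)^2 divided by n. The sketch below repeats the comparison for a few sample sizes to show that scaling. The sizes 10 and 160 and the helper `var_of_means` are assumptions added for illustration; only n = 40 is used in this report.

```r
lambda <- 0.2
simul <- 1000
sizes <- c(10, 40, 160)  # illustrative sample sizes; the report uses 40
set.seed(1)

# Variance of `simul` simulated sample means, each from a sample of size n
var_of_means <- function(n) {
  var(rowMeans(matrix(rexp(n * simul, rate = lambda), simul)))
}

var_comparison <- data.frame(
  n = sizes,
  simulated = sapply(sizes, var_of_means),
  theoretical = (1 / lambda)^2 / sizes
)
var_comparison
```

Each fourfold increase in n should cut the variance of the sample mean to roughly a quarter, with the simulated column tracking the theoretical one up to simulation noise.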

Next, calculate the standard deviations; these are used to compare the distributions in the next section.

simul_SD <- sd(simul_means)
simul_SD

## [1] 0.7859435

theor_SD <- 1/(lambda * sqrt(n))
theor_SD

## [1] 0.7905694

Compare Distributions

require(ggplot2)

## Loading required package: ggplot2

df <- data.frame(simul_means)
ggplot(df, aes(x = simul_means)) +
  # Histogram on the density scale so the normal curves can be overlaid
  geom_histogram(aes(y = after_stat(density)), colour = "black",
                 fill = "yellow", bins = 40) +
  # Vertical lines: simulated mean (blue) and theoretical mean (red)
  geom_vline(xintercept = simul_mean, colour = "blue") +
  geom_vline(xintercept = theor_mean, colour = "red") +
  # Normal densities from the simulated (blue) and theoretical (red) parameters
  stat_function(fun = dnorm, args = list(mean = simul_mean, sd = simul_SD),
                colour = "blue", linewidth = 0.5) +
  stat_function(fun = dnorm, args = list(mean = theor_mean, sd = theor_SD),
                colour = "red", linewidth = 0.5) +
  labs(title = "Distribution of Means of 1000 Simulations of 40 Samples",
       x = "Mean of 40 Samples",
       y = "Density") +
  theme_bw()

The plot above compares the distribution of the simulated means with two normal densities drawn with dnorm: one using the simulated mean and standard deviation, the other using the theoretical values. The histogram follows both curves, so the simulated means are approximately normally distributed, as the Central Limit Theorem predicts.
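A numeric check can complement the visual comparison. The sketch below standardises the simulated means with the theoretical mean and standard deviation, then compares a few empirical quantiles with those of the standard normal. The data are regenerated with the report's parameters; the chosen probabilities and the names `z`, `probs` and `quantile_comparison` are assumptions for illustration.

```r
# Assumed setup, mirroring the parameters used earlier in this report
n <- 40
lambda <- 0.2
simul <- 1000
set.seed(1)
simul_means <- rowMeans(matrix(rexp(n * simul, rate = lambda), simul))

# Standardise with the theoretical mean and standard deviation of the sample mean
z <- (simul_means - 1 / lambda) / (1 / (lambda * sqrt(n)))

# Compare selected empirical quantiles with standard normal quantiles
probs <- c(0.025, 0.25, 0.5, 0.75, 0.975)
quantile_comparison <- data.frame(
  prob = probs,
  empirical = unname(quantile(z, probs)),
  normal = qnorm(probs)
)
quantile_comparison
```

If the normal approximation holds, the two quantile columns should be close, with any remaining gap in the tails reflecting the right skew of the exponential distribution. A Q-Q plot via qqnorm(simul_means) followed by qqline(simul_means) shows the same comparison graphically.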

Statistical Inference Course Project - Part 1

Nick

13 June 2019