Setting up the data and all variables needed.

set.seed(2000) # for reproducibility
library(ggplot2) # used later for plotting
nosim <- 1000 # number of simulations
n <- 40 # number of exponentials
ld <- 0.2 # lambda parameter
df <- matrix(rexp(n * nosim, ld), nosim) # simulation of random variables
sampleMean <- rowMeans(df) # mean of each row, i.e. one average of 40 exponentials per simulation

1. Show the sample mean and compare it to the theoretical mean of the distribution.

Here the sample and theoretical statistics are computed, and histograms are plotted to compare the sample mean with the theoretical mean.
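For reference, the theoretical values used in this section follow from standard properties of the exponential distribution and of the mean of independent draws. The symbols are assumptions of this note: λ stands for `ld`, n for the number of exponentials averaged, and X̄ for one row mean.

$$
\mu = \frac{1}{\lambda} = \frac{1}{0.2} = 5, \qquad \sigma = \frac{1}{\lambda} = 5
$$

$$
E[\bar{X}] = \mu = 5, \qquad
\mathrm{SD}(\bar{X}) = \frac{\sigma}{\sqrt{n}} = \frac{5}{\sqrt{40}} \approx 0.7906, \qquad
\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n} = \frac{25}{40} = 0.625
$$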

## Sample mean, standard deviation and variance are computed
meanSampleMean <- mean(sampleMean)
sampleSd <- sd(sampleMean)
sampleVar <- var(sampleMean)

## Theoretical mean, standard deviation and variance are computed
theoryMean <- 1/ld
theorySd <- (1/ld) * (1/sqrt(n))
theoryVar <- theorySd ^ 2

## Histograms are plotted, first for the raw random exponentials
hist(df, 
     main="Sim-Exp distribution",
     xlab = "40 random exponentials")

This histogram shows the raw (non-averaged) exponentials. Their distribution is strongly right-skewed and does not resemble a normal distribution.
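To make the shape of the raw draws easier to judge, one option is to overlay the exponential density on a density-scaled histogram. The following is a minimal sketch, not part of the original analysis: `rawDraws` is a hypothetical name, and the seed, rate and number of draws are assumed to match the setup above.

```r
# Minimal sketch: compare raw exponential draws with the Exp(rate = 0.2) density.
set.seed(2000)                     # assumed: same seed as the setup
ld <- 0.2                          # rate parameter (lambda)
rawDraws <- rexp(40 * 1000, ld)    # hypothetical name; 40 draws x 1000 simulations

# Density-scaled histogram so that the curve and the bars share one y-axis
hist(rawDraws, freq = FALSE, breaks = 50,
     main = "Raw exponential draws",
     xlab = "Value of a single draw")

# Theoretical exponential density drawn on top of the histogram
curve(dexp(x, rate = ld), add = TRUE, col = "blue", lwd = 2)
```

If the bars follow the blue curve, the raw data behave like an exponential sample, which is why no bell shape appears before averaging.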

## Second, for the averages of the random exponentials
hist(sampleMean,
     col="red3",
     main="Avg-Sim Exponentials",
     xlab = "Average of 40 exponentials")
abline(v = theoryMean, col="blue", lwd=2)

Once averaged, the distribution takes on an approximately normal shape. Its highest point, where we would expect to see the mean, lies close to the theoretical mean, which is marked by the blue line.

2. Show how variable the sample is (via variance) and compare it to the theoretical variance of the distribution.

Next, the differences between the theoretical values and the results of the sample simulation are calculated.

# Differences are calculated 
meanDifference <- (theoryMean - meanSampleMean)
sdDifference <- (theorySd - sampleSd)
varDifference <- (theoryVar - sampleVar)

differenceData <- data.frame(
  SampleData = c(meanSampleMean, sampleSd, sampleVar),
  TheoryData = c(theoryMean, theorySd, theoryVar),
  Difference = c(meanDifference, sdDifference, varDifference)
) # Dataframe is created

differenceData

##   SampleData TheoryData  Difference
## 1  5.0294103  5.0000000 -0.02941033
## 2  0.8312410  0.7905694 -0.04067158
## 3  0.6909616  0.6250000 -0.06596159

The data frame above lists, row by row, the mean, standard deviation and variance of the sample means next to their theoretical values. The differences are small, which demonstrates the effect of a large number of simulations.
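Absolute differences are hard to compare across rows because the three quantities are on different scales. A possible complement, sketched here under the assumption that the simulation is regenerated with the same seed and parameters, is to express each difference as a percentage of the theoretical value. The names `sampleStats`, `theoryStats` and `relDiffPct` are hypothetical.

```r
# Minimal sketch: differences relative to the theoretical values, in percent.
set.seed(2000)                     # assumed: same seed as the setup
nosim <- 1000                      # number of simulations
n <- 40                            # number of exponentials per average
ld <- 0.2                          # rate parameter (lambda)
sampleMean <- rowMeans(matrix(rexp(n * nosim, ld), nosim))

# Sample and theoretical statistics as named vectors
sampleStats <- c(mean = mean(sampleMean),
                 sd   = sd(sampleMean),
                 var  = var(sampleMean))
theoryStats <- c(mean = 1 / ld,
                 sd   = 1 / (ld * sqrt(n)),
                 var  = 1 / (ld^2 * n))

# Same sign convention as the table: theory minus sample
relDiffPct <- 100 * (theoryStats - sampleStats) / theoryStats
round(relDiffPct, 2)
```

Reading the differences as percentages shows which of the three statistics is estimated least precisely at this number of simulations.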

3. Show that the distribution is approximately normal.

Next, the approximate normality of the data is shown by overlaying a bell-shaped curve on the distribution of the averages of random exponentials.

simDataMean <- data.frame(sampleMean)
ggplot(simDataMean, aes(sampleMean)) +
    geom_histogram(
        binwidth = .3,
        fill = "steelblue",
        color = "black",
        aes(y = after_stat(density))
    ) +
    geom_density(color = "blue", lwd = 1) +
    labs(title = "Distribution of Random Exponential Values with 1000 simulations",
         x = "Average of 40 Exponentials", y = "Density") +
    stat_function(
        fun = dnorm,
        args = list(mean = theoryMean, sd = theorySd),
        color = "red",
        lwd = 1
    ) +
    theme_bw() # density histogram with the sample density (blue) and the theoretical normal curve (red)

Here the blue line represents the density of the sample means, while the red line represents the theoretical normal distribution. The two curves follow each other closely, which indicates that the sample means are approximately normally distributed.

qqnorm(sampleMean) 
qqline(sampleMean, col = "red")

Above, the sample quantiles are plotted against the theoretical quantiles of a normal distribution. The black dots hug the red line around the lower quantiles and deviate more at the extreme (outer) quantiles, which is consistent with an approximately normal distribution.
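The visual impression from the Q-Q plot can be supplemented with a simple numerical check. The sketch below is an illustrative assumption rather than part of the original analysis: it standardizes the sample means with the theoretical mean and standard deviation and measures the share that falls within the central 95% range of a standard normal distribution. The names `z` and `coverage` are hypothetical.

```r
# Minimal sketch: share of sample means inside the theoretical 95% normal range.
set.seed(2000)                     # assumed: same seed as the setup
nosim <- 1000                      # number of simulations
n <- 40                            # number of exponentials per average
ld <- 0.2                          # rate parameter (lambda)
sampleMean <- rowMeans(matrix(rexp(n * nosim, ld), nosim))

theoryMean <- 1 / ld               # theoretical mean of the sample means
theorySd <- 1 / (ld * sqrt(n))     # theoretical standard deviation of the sample means

# Standardize and count the means within +/- qnorm(0.975) (about 1.96)
z <- (sampleMean - theoryMean) / theorySd
coverage <- mean(abs(z) <= qnorm(0.975))
coverage
```

If the sample means were exactly normal with the theoretical parameters, this proportion would be expected to be close to 0.95, so a value near that figure supports the approximation.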

Conclusion

The graphical demonstrations and the numerical similarities together show that, with a large enough number of simulations, the sample mean approximates the theoretical mean and the distribution of the averages is approximately normal.

Statistical Inference Course Project

spemurphy

2024-10-15
