Statistical Inference

Statistical Inference: Peer Assessment 1

Jenina Halitsky

September 20, 2014

=========================================================================================================================

Synopsis

The exponential distribution can be simulated in R with rexp(n, lambda) where lambda is the rate parameter. The mean of exponential distribution is 1/lambda and the standard deviation is also also 1/lambda. Set lambda = 0.2 for all of the simulations. In this simulation, you will investigate the distribution of averages of 40 exponential(0.2)s. Note that you will need to do a thousand or so simulated averages of 40 exponentials.

Illustrate via simulation and associated explanatory text the properties of the distribution of the mean of 40 exponential(0.2)s. You should 1. Show where the distribution is centered at and compare it to the theoretical center of the distribution. 2. Show how variable it is and compare it to the theoretical variance of the distribution. 3. Show that the distribution is approximately normal. 4. Evaluate the coverage of the confidence interval for 1/lambda: X¯±1.96Sn√.

Setup Configurations & Libraries

        library(knitr)
        opts_knit$set(progress=FALSE, verbose = TRUE)
        opts_chunk$set(echo=TRUE, message=FALSE, tidy=TRUE, comment=NA,
                       fig.path="figure/", fig.keep="high", fig.width=10, fig.height=6,
                       fig.align="center")

Load needed libraries.

require(plyr)
require(ggplot2)

=============================================================================================================================

Question 1: Show where the distribution is centered at and compare it to the theoretical center of the distribution.

lambda = 0.2  # Set lambda = 0.2 for all of the simulations
n = 40  # Size of the sample = 40
nosim = 1000  # Number of Simulations = 1000
mean = 1/lambda  # The mean of exponential distribution is 1/lambda
sd = 1/lambda  # The standard deviation is also also 1/lambda
head(mean)  # Show actual value of Mean  [5]

[1] 5

head(sd)  # Show actual value of Standard Deviation [5]

[1] 5

expsd = sd/sqrt(n)  # Expected standard deviation
head(expsd)  # Show actual value of the Expected Standard Deviation [0.7905694]

[1] 0.7906

rep = replicate(nosim, mean(rexp(n, lambda)))  # 1000 or so simulated averages of 40 exponentials
repMean = mean(rep)  # mean of the replicated data
repMed = median(rep)  # median of the replicated data
repSD = sd(rep)  # standard deviation of the replicated data
head(rep)  # Show actual values of the replicated data [4.365795 4.153623 4.826297 5.423481 4.960753 5.586594]

[1] 4.873 4.777 4.541 5.869 4.623 5.598

head(repMean)  # Show actual value of the replicated Mean [5.010652]

[1] 5

head(repMed)  # Show actual value of the replicated Median [4.982072]

[1] 4.946

head(repSD)  # Show actual value of the replicated Standard Deviation [0.7847414]

[1] 0.7901

Question 1 Solution

** We are given that the Lambda is 0.2 the population mean and SD would both be 1/labmda which equals 5. The sample data would look like for the sample size of 40, the expected standard deviation formula of (standard deviation) divided by the square root of the sample size would be 5/sqrt(40) which is 0.790569 whereas the replicated standard deviation is 0.7847414. The replicated data sample has a mean of 5.010652 and a standard deviation of 0.7847414 which is very close to what was expected of the sample data. Taking it a step further, the median for the replicated data sample is 4.982072 which helps to support that this is non-skewed data. **

=============================================================================================================================

Question 2: Show how variable it is and compare it to the theoretical variance of the distribution.

var(rep)  # Variance of replicated data [0.6158191]

[1] 0.6242

varexp = expsd^2  # Variance expected 
head(varexp)  # Show actual value of the variance expected [0.625]

[1] 0.625

Question 2 Solution

To give an idea of what the variance would be we take the variance of the replicated data which is 0.6158191 and compare it to the variance of the expected data which is 0.625. Again, helping to support that this is non-skewed data.

=============================================================================================================================

Question 3: Show that the distribution is approximately normal.

x <- seq(2, 8, length = 100)
y <- dnorm(x, mean = mean, sd = expsd)
y = nosim * y * 0.2  # Setting up the data
xy = data.frame(cbind(x, y))
dat2 = data.frame(rep)

library(ggplot2)
g <- ggplot(dat2, aes(x = rep)) + geom_histogram(alpha = 0.2, binwidth = 0.2, 
    colour = "blue")
g <- g + xlab("Mean of Sample") + ylab("Frequency") + ggtitle("Distribution of the Sample Mean \n with Population Mean and Normal Curve")
g <- g + geom_vline(xintercept = mean, size = 1)
g <- g + geom_point(data = xy, aes(x = x, y = y, colour = "Normal"))
g

plot of chunk Graph Data

Question 3 Solution

The histogram plot shows that the distribution is approximately normal with a mean of 5 and a standard deviation of 0.7847414. This helps to assess how well the dataset matches a 95% confidence level to the population.

=============================================================================================================================

Question 4: Evaluate the coverage of the confidence interval for 1/lambda: X¯±1.96Sn√.

absmean = abs(rep - mean) - 1.96 * expsd  # Absolute difference of the mean and subtract the Zvalue times the expected standard deviation
head(absmean)  # Show the actual value of absmean [-0.9153106 -0.7031387 -1.3758126 -1.1260352 -1.5102693 -0.9629220]

[1] -1.4228 -1.3268 -1.0907 -0.6810 -1.1722 -0.9515

prop = 100 * length(absmean[absmean <= 0])/nosim  # Convert the range to a percentage
head(prop)  # Show percentage  [95.2]

[1] 95.6

Question 4 Solution

** For the sample data, the proportion of the sample mean are within a 95% interval of the population mean of 95.2%**